Instrumental Conditioning: Schedules, Magnitude, Sequence & Response Characteristics
Cumulative Records & Early Methodology
- Early researchers recorded behavior with a continuous paper roll; each response moved a stylus upward, producing a cumulative record.
- Flat horizontal line → no responding.
- Diagonal ascent → continuous responding.
- Quick downward tick → moment a reinforcer (e.g. food) was delivered.
- Allowed simultaneous visualization of response rate and timing of each reinforcement, laying the groundwork for schedule-of-reinforcement studies.
Major Classes of Schedules of Reinforcement
- Two independent dimensions create four classic schedules:
- Fixed (predictable) vs. Variable (unpredictable).
- Ratio (response-based) vs. Interval (time-based).
- Notation:
- FR-n = fixed ratio, every n responses → 1 reinforcer.
- VR-n = variable ratio, responses fluctuate around an average of n per reinforcer.
- FI-t = fixed interval, first response after exactly t seconds is reinforced.
- VI-t = variable interval, first response after an average of t seconds (interval lengths vary).
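The four schedules above can be sketched as small decision rules. This is a minimal illustration, not lab software; the function names and the particular random distributions (uniform for VR, exponential for VI) are assumptions chosen for simplicity.

```python
import random

def fr_reinforced(responses_since_last, n):
    """FR-n: every n-th response earns a reinforcer."""
    return responses_since_last >= n

def vr_requirement(n, rng=random):
    """VR-n: draw this trial's requirement; values fluctuate around a mean of n.
    A uniform draw on 1..(2n-1) is one simple way to get mean n."""
    return rng.randint(1, 2 * n - 1)

def fi_reinforced(seconds_since_last, t):
    """FI-t: the first response at or after t seconds is reinforced."""
    return seconds_since_last >= t

def vi_interval(t, rng=random):
    """VI-t: draw this cycle's interval; exponential with mean t is a common choice."""
    return rng.expovariate(1.0 / t)
```

Averaging many `vr_requirement` or `vi_interval` draws recovers the nominal n or t, which is exactly what the "average of n per reinforcer" notation means.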
Fixed Ratio (FR)
- Definition: Constant number of responses required for each reinforcement (e.g. FR-4, FR-10).
- Cumulative-record pattern: “run–break–run.”
- Rapid, steady responding (diagonal line).
- Post-reinforcement pause (PRP): flat segment after each reinforcer.
- Explanations for PRP
- Incompatibility hypothesis – eating disrupts lever pressing.
• Rejected: other schedules still involve eating but show no PRP.
- Fatigue hypothesis – effort of high response requirement necessitates rest.
• Supported partially by Felton & Lyon (1966): pigeons on FR=25,50,75,100,150 showed PRP length increasing with ratio size.
• Problem: PRP still appears at very low ratios (e.g. FR-5) where fatigue seems unlikely.
- Timing hypothesis – animals use elapsed time, not response count, to predict next reinforcement.
• Killeen (1969) yoked procedure: one rat on FR-10, partner received reinforcement at identical times but after only 1 response; yoked rat displayed a PRP, supporting temporal control.
Variable Ratio (VR)
- Definition: Number of responses required changes unpredictably around a mean.
- Behavioral output
- Highest, most constant response rate of all classic schedules.
- Virtually no PRP; each reinforcer gives little information about when the next will occur.
- Real-world analogue: gambling (slot machines, poker hands).
- Uncertain "win" schedule maintains intense engagement, informing neurological studies of addiction and reward prediction.
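The slot-machine analogue can be made concrete with a toy random-ratio sketch: each response "wins" with probability 1/n, so the expected requirement is n and a win carries no information about when the next will come (the memoryless property that abolishes the PRP). The setup below is illustrative, not a model of any real machine.

```python
import random

def responses_until_win(p, rng):
    """Count responses until one pays off, each winning with probability p."""
    count = 1
    while rng.random() >= p:
        count += 1
    return count

rng = random.Random(42)
n = 20                                       # a VR-20-like machine: win probability 1/20
runs = [responses_until_win(1 / n, rng) for _ in range(50_000)]
mean_requirement = sum(runs) / len(runs)     # close to n on average
```

Because the per-response win probability never changes, the animal (or gambler) has no cue that predicts a payoff, which is why responding stays high and constant.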
Fixed Interval (FI)
- Definition: First response after a fixed time interval is reinforced.
- Characteristic “FI-scallop” pattern on cumulative record:
- Immediately after reinforcement → low rate.
- Gradual acceleration → peak rate just before interval ends.
- Small PRP then repeats.
- Temporal cognition: despite only needing one press, animals distribute responses according to an internal clock operating in the seconds-to-minutes domain.
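The scallop can be captured with a toy internal-clock model. The assumption here (mine, not the chapter's data) is that the probability of responding in each second grows with the fraction of the interval already elapsed; summing over many intervals reproduces the low-then-accelerating shape.

```python
import random

def fi_second_by_second(t, rng, k=2.0):
    """Return 0/1 response indicators for seconds 1..t of one FI-t interval.
    P(respond in second s) = (s / t) ** k, so responding accelerates toward t."""
    return [1 if rng.random() < (s / t) ** k else 0 for s in range(1, t + 1)]

rng = random.Random(1)
counts = [0] * 60
for _ in range(200):                          # average over 200 simulated FI-60 intervals
    for s, responded in enumerate(fi_second_by_second(60, rng)):
        counts[s] += responded

early, late = sum(counts[:30]), sum(counts[30:])   # first vs. second half of the interval
```

In this sketch responding in the second half dwarfs the first half, mirroring the scallop's "low rate after reinforcement, peak just before the interval ends."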
Variable Interval (VI)
- Definition: First response after a variable, unpredictable interval (averaging t seconds) is reinforced.
- Produces moderate, steady response rate.
- Overall slope proportional to average reinforcement density: shorter mean interval → steeper line.
- Because timing is uncertain, responding remains evenly distributed.
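The "shorter mean interval → steeper line" claim can be checked with a quick sketch: if responding is steady enough to collect each reinforcer as soon as it arms, the reinforcers earned in a session are just the number of exponential intervals that fit into it. The exponential-interval assumption is an illustrative choice.

```python
import random

def reinforcers_earned(mean_interval, session_s, rng):
    """Count reinforcers collected under VI-mean_interval across session_s seconds,
    assuming steady responding picks up each reinforcer as soon as it arms."""
    earned, t = 0, 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interval)   # next interval elapses
        if t > session_s:
            return earned
        earned += 1

rng = random.Random(7)
short = reinforcers_earned(30, 3600, rng)    # VI-30, one hour: ~120 reinforcers
long_ = reinforcers_earned(60, 3600, rng)    # VI-60, one hour: ~60 reinforcers
```

Roughly twice the reinforcement density under VI-30 means a cumulative record about twice as steep, matching the slope-density relation in the notes.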
Comparative Summary of Schedule Effects
- Response-rate hierarchy (highest → lowest): VR > FR > VI > FI (at many parameter settings).
- PRP prominent only in fixed schedules, especially FR; absent under VR; small under FI.
- Temporal processes dominate interval schedules; response counting/effort dominate ratio schedules.
Magnitude of Reinforcement & Behavioral Contrast (Crespi 1942)
- Method: Straight runway; rats ran ≈2 m from start box to goal box.
- Phase 1 groups: 64, 16, or 4 pellets in goal box.
- Dependent variable: running speed (m/s).
- Phase 1 results: 64 > 16 > 4 pellets produced correspondingly faster speeds (motivational effect).
- Phase 2: All groups switched to 16 pellets.
- 16→16 (Control) – speed unchanged.
- 64→16 – marked deceleration (negative contrast).
- 4→16 – marked acceleration (positive contrast).
- Significance: Memory of previous reward magnitude modifies current motivation; value is relative, not absolute.
- Parallels human affect to raises vs. pay cuts.
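The "value is relative, not absolute" point can be expressed as a toy model in which running vigor scales with current reward relative to the remembered previous reward. This is purely illustrative; Crespi's actual data were running speeds, not this formula.

```python
def contrast_speed(prev_pellets, new_pellets, base=1.0):
    """Toy contrast rule: speed rises above baseline after an upshift
    (positive contrast) and falls below it after a downshift (negative contrast)."""
    return base * (new_pellets / prev_pellets)

control  = contrast_speed(16, 16)   # 16→16: unchanged baseline
negative = contrast_speed(64, 16)   # 64→16: below baseline (negative contrast)
positive = contrast_speed(4, 16)    # 4→16: above baseline (positive contrast)
```

All three groups end on the same absolute reward (16 pellets), yet the model, like Crespi's rats, orders them positive > control > negative.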
Sequence & Rule Learning: Hulse & Dorsky (1977)
- Both groups experienced identical magnitudes & totals (14, 7, 3, 1, 0 pellets) but in different orders.
- Monotonic: 14→7→3→1→0 (single "greater-than" rule).
- Non-monotonic: 14→1→3→7→0 (alternating "greater/less" rules).
- Test: Latency to run on final zero-pellet trial across days.
- Monotonic group learned faster to withhold effort (longer latencies), indicating superior anticipation of 0 reward.
- Non-monotonic group improved slowly.
- Implications: Animals encode sequential structure & rules, supporting cognitive processing beyond simple stimulus–response memory.
Species-Specific Response Characteristics
Brelands’ “Misbehavior of Organisms” (1961)
- Professional animal trainers described conditioned behaviors drifting into innate food-related patterns ("instinctive drift").
- Raccoon example: trained to deposit tokens → reverted to token-washing (species-typical food washing).
- Pig example: trained to place coins → shifted to rooting behaviors.
- Demonstration video: raccoon given cotton candy washes it; candy dissolves, yet washing behavior persists – illustrates dominance of feeding system elicited by food reinforcement.
- Concept: Using food recruits hard-wired feeding motor programs that can displace operant response.
Belongingness in Instrumental Conditioning (Shettleworth 1975)
- Tested hamsters: Could food reinforce various spontaneous behaviors?
- Behaviors sampled: face washing, flank scratching, rearing, digging, scrabbling.
- Findings
- Digging & rearing increased readily with food reinforcement (likely part of natural food-acquisition repertoire).
- Grooming & scratching showed little change (neural system may not link these actions with food procurement).
- Parallel to Garcia & Koelling taste-aversion work: learning is easier when response→reinforcer pairing fits species-specific ecological/neurological constraints.
Broader Theoretical & Practical Implications
- Empiricism vs. Nativism: Operant behavior is shaped both by learned contingencies and innate response systems.
- Temporal processing: FR pauses, FI scallops, and yoked designs reveal internal clocks governing behavior.
- Behavioral economics: Magnitude contrast studies forecast how relative gains/losses affect motivation.
- Applied contexts
- Gambling machines leverage VR principles to maximize engagement.
- Training & animal welfare: selecting reinforcers congruent with species-specific behaviors prevents "misbehavior."
- Clinical psychology: schedule manipulation underpins behavior-modification programs (e.g. ratio thinning, interval schedules for maintenance).
- Ethical note: Understanding innate tendencies helps minimize frustration and stress during animal training or research protocols.
Key Numerical Relationships & Terms (Quick Reference)
- PRP↑ as FR value n↑ (Felton & Lyon).
- VR average responses ≈ set value, but individual requirement varies trial-to-trial.
- FI slope reflects acceleration toward terminal time t; scallop period roughly mirrors interval length.
- Behavioral contrast magnitude proportional to log-ratio of previous vs. current reinforcement (qualitative description; formal models extend with log(M_new / M_old)-type terms).
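As a worked example of the qualitative log-ratio description: the sign of log(M_new / M_old) predicts the direction of contrast. The formula here is a descriptive convenience from the notes, not a fitted model.

```python
import math

def contrast_sign(m_old, m_new):
    """Positive for an upshift, negative for a downshift, zero for no change."""
    return math.log(m_new / m_old)

up   = contrast_sign(4, 16)    # > 0 : positive contrast (Crespi's 4→16 group)
down = contrast_sign(64, 16)   # < 0 : negative contrast (Crespi's 64→16 group)
same = contrast_sign(16, 16)   # = 0 : no contrast (16→16 control)
```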
Study Questions for Review
- Why does the PRP persist at low FR values even when fatigue is minimal?
- How could you design a human experiment paralleling Crespi’s positive and negative contrasts using monetary rewards?
- Predict the response pattern if a VI schedule were gradually thinned (average interval increased) while a constant FR schedule remained unchanged.
- In training a new behavior, when might a trainer prefer a VI over a VR schedule despite its lower response rate?
- Provide an example of misbehavior/instinctive drift you might encounter when training a household pet.