Instrumental Conditioning: Schedules, Magnitude, Sequence & Response Characteristics

Cumulative Records & Early Methodology

  • Early researchers recorded behavior with a continuous paper roll; each response moved a stylus upward, producing a cumulative record.
    • Flat horizontal line → no responding.
    • Diagonal ascent → continuous responding.
    • Quick downward tick → moment a reinforcer (e.g. food) was delivered.
  • Allowed simultaneous visualization of response rate and timing of each reinforcement, laying the groundwork for schedule-of-reinforcement studies.
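The logic of a cumulative record can be sketched numerically (the response times below are made up for illustration):

```python
def cumulative_record(response_times, duration):
    """Sample a cumulative record once per 'second': entry t is the number of
    responses made by time t. A flat stretch is a pause; a steep rise is
    rapid responding, just as on the paper roll."""
    return [sum(1 for r in response_times if r <= t) for t in range(duration + 1)]

# Three quick responses, a pause (e.g. after a reinforcer at t = 3), then a new run:
record = cumulative_record([1, 2, 3, 8, 9, 10], duration=10)
# record == [0, 1, 2, 3, 3, 3, 3, 3, 4, 5, 6]
```

The flat run of 3s is the pause; the rising stretches are the diagonal segments of the paper record.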

Major Classes of Schedules of Reinforcement

  • Two independent dimensions create four classic schedules:
    • Fixed (predictable) vs. Variable (unpredictable).
    • Ratio (response-based) vs. Interval (time-based).
  • Notation:
    • FR-n = fixed ratio: every n responses → 1 reinforcer.
    • VR-n = variable ratio: the response requirement fluctuates around an average of n per reinforcer.
    • FI-t = fixed interval: the first response after exactly t seconds is reinforced.
    • VI-t = variable interval: the first response after an average of t seconds is reinforced (interval lengths vary).
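As a minimal sketch, the four reinforcement rules can be written as small Python functions (the function names and the particular VR/VI distributions are illustrative assumptions, not standard lab software):

```python
import random

def fixed_ratio(n):
    """FR-n: every n-th response is reinforced."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True     # reinforcer delivered
        return False
    return respond

def variable_ratio(n):
    """VR-n: requirement redrawn each time (here uniform on 1..2n-1, mean n)."""
    count, target = 0, random.randint(1, 2 * n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, random.randint(1, 2 * n - 1)
            return True
        return False
    return respond

def fixed_interval(t):
    """FI-t: the first response at least t seconds after the last reinforcer."""
    last = 0.0              # interval timed from t = 0
    def respond(now):
        nonlocal last
        if now - last >= t:
            last = now
            return True
        return False
    return respond

def variable_interval(t):
    """VI-t: like FI, but each interval is redrawn around a mean of t."""
    last, wait = 0.0, random.expovariate(1.0 / t)
    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last, wait = now, random.expovariate(1.0 / t)
            return True
        return False
    return respond

# An FR-4 subject is reinforced on presses 4, 8, 12, ...
fr4 = fixed_ratio(4)
outcomes = [fr4() for _ in range(8)]    # [False, False, False, True] * 2
```

Note how ratio schedules count responses while interval schedules consult the clock; that contrast drives the behavioral differences described below.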

Fixed Ratio (FR)

  • Definition: Constant number of responses required for each reinforcement (e.g. FR-4, FR-10).
  • Cumulative-record pattern: “run–break–run.”
    • Rapid, steady responding (diagonal line).
    • Post-reinforcement pause (PRP): flat segment after each reinforcer.
  • Explanations for PRP
    1. Incompatibility hypothesis – eating disrupts lever pressing.
      • Rejected: other schedules still involve eating but show no PRP.
    2. Fatigue hypothesis – effort of high response requirement necessitates rest.
      • Partially supported by Felton & Lyon (1966): pigeons on FR-25, 50, 75, 100, and 150 showed PRP length increasing with ratio size.
      • Problem: PRP still appears at very low ratios (e.g. FR-5) where fatigue seems unlikely.
    3. Timing hypothesis – animals use elapsed time, not response count, to predict next reinforcement.
      • Killeen (1969) yoked procedure: one rat on FR-10; its partner received reinforcement at identical times but after only one response. The yoked rat still displayed a PRP, supporting temporal control.
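Killeen's yoking logic can be sketched in a few lines (the press times and FR value are illustrative assumptions, not his data):

```python
def master_reinforcement_times(press_times, ratio):
    """On FR-ratio, the master animal is reinforced at every ratio-th press."""
    return press_times[ratio - 1::ratio]

# Master presses once per second on FR-10: reinforced at t = 10, 20, 30.
times = master_reinforcement_times(list(range(1, 31)), ratio=10)
# The yoked partner is reinforced at these same clock times after only a
# single response -- and still pauses after each reinforcer, implicating
# elapsed time rather than response count.
```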

Variable Ratio (VR)

  • Definition: Number of responses required changes unpredictably around a mean.
  • Behavioral output
    • Highest, most constant response rate of all classic schedules.
    • Virtually no PRP; each reinforcer gives little information about when the next will occur.
  • Real-world analogue: gambling (slot machines, poker hands).
    • Uncertain "win" schedule maintains intense engagement, informing neurological studies of addiction and reward prediction.

Fixed Interval (FI)

  • Definition: First response after a fixed time interval is reinforced.
  • Characteristic “FI-scallop” pattern on cumulative record:
    • Immediately after reinforcement → low rate.
    • Gradual acceleration → peak rate just before interval ends.
    • Small PRP then repeats.
  • Temporal cognition: despite only needing one press, animals distribute responses according to an internal clock operating in the seconds-to-minutes domain.

Variable Interval (VI)

  • Definition: First response after a variable, unpredictable interval (averaging tt seconds) is reinforced.
  • Produces moderate, steady response rate.
    • Overall slope proportional to average reinforcement density: shorter mean interval → steeper line.
  • Because timing is uncertain, responding remains evenly distributed.
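A rough simulation shows why shorter mean intervals yield steeper cumulative records (the steady response rate and exponential interval lengths are simplifying assumptions):

```python
import random

def vi_reinforcers_earned(mean_interval, response_rate, duration, rng):
    """Simulate a VI schedule: a reinforcer 'arms' after a random interval
    (mean = mean_interval seconds) and is collected by the next response.
    Reinforcers earned cap out near duration / mean_interval."""
    t, armed_at, earned = 0.0, rng.expovariate(1.0 / mean_interval), 0
    step = 1.0 / response_rate              # seconds between responses
    while t < duration:
        t += step                           # next response
        if t >= armed_at:                   # a reinforcer was waiting
            earned += 1
            armed_at = t + rng.expovariate(1.0 / mean_interval)
    return earned

rng = random.Random(0)
fast = vi_reinforcers_earned(15, response_rate=1.0, duration=3600, rng=rng)  # VI-15
slow = vi_reinforcers_earned(60, response_rate=1.0, duration=3600, rng=rng)  # VI-60
# fast > slow: the denser schedule supports the steeper cumulative slope.
```

Note that responding faster cannot raise the earned rate above about 1/mean_interval, which is why VI responding stays moderate and steady.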

Comparative Summary of Schedule Effects

  • Response-rate hierarchy (highest → lowest): VR > FR > VI > FI (at many parameter settings).
  • PRP prominent only in fixed schedules, especially FR; absent under VR; small under FI.
  • Temporal processes dominate interval schedules; response counting/effort dominate ratio schedules.

Magnitude of Reinforcement & Behavioral Contrast (Crespi 1942)

  • Method: Straight runway; rats ran ≈2 m from start box to goal box.
    • Phase 1 groups: 64, 16, or 4 pellets in goal box.
    • Dependent variable: running speed (m/s).
  • Phase 1 results: 64 > 16 > 4 pellets produced correspondingly faster speeds (motivational effect).
  • Phase 2: All groups switched to 16 pellets.
    • 16→16 (Control) – speed unchanged.
    • 64→16 – marked deceleration (negative contrast).
    • 4→16 – marked acceleration (positive contrast).
  • Significance: Memory of previous reward magnitude modifies current motivation; value is relative, not absolute.
    • Parallels human affective reactions to raises vs. pay cuts.

Sequence & Rule Learning: Hulse & Dorsky (1977)

  • Both groups experienced identical magnitudes & totals (14, 7, 3, 1, 0 pellets) but in different orders.
    • Monotonic: 14→7→3→1→0 (single "greater-than" rule).
    • Non-monotonic: 14→1→3→7→0 (alternating "greater/less" rules).
  • Test: Latency to run on final zero-pellet trial across days.
    • Monotonic group learned faster to withhold effort (longer latencies), indicating superior anticipation of 0 reward.
    • Non-monotonic group improved slowly.
  • Implications: Animals encode sequential structure & rules, supporting cognitive processing beyond simple stimulus–response memory.

Species-Specific Response Characteristics

Brelands’ “Misbehavior of Organisms” (1961)

  • Professional animal trainers described conditioned behaviors drifting into innate food-related patterns ("instinctive drift").
    • Raccoon example: trained to deposit tokens → reverted to token-washing (species-typical food washing).
    • Pig example: trained to place coins → shifted to rooting behaviors.
  • Demonstration video: raccoon given cotton candy washes it; candy dissolves, yet washing behavior persists – illustrates dominance of feeding system elicited by food reinforcement.
  • Concept: Using food recruits hard-wired feeding motor programs that can displace operant response.

Belongingness in Instrumental Conditioning (Shettleworth 1975)

  • Tested hamsters: Could food reinforce various spontaneous behaviors?
    • Behaviors sampled: face washing, flank scratching, rearing, digging, scrabbling.
  • Findings
    • Digging & rearing increased readily with food reinforcement (likely part of natural food-acquisition repertoire).
    • Grooming & scratching showed little change (neural system may not link these actions with food procurement).
  • Parallel to Garcia & Koelling taste-aversion work: learning is easier when response→reinforcer pairing fits species-specific ecological/neurological constraints.

Broader Theoretical & Practical Implications

  • Empiricism vs. Nativism: Operant behavior is shaped both by learned contingencies and innate response systems.
  • Temporal processing: FR pauses, FI scallops, and yoked designs reveal internal clocks governing behavior.
  • Behavioral economics: Magnitude contrast studies forecast how relative gains/losses affect motivation.
  • Applied contexts
    • Gambling machines leverage VR principles to maximize engagement.
    • Training & animal welfare: selecting reinforcers congruent with species-specific behaviors prevents "misbehavior."
    • Clinical psychology: schedule manipulation underpins behavior-modification programs (e.g. ratio thinning, interval schedules for maintenance).
  • Ethical note: Understanding innate tendencies helps minimize frustration and stress during animal training or research protocols.

Key Numerical Relationships & Terms (Quick Reference)

  • PRP duration increases as the FR requirement n increases (Felton & Lyon).
  • VR: responses per reinforcer average the scheduled value, but the individual requirement varies trial-to-trial.
  • FI: the slope reflects acceleration toward the terminal time t; the scallop period roughly mirrors the interval length.
  • Behavioral contrast magnitude is proportional to the log-ratio of previous vs. current reinforcement (qualitative description; formal models extend this with log(M_old / M_new)-type terms).
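One way to cash out the log-ratio description is the toy index below; the exact form is an illustrative assumption, not a fitted model from Crespi or later work:

```python
import math

def contrast_index(m_old, m_new):
    """Illustrative only: log(m_old / m_new) as a stand-in contrast score.
    Positive -> downshift (negative contrast expected);
    negative -> upshift (positive contrast expected)."""
    return math.log(m_old / m_new)

down = contrast_index(64, 16)   # 64 -> 16 pellets: log 4 > 0
up   = contrast_index(4, 16)    # 4 -> 16 pellets: log(1/4) < 0
same = contrast_index(16, 16)   # control group: exactly 0
```

The symmetry (down == -up here) captures the intuition that equal proportional upshifts and downshifts produce contrast effects of comparable size in opposite directions.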

Study Questions for Review

  • Why does the PRP persist at low FR values even when fatigue is minimal?
  • How could you design a human experiment paralleling Crespi’s positive and negative contrasts using monetary rewards?
  • Predict the response pattern if a VI schedule were gradually thinned (average interval increased) while a constant FR schedule remained unchanged.
  • In training a new behavior, when might a trainer prefer a VI over a VR schedule despite its lower response rate?
  • Provide an example of misbehavior/instinctive drift you might encounter when training a household pet.