Instrumental Conditioning: Schedules, Magnitude, Sequence & Response Characteristics

Cumulative Records & Early Methodology

  • Early researchers recorded behavior with a continuous paper roll; each response moved a stylus upward, producing a cumulative record.
    • Flat horizontal line → no responding.
    • Diagonal ascent → continuous responding.
    • Quick downward tick → moment a reinforcer (e.g. food) was delivered.
  • Allowed simultaneous visualization of response rate and timing of each reinforcement, laying the groundwork for schedule-of-reinforcement studies.
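The logic of a cumulative record can be sketched numerically (the response times below are made up for illustration):

```python
def cumulative_record(response_times, duration):
    """Sample a cumulative record once per 'second': entry t is the number of
    responses made by time t. A flat stretch is a pause; a steep rise is
    rapid responding, just as on the paper roll."""
    return [sum(1 for r in response_times if r <= t) for t in range(duration + 1)]

# Three quick responses, a pause (e.g. after a reinforcer at t = 3), then a new run:
record = cumulative_record([1, 2, 3, 8, 9, 10], duration=10)
# record == [0, 1, 2, 3, 3, 3, 3, 3, 4, 5, 6]
```

The flat run of 3s is the pause; the rising stretches are the diagonal segments of the paper record.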

Major Classes of Schedules of Reinforcement

  • Two independent dimensions create four classic schedules:
    • Fixed (predictable) vs. Variable (unpredictable).
    • Ratio (response-based) vs. Interval (time-based).
  • Notation:
    • FR-n = fixed ratio: every n responses → 1 reinforcer.
    • VR-n = variable ratio: the response requirement fluctuates around an average of n per reinforcer.
    • FI-t = fixed interval: the first response after exactly t seconds is reinforced.
    • VI-t = variable interval: the first response after an average of t seconds is reinforced (interval lengths vary).
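As a minimal sketch, the four reinforcement rules can be written as small Python functions (the function names and the particular VR/VI distributions are illustrative assumptions, not standard lab software):

```python
import random

def fixed_ratio(n):
    """FR-n: every n-th response is reinforced."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True     # reinforcer delivered
        return False
    return respond

def variable_ratio(n):
    """VR-n: requirement redrawn each time (here uniform on 1..2n-1, mean n)."""
    count, target = 0, random.randint(1, 2 * n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, random.randint(1, 2 * n - 1)
            return True
        return False
    return respond

def fixed_interval(t):
    """FI-t: the first response at least t seconds after the last reinforcer."""
    last = 0.0              # interval timed from t = 0
    def respond(now):
        nonlocal last
        if now - last >= t:
            last = now
            return True
        return False
    return respond

def variable_interval(t):
    """VI-t: like FI, but each interval is redrawn around a mean of t."""
    last, wait = 0.0, random.expovariate(1.0 / t)
    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last, wait = now, random.expovariate(1.0 / t)
            return True
        return False
    return respond

# An FR-4 subject is reinforced on presses 4, 8, 12, ...
fr4 = fixed_ratio(4)
outcomes = [fr4() for _ in range(8)]    # [False, False, False, True] * 2
```

Note how ratio schedules count responses while interval schedules consult the clock; that contrast drives the behavioral differences described below.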

Fixed Ratio (FR)

  • Definition: Constant number of responses required for each reinforcement (e.g. FR-4, FR-10).
  • Cumulative-record pattern: “run–break–run.”
    • Rapid, steady responding (diagonal line).
    • Post-reinforcement pause (PRP): flat segment after each reinforcer.
  • Explanations for PRP
    1. Incompatibility hypothesis – eating disrupts lever pressing.
      • Rejected: other schedules still involve eating but show no PRP.
    2. Fatigue hypothesis – effort of high response requirement necessitates rest.
      • Partially supported by Felton & Lyon (1966): pigeons on FR-25, 50, 75, 100, and 150 showed PRP length increasing with ratio size.
      • Problem: PRP still appears at very low ratios (e.g. FR-5) where fatigue seems unlikely.
    3. Timing hypothesis – animals use elapsed time, not response count, to predict next reinforcement.
      • Killeen (1969) yoked procedure: one rat on FR-10; its partner received reinforcement at identical times but after only one response. The yoked rat still displayed a PRP, supporting temporal control.
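Killeen's yoking logic can be sketched in a few lines (the press times and FR value are illustrative assumptions, not his data):

```python
def master_reinforcement_times(press_times, ratio):
    """On FR-ratio, the master animal is reinforced at every ratio-th press."""
    return press_times[ratio - 1::ratio]

# Master presses once per second on FR-10: reinforced at t = 10, 20, 30.
times = master_reinforcement_times(list(range(1, 31)), ratio=10)
# The yoked partner is reinforced at these same clock times after only a
# single response -- and still pauses after each reinforcer, implicating
# elapsed time rather than response count.
```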

Variable Ratio (VR)

  • Definition: Number of responses required changes unpredictably around a mean.
  • Behavioral output
    • Highest, most constant response rate of all classic schedules.
    • Virtually no PRP; each reinforcer gives little information about when the next will occur.
  • Real-world analogue: gambling (slot machines, poker hands).
    • Uncertain "win" schedule maintains intense engagement, informing neurological studies of addiction and reward prediction.

Fixed Interval (FI)

  • Definition: First response after a fixed time interval is reinforced.
  • Characteristic “FI-scallop” pattern on cumulative record:
    • Immediately after reinforcement → low rate.
    • Gradual acceleration → peak rate just before interval ends.
    • Small PRP then repeats.
  • Temporal cognition: despite only needing one press, animals distribute responses according to an internal clock operating in the seconds-to-minutes domain.

Variable Interval (VI)

  • Definition: First response after a variable, unpredictable interval (averaging tt seconds) is reinforced.
  • Produces moderate, steady response rate.
    • Overall slope proportional to average reinforcement density: shorter mean interval → steeper line.
  • Because timing is uncertain, responding remains evenly distributed.
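A rough simulation shows why shorter mean intervals yield steeper cumulative records (the steady response rate and exponential interval lengths are simplifying assumptions):

```python
import random

def vi_reinforcers_earned(mean_interval, response_rate, duration, rng):
    """Simulate a VI schedule: a reinforcer 'arms' after a random interval
    (mean = mean_interval seconds) and is collected by the next response.
    Reinforcers earned cap out near duration / mean_interval."""
    t, armed_at, earned = 0.0, rng.expovariate(1.0 / mean_interval), 0
    step = 1.0 / response_rate              # seconds between responses
    while t < duration:
        t += step                           # next response
        if t >= armed_at:                   # a reinforcer was waiting
            earned += 1
            armed_at = t + rng.expovariate(1.0 / mean_interval)
    return earned

rng = random.Random(0)
fast = vi_reinforcers_earned(15, response_rate=1.0, duration=3600, rng=rng)  # VI-15
slow = vi_reinforcers_earned(60, response_rate=1.0, duration=3600, rng=rng)  # VI-60
# fast > slow: the denser schedule supports the steeper cumulative slope.
```

Note that responding faster cannot raise the earned rate above about 1/mean_interval, which is why VI responding stays moderate and steady.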

Comparative Summary of Schedule Effects

  • Response-rate hierarchy (highest → lowest): VR > FR > VI > FI (at many parameter settings).
  • PRP prominent only in fixed schedules, especially FR; absent under VR; small under FI.
  • Temporal processes dominate interval schedules; response counting/effort dominate ratio schedules.

Magnitude of Reinforcement & Behavioral Contrast (Crespi 1942)

  • Method: Straight runway; rats ran ≈2 m from start box to goal box.
    • Phase 1 groups: 64, 16, or 4 pellets in goal box.
    • Dependent variable: running speed (m/s).
  • Phase 1 results: 64 > 16 > 4 pellets produced correspondingly faster speeds (motivational effect).
  • Phase 2: All groups switched to 16 pellets.
    • 16→16 (Control) – speed unchanged.
    • 64→16 – marked deceleration (negative contrast).
    • 4→16 – marked acceleration (positive contrast).
  • Significance: Memory of previous reward magnitude modifies current motivation; value is relative, not absolute.
    • Parallels human affective reactions to raises vs. pay cuts.

Sequence & Rule Learning: Hulse & Dorsky (1977)

  • Both groups experienced identical magnitudes & totals (14, 7, 3, 1, 0 pellets) but in different orders.
    • Monotonic: 14→7→3→1→0 (single "greater-than" rule).
    • Non-monotonic: 14→1→3→7→0 (alternating "greater/less" rules).
  • Test: Latency to run on final zero-pellet trial across days.
    • Monotonic group learned faster to withhold effort (longer latencies), indicating superior anticipation of 0 reward.
    • Non-monotonic group improved slowly.
  • Implications: Animals encode sequential structure & rules, supporting cognitive processing beyond simple stimulus–response memory.

Species-Specific Response Characteristics

Brelands’ “Misbehavior of Organisms” (1961)

  • Professional animal trainers described conditioned behaviors drifting into innate food-related patterns ("instinctive drift").
    • Raccoon example: trained to deposit tokens → reverted to token-washing (species-typical food washing).
    • Pig example: trained to place coins → shifted to rooting behaviors.
  • Demonstration video: raccoon given cotton candy washes it; candy dissolves, yet washing behavior persists – illustrates dominance of feeding system elicited by food reinforcement.
  • Concept: Using food recruits hard-wired feeding motor programs that can displace operant response.

Belongingness in Instrumental Conditioning (Shettleworth 1975)

  • Tested hamsters: Could food reinforce various spontaneous behaviors?
    • Behaviors sampled: face washing, flank scratching, rearing, digging, scrabbling.
  • Findings
    • Digging & rearing increased readily with food reinforcement (likely part of natural food-acquisition repertoire).
    • Grooming & scratching showed little change (neural system may not link these actions with food procurement).
  • Parallel to Garcia & Koelling taste-aversion work: learning is easier when response→reinforcer pairing fits species-specific ecological/neurological constraints.

Broader Theoretical & Practical Implications

  • Empiricism vs. Nativism: Operant behavior is shaped both by learned contingencies and innate response systems.
  • Temporal processing: FR pauses, FI scallops, and yoked designs reveal internal clocks governing behavior.
  • Behavioral economics: Magnitude contrast studies forecast how relative gains/losses affect motivation.
  • Applied contexts
    • Gambling machines leverage VR principles to maximize engagement.
    • Training & animal welfare: selecting reinforcers congruent with species-specific behaviors prevents "misbehavior."
    • Clinical psychology: schedule manipulation underpins behavior-modification programs (e.g. ratio thinning, interval schedules for maintenance).
  • Ethical note: Understanding innate tendencies helps minimize frustration and stress during animal training or research protocols.

Key Numerical Relationships & Terms (Quick Reference)

  • PRP duration increases as the FR requirement n increases (Felton & Lyon).
  • VR: responses per reinforcer average the scheduled value, but the individual requirement varies trial-to-trial.
  • FI: the slope reflects acceleration toward the terminal time t; the scallop period roughly mirrors the interval length.
  • Behavioral contrast magnitude is proportional to the log-ratio of previous vs. current reinforcement (qualitative description; formal models extend this with log(M_old / M_new)-type terms).
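One way to cash out the log-ratio description is the toy index below; the exact form is an illustrative assumption, not a fitted model from Crespi or later work:

```python
import math

def contrast_index(m_old, m_new):
    """Illustrative only: log(m_old / m_new) as a stand-in contrast score.
    Positive -> downshift (negative contrast expected);
    negative -> upshift (positive contrast expected)."""
    return math.log(m_old / m_new)

down = contrast_index(64, 16)   # 64 -> 16 pellets: log 4 > 0
up   = contrast_index(4, 16)    # 4 -> 16 pellets: log(1/4) < 0
same = contrast_index(16, 16)   # control group: exactly 0
```

The symmetry (down == -up here) captures the intuition that equal proportional upshifts and downshifts produce contrast effects of comparable size in opposite directions.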

Study Questions for Review

  • Why does the PRP persist at low FR values even when fatigue is minimal?
  • How could you design a human experiment paralleling Crespi’s positive and negative contrasts using monetary rewards?
  • Predict the response pattern if a VI schedule were gradually thinned (average interval increased) while a constant FR schedule remained unchanged.
  • In training a new behavior, when might a trainer prefer a VI over a VR schedule despite its lower response rate?
  • Provide an example of misbehavior/instinctive drift you might encounter when training a household pet.