Instrumental Conditioning Theory Part 1

Overview & Lecture Road-Map

  • Focus: theoretical foundations of instrumental (operant) conditioning.

  • Main arc of the lecture:

    • Introduce and formalize the Matching Law (choice behaviour).

    • Examine what variables (frequency, amount, delay) govern matching.

    • Combine variables to explore self-control.

    • Identify circumstances where matching predicts well and where it fails.

    • Transition to the associative structure underlying instrumental learning (S-R, R-O, S-O).

    • Preview: next lecture will tie these associations to neurobiology and addiction treatment.

Matching Law – Core Ideas

  • Everyday behaviour involves many competing options (phone, food, TV, etc.).

  • Matching Law: proportion of behaviour allocated to each option matches the proportion of reinforcement obtained from that option.

  • Rarely does psychology label a rule a “law”; matching earns that term because of its consistency across species and situations.

  • Formally, for two options A and B:
    B_A / (B_A + B_B) = R_A / (R_A + R_B)
    (B = behaviour/responses, R = reinforcement).

  • Can be generalised to n alternatives by extending the denominator with additional terms.
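The generalisation is direct: each option's predicted share of behaviour equals its share of total reinforcement. A minimal sketch in Python (hypothetical helper, not part of the lecture's Excel materials):

```python
def matching_proportions(reinforcements):
    """Matching-law prediction: each option's share of behaviour
    equals its share of total reinforcement, for any number of
    alternatives."""
    total = sum(reinforcements)
    return [r / total for r in reinforcements]

# Two levers delivering 60 and 30 reinforcers per hour:
print([round(s, 2) for s in matching_proportions([60, 30])])
# → [0.67, 0.33]

# Three alternatives just add a term to the denominator:
print([round(s, 2) for s in matching_proportions([60, 30, 10])])
# → [0.6, 0.3, 0.1]
```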

Demonstrations with Two Levers / Two Options

1. Frequency Manipulation
  • Setup: VI-1 min (≈60 pellets h⁻¹) vs. VI-2 min (≈30 pellets h⁻¹).

  • Prediction: 60 / (60 + 30) ≈ 0.67 → animals should press Lever A ≈67 % of the time.

  • Empirically confirmed in rats, pigeons, fish, primates, humans.

  • Excel simulation (tab “Frequency”) lets students adjust values (e.g., change 60→40 or 30→20) and observe shifting preferences.

2. Amount Manipulation
  • Setup: Lever A → 1 pellet; Lever B → 3 pellets.

  • Prediction: 1 / (1 + 3) = 0.25 → 25 % of responses to Lever A, 75 % to Lever B.

  • Animals distribute accordingly; they do not abandon the small-paying lever entirely.

3. Delay Manipulation
  • Setup: Lever A → 2 s delay; Lever B → 3 s delay (both FR-1).

  • Prediction: (1/2) / (1/2 + 1/3) = 0.60 → 60 % to the faster lever (A).

  • Adjusting delays in the Excel “Delay” tab shows smooth, lawful shifts (e.g., lengthen B to 4 s and preference shifts further toward A).
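When options differ only in delay, immediacy (the reciprocal of delay) stands in for reinforcement rate. A quick check of the predictions above (hypothetical sketch, not the lecture's spreadsheet):

```python
def delay_matching(delays):
    """Matching-law prediction for equal-sized rewards differing
    only in delay: value is taken as immediacy, 1/delay."""
    rates = [1.0 / d for d in delays]
    total = sum(rates)
    return [r / total for r in rates]

# Lever A: 2 s delay; Lever B: 3 s delay
p_a, p_b = delay_matching([2.0, 3.0])
print(round(p_a, 2))  # → 0.6 (60 % to the faster lever)

# Lengthening B's delay to 4 s shifts preference further toward A
p_a, p_b = delay_matching([2.0, 4.0])
print(round(p_a, 2))  # → 0.67
```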

Self-Control: Combining Amount & Delay

  • Classic dilemma: immediate smaller reward vs. larger delayed reward (marshmallow test analogy; college education example).

  • Example parameters used in Excel “Self-Control” tab:

    • Lever A: 4 pellets after 4 s.

    • Lever B: 2 pellets after 1 s.

  • Prediction:
    4/44/4+2/1=11+2=0.33\frac{4/4}{4/4 + 2/1}=\frac{1}{1+2}=0.33 → only ~33 % to larger-delayed option.
    Animals indeed prefer the immediate-smaller reward (~67 %).
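Combining the two variables, each option's value can be taken as amount scaled by immediacy (amount/delay). A worked check of the 4-pellet/4-s vs. 2-pellet/1-s case (hypothetical sketch):

```python
def value(amount, delay):
    """Reinforcer value as amount weighted by immediacy: amount / delay."""
    return amount / delay

v_large = value(4, 4)  # larger-later: 4 pellets after 4 s → 1.0
v_small = value(2, 1)  # smaller-sooner: 2 pellets after 1 s → 2.0

p_large = v_large / (v_large + v_small)
print(round(p_large, 2))  # → 0.33: the larger-delayed option loses
```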

Commitment & Increasing Self-Control (Rachlin & Green 1972)
  • Pigeons given an early commitment choice:

    • Peck left white key → 10 s blackout → auto-green key (4 g, 4 s delay).

    • Peck right white key → 10 s blackout → presented with red (2 g, 1 s) vs. green (4 g, 4 s) choice.

  • Without extra delay, pigeons choose red (immediate small).

  • Adding 10 s extra delay to both outcomes (14 s vs. 11 s) flips preference to the larger reward — matching law predicts this shift.

  • Insight: “go to the library” strategy = add time/effort upfront → fosters self-control.
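The same amount/delay arithmetic predicts the flip: adding a common delay to both outcomes shrinks the proportional difference between them. A sketch of the Rachlin & Green (1972) manipulation (function and parameter names hypothetical):

```python
def larger_later_share(a_large, d_large, a_small, d_small, extra=0.0):
    """Matching-law share of choices going to the larger-later reward,
    with an optional common delay added to both options (commitment)."""
    v_large = a_large / (d_large + extra)
    v_small = a_small / (d_small + extra)
    return v_large / (v_large + v_small)

# At the immediate choice point: 4 g after 4 s vs. 2 g after 1 s
print(round(larger_later_share(4, 4, 2, 1), 2))
# → 0.33 (impulsive choice dominates)

# Early commitment adds 10 s to both delays (14 s vs. 11 s)
print(round(larger_later_share(4, 4, 2, 1, extra=10), 2))
# → 0.61 (preference flips to the larger reward)
```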

Where Matching Law Falls Short

  • Situations showing more self-control than predicted:

    • Introducing salient stimuli (flashing light, tone, odour) or requiring a small ongoing behaviour during the delay increases preference for the larger-later reward.

    • Developmental difference: children ≈ matching prediction; adult humans exceed prediction (greater self-control).

  • Model still powerful but not complete; highlights roles for attention, executive control, qualitative reinforcer value.

Real-World Relevance

  • Game designers (e.g., FarmVille, Bejeweled) engineer schedules and micro-transactions using matching principles.

    • Small immediate boost vs. grinding for larger future payoff.

    • Social competition layers additional pressure on choices.

  • Understanding matching aids in developing interventions for addiction: align therapeutic reinforcers with behaviour.

Associative Structure Behind Instrumental Choice

Three candidate associations:

  1. Stimulus–Response (S-R)

  2. Response–Outcome (R-O)

  3. Stimulus–Outcome (S-O) – transitive link enabling flexible choice.

Evidence for S-R Associations – Water-Maze Cue Task
  • Pool filled to cover a hidden escape platform.

  • Two dangling spheres (grey = S+, white = S−) above water surface; sphere is not the response itself → tactile and visual cues separated.

  • Platform & cues moved daily; rats quickly learn to swim to location indicated by S+.

  • Shows a cue can directly control a specific response pattern (swim direction) independent of the outcome.

Evidence for R-O Associations – Devaluation (Colwill & Rescorla 1985)
  • Training phase:

    • R1 (lever press) → Outcome 1 (sucrose solution).

    • R2 (chain pull) → Outcome 2 (food pellet).

  • Devaluation phase:

    • Free access to Outcome 1 paired with LiCl-induced illness; Outcome 2 paired with saline.

  • Extinction test (both responses available, no outcomes):

    • Sharp drop in R1 (devalued outcome) vs. robust R2.

  • Conclusion: animals encode which response produces which outcome and adjust after value change.

Evidence for S-O Associations – Complex Recombination (Colwill & Rescorla 1988)
  • Discrimination Training:

    • Stim 1 (noise) → R1 (nose poke) → Outcome 1 (food pellet).

    • Stim 2 (light) → R2 (handle pull) → Outcome 2 (sucrose).

  • Additional Response Training (no stimuli):

    • R3 → Outcome 1; R4 → Outcome 2.

  • Test: present Stim 1 or Stim 2 while giving choice between R3 and R4; no outcomes delivered.

  • Results: Higher response rate when stimulus and response shared the same outcome (Stim 1 evokes R3; Stim 2 evokes R4).

  • Provides compelling behavioural evidence for S-O links enabling flexible action selection.