Instrumental Conditioning Theory Part 1
Overview & Lecture Road-Map
Focus: theoretical foundations of instrumental (operant) conditioning.
Main arc of the lecture:
Introduce and formalize the Matching Law (choice behaviour).
Examine what variables (frequency, amount, delay) govern matching.
Combine variables to explore self-control.
Identify circumstances where matching predicts well and where it fails.
Transition to the associative structure underlying instrumental learning (S-R, R-O, S-O).
Preview: next lecture will tie these associations to neurobiology and addiction treatment.
Matching Law – Core Ideas
Everyday behaviour involves many competing options (phone, food, TV, etc.).
Matching Law: proportion of behaviour allocated to each option matches the proportion of reinforcement obtained from that option.
Rarely does psychology label a rule a “law”; matching earns that term because of its consistency across species and situations.
B₁ / (B₁ + B₂) = R₁ / (R₁ + R₂)
(B = behaviour/responses, R = reinforcement). Can be generalised to n alternatives by adding further B and R terms to the denominators.
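The ratio rule above can be sketched in a few lines of Python (function name illustrative, not from the lecture):

```python
def matching_proportions(reinforcements):
    """Matching law: the share of behaviour allocated to option i
    equals R_i divided by total reinforcement across all options."""
    total = sum(reinforcements)
    return [r / total for r in reinforcements]

# Two options delivering 60 and 30 reinforcers per hour:
shares = matching_proportions([60, 30])
print(shares)  # ≈ [0.667, 0.333]
```

Passing a longer list gives the generalised n-alternative form mentioned above.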
Demonstrations with Two Levers / Two Options
1. Frequency Manipulation
Setup: VI-1 min (≈60 pellets h⁻¹) vs. VI-2 min (≈30 pellets h⁻¹).
Prediction: animals should press Lever A ≈66 % of the time (60/(60+30) ≈ 0.67).
Empirically confirmed in rats, pigeons, fish, primates, humans.
Excel simulation (tab “Frequency”) lets students adjust values (e.g., change 60→40 or 30→20) and observe shifting preferences.
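The Excel "Frequency" exercise can be mirrored with a short Python sketch (a hypothetical stand-in for the spreadsheet, not the lecture's file):

```python
def predicted_share_a(rate_a, rate_b):
    """Matching prediction for the share of responses on Lever A,
    given reinforcement rates (e.g., pellets per hour) on A and B."""
    return rate_a / (rate_a + rate_b)

print(predicted_share_a(60, 30))  # VI-1 vs. VI-2 → ≈ 0.667
print(predicted_share_a(40, 20))  # still 2:1 → same predicted preference
```

Note that changing 60→40 and 30→20 together preserves the 2:1 ratio, so the predicted preference does not move; only the relative rates matter.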
2. Amount Manipulation
Setup: Lever A → 1 pellet; Lever B → 3 pellets.
Prediction: 25 % of responses to Lever A, 75 % to Lever B (1/(1+3) = 0.25).
Animals distribute accordingly; they do not abandon the small-paying lever entirely.
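The same ratio rule applies with amounts in place of rates; a minimal sketch (illustrative names):

```python
def amount_share_a(amount_a, amount_b):
    """Matching prediction when reward amounts differ
    (frequency and delay held equal across levers)."""
    return amount_a / (amount_a + amount_b)

# Lever A: 1 pellet; Lever B: 3 pellets.
print(amount_share_a(1, 3))  # → 0.25, i.e. 25 % to A, 75 % to B
```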
3. Delay Manipulation
Setup: Lever A → 2 s delay; Lever B → 3 s delay (both FR-1).
Prediction: 60 % to the faster lever (A); weighting each option by 1/delay gives (1/2)/(1/2 + 1/3) = 0.6.
Adjusting delays in the Excel “Delay” tab shows smooth, lawful shifts (e.g., lengthen B to 4 s and preference shifts further toward the faster Lever A).
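Under the assumption (implied by the prediction above) that each option is weighted by the reciprocal of its delay, the "Delay" tab can be sketched as:

```python
def delay_share_a(delay_a, delay_b):
    """Matching prediction when delays differ: each option is
    weighted by 1 / delay-to-reinforcement (seconds)."""
    v_a, v_b = 1 / delay_a, 1 / delay_b
    return v_a / (v_a + v_b)

print(delay_share_a(2, 3))  # ≈ 0.6 → 60 % to the faster Lever A
print(delay_share_a(2, 4))  # lengthening B's delay pushes preference further toward A
```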
Self-Control: Combining Amount & Delay
Classic dilemma: immediate smaller reward vs. larger delayed reward (marshmallow test analogy; college education example).
Example parameters used in Excel “Self-Control” tab:
Lever A: 4 pellets after 4 s.
Lever B: 2 pellets after 1 s.
Prediction: only ~33 % to the larger-delayed option (value of A = 4/4 = 1 vs. value of B = 2/1 = 2, so A gets 1/3).
Animals indeed prefer the immediate-smaller reward (~67 %).
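Combining the two variables, each option's value can be taken as amount/delay (the form the prediction above implies); a minimal sketch of the "Self-Control" tab:

```python
def share_a(amount_a, delay_a, amount_b, delay_b):
    """Matching prediction combining amount and delay:
    each option is valued as amount / delay."""
    v_a, v_b = amount_a / delay_a, amount_b / delay_b
    return v_a / (v_a + v_b)

# Lever A: 4 pellets after 4 s; Lever B: 2 pellets after 1 s.
print(share_a(4, 4, 2, 1))  # ≈ 0.333 → only a third of responses to larger-later A
```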
Commitment & Increasing Self-Control (Rachlin & Green 1972)
Pigeons given an early commitment choice:
Peck left white key → 10 s blackout → green key presented automatically (4 g, 4 s delay).
Peck right white key → 10 s blackout → presented with red (2 g, 1 s) vs. green (4 g, 4 s) choice.
Without extra delay, pigeons choose red (immediate small).
Adding 10 s extra delay to both outcomes (14 s vs. 11 s) flips preference to the larger reward — matching law predicts this shift.
Insight: “go to the library” strategy = add time/effort upfront → fosters self-control.
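Using the same amount/delay valuation as above, adding 10 s to both delays reproduces the Rachlin & Green preference reversal (a sketch under that assumption, not the authors' own model):

```python
def share_large(amount_l, delay_l, amount_s, delay_s):
    """Predicted share of choices for the larger-later option,
    valuing each option as amount / delay."""
    v_l, v_s = amount_l / delay_l, amount_s / delay_s
    return v_l / (v_l + v_s)

print(share_large(4, 4, 2, 1))    # ≈ 0.33 → smaller-sooner preferred
print(share_large(4, 14, 2, 11))  # ≈ 0.61 → after +10 s, larger-later preferred
```

The extra delay shrinks the proportional difference between 4 s and 1 s (14 s vs. 11 s), so the amount difference dominates and preference flips.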
Where Matching Law Falls Short
Situations showing more self-control than predicted:
Introducing salient stimuli (flashing light, tone, odour) or requiring a small ongoing behaviour during the delay increases preference for the larger-later reward.
Developmental difference: children’s choices track the matching prediction closely; adult humans exceed it (greater self-control).
Model still powerful but not complete; highlights roles for attention, executive control, qualitative reinforcer value.
Real-World Relevance
Game designers (e.g., FarmVille, Bejeweled) engineer schedules and micro-transactions using matching principles.
Small immediate boost vs. grinding for larger future payoff.
Social competition layers additional pressure on choices.
Understanding matching aids in developing interventions for addiction: align therapeutic reinforcers with behaviour.
Associative Structure Behind Instrumental Choice
Three candidate associations:
Stimulus–Response (S-R)
Response–Outcome (R-O)
Stimulus–Outcome (S-O) – transitive link enabling flexible choice.
Evidence for S-R Associations – Water-Maze Cue Task
Pool filled to cover a hidden escape platform.
Two spheres dangle above the water surface (grey = S+, white = S−); the rat swims toward, not onto, the sphere, so the visual cue is separated from the tactile goal (the platform).
Platform & cues moved daily; rats quickly learn to swim to location indicated by S+.
Shows a cue can directly control a specific response pattern (swim direction) independent of the outcome.
Evidence for R-O Associations – Devaluation (Colwill & Rescorla 1985)
Training phase:
R1 (lever press) → Outcome 1 (sucrose solution).
R2 (chain pull) → Outcome 2 (food pellet).
Devaluation phase:
Free access to Outcome 1 paired with illness; Outcome 2 paired with saline (no devaluation).
Extinction test (both responses available, no outcomes):
Sharp drop in R1 (devalued outcome) vs. robust R2.
Conclusion: animals encode which response produces which outcome and adjust after value change.
Evidence for S-O Associations – Complex Recombination (Colwill & Rescorla 1988)
Discrimination Training:
Stim 1 (noise) → R1 (nose poke) → Outcome 1 (food pellet).
Stim 2 (light) → R2 (handle pull) → Outcome 2 (sucrose).
Additional Response Training (no stimuli):
R3 → Outcome 1; R4 → Outcome 2.
Test: present Stim 1 or Stim 2 while giving choice between R3 and R4; no outcomes delivered.
Results: Higher response rate when stimulus and response shared the same outcome (Stim 1 evokes R3; Stim 2 evokes R4).
Provides compelling behavioural evidence for S-O links enabling flexible action selection.