Instrumental Conditioning Theory Part 1

Overview & Lecture Road-Map

  • Focus: theoretical foundations of instrumental (operant) conditioning.

  • Main arc of the lecture:

    • Introduce and formalize the Matching Law (choice behaviour).

    • Examine what variables (frequency, amount, delay) govern matching.

    • Combine variables to explore self-control.

    • Identify circumstances where matching predicts well and where it fails.

    • Transition to the associative structure underlying instrumental learning (S-R, R-O, S-O).

    • Preview: next lecture will tie these associations to neurobiology and addiction treatment.

Matching Law – Core Ideas

  • Everyday behaviour involves many competing options (phone, food, TV, etc.).

  • Matching Law: proportion of behaviour allocated to each option matches the proportion of reinforcement obtained from that option.

  • Rarely does psychology label a rule a “law”; matching earns that term because of its consistency across species and situations.

  • Formally, for two options A and B:
    B_A / (B_A + B_B) = R_A / (R_A + R_B)
    (B = behaviour/responses, R = reinforcement).

  • Can be generalised to n alternatives by extending the denominator with additional terms.
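The generalisation is direct: each option's predicted share of behaviour equals its share of total reinforcement. A minimal sketch in Python (hypothetical helper, not part of the lecture's Excel materials):

```python
def matching_proportions(reinforcements):
    """Matching-law prediction: each option's share of behaviour
    equals its share of total reinforcement, for any number of
    alternatives."""
    total = sum(reinforcements)
    return [r / total for r in reinforcements]

# Two levers delivering 60 and 30 reinforcers per hour:
print([round(s, 2) for s in matching_proportions([60, 30])])
# → [0.67, 0.33]

# Three alternatives just add a term to the denominator:
print([round(s, 2) for s in matching_proportions([60, 30, 10])])
# → [0.6, 0.3, 0.1]
```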

Demonstrations with Two Levers / Two Options

1. Frequency Manipulation
  • Setup: VI-1 min (≈60 pellets h⁻¹) vs. VI-2 min (≈30 pellets h⁻¹).

  • Prediction: 60 / (60 + 30) ≈ 0.67 → animals should press Lever A ≈67 % of the time.

  • Empirically confirmed in rats, pigeons, fish, primates, humans.

  • Excel simulation (tab “Frequency”) lets students adjust values (e.g., change 60→40 or 30→20) and observe shifting preferences.

2. Amount Manipulation
  • Setup: Lever A → 1 pellet; Lever B → 3 pellets.

  • Prediction: 1 / (1 + 3) = 0.25 → 25 % of responses to Lever A, 75 % to Lever B.

  • Animals distribute accordingly; they do not abandon the small-paying lever entirely.

3. Delay Manipulation
  • Setup: Lever A → 2 s delay; Lever B → 3 s delay (both FR-1).

  • Prediction: (1/2) / (1/2 + 1/3) = 0.60 → 60 % to the faster lever (A).

  • Adjusting delays in the Excel “Delay” tab shows smooth, lawful shifts (e.g., lengthen B to 4 s and preference shifts further toward A).
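When options differ only in delay, immediacy (the reciprocal of delay) stands in for reinforcement rate. A quick check of the predictions above (hypothetical sketch, not the lecture's spreadsheet):

```python
def delay_matching(delays):
    """Matching-law prediction for equal-sized rewards differing
    only in delay: value is taken as immediacy, 1/delay."""
    rates = [1.0 / d for d in delays]
    total = sum(rates)
    return [r / total for r in rates]

# Lever A: 2 s delay; Lever B: 3 s delay
p_a, p_b = delay_matching([2.0, 3.0])
print(round(p_a, 2))  # → 0.6 (60 % to the faster lever)

# Lengthening B's delay to 4 s shifts preference further toward A
p_a, p_b = delay_matching([2.0, 4.0])
print(round(p_a, 2))  # → 0.67
```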

Self-Control: Combining Amount & Delay

  • Classic dilemma: immediate smaller reward vs. larger delayed reward (marshmallow test analogy; college education example).

  • Example parameters used in Excel “Self-Control” tab:

    • Lever A: 4 pellets after 4 s.

    • Lever B: 2 pellets after 1 s.

  • Prediction:
    4/44/4+2/1=11+2=0.33\frac{4/4}{4/4 + 2/1}=\frac{1}{1+2}=0.33 → only ~33 % to larger-delayed option.
    Animals indeed prefer the immediate-smaller reward (~67 %).
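Combining the two variables, each option's value can be taken as amount scaled by immediacy (amount/delay). A worked check of the 4-pellet/4-s vs. 2-pellet/1-s case (hypothetical sketch):

```python
def value(amount, delay):
    """Reinforcer value as amount weighted by immediacy: amount / delay."""
    return amount / delay

v_large = value(4, 4)  # larger-later: 4 pellets after 4 s → 1.0
v_small = value(2, 1)  # smaller-sooner: 2 pellets after 1 s → 2.0

p_large = v_large / (v_large + v_small)
print(round(p_large, 2))  # → 0.33: the larger-delayed option loses
```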

Commitment & Increasing Self-Control (Rachlin & Green 1972)
  • Pigeons given an early commitment choice:

    • Peck left white key → 10 s blackout → auto-green key (4 g, 4 s delay).

    • Peck right white key → 10 s blackout → presented with red (2 g, 1 s) vs. green (4 g, 4 s) choice.

  • Without extra delay, pigeons choose red (immediate small).

  • Adding 10 s extra delay to both outcomes (14 s vs. 11 s) flips preference to the larger reward — matching law predicts this shift.

  • Insight: “go to the library” strategy = add time/effort upfront → fosters self-control.
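The same amount/delay arithmetic predicts the flip: adding a common delay to both outcomes shrinks the proportional difference between them. A sketch of the Rachlin & Green (1972) manipulation (function and parameter names hypothetical):

```python
def larger_later_share(a_large, d_large, a_small, d_small, extra=0.0):
    """Matching-law share of choices going to the larger-later reward,
    with an optional common delay added to both options (commitment)."""
    v_large = a_large / (d_large + extra)
    v_small = a_small / (d_small + extra)
    return v_large / (v_large + v_small)

# At the immediate choice point: 4 g after 4 s vs. 2 g after 1 s
print(round(larger_later_share(4, 4, 2, 1), 2))
# → 0.33 (impulsive choice dominates)

# Early commitment adds 10 s to both delays (14 s vs. 11 s)
print(round(larger_later_share(4, 4, 2, 1, extra=10), 2))
# → 0.61 (preference flips to the larger reward)
```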

Where Matching Law Falls Short

  • Situations showing more self-control than predicted:

    • Introducing salient stimuli (flashing light, tone, odour) or requiring a small ongoing behaviour during the delay increases preference for the larger-later reward.

    • Developmental difference: children ≈ matching prediction; adult humans exceed prediction (greater self-control).

  • Model still powerful but not complete; highlights roles for attention, executive control, qualitative reinforcer value.

Real-World Relevance

  • Game designers (e.g., FarmVille, Bejeweled) engineer schedules and micro-transactions using matching principles.

    • Small immediate boost vs. grinding for larger future payoff.

    • Social competition layers additional pressure on choices.

  • Understanding matching aids in developing interventions for addiction: align therapeutic reinforcers with behaviour.

Associative Structure Behind Instrumental Choice

Three candidate associations:

  1. Stimulus–Response (S-R)

  2. Response–Outcome (R-O)

  3. Stimulus–Outcome (S-O) – transitive link enabling flexible choice.

Evidence for S-R Associations – Water-Maze Cue Task
  • Pool filled to cover a hidden escape platform.

  • Two dangling spheres (grey = S+, white = S−) above water surface; sphere is not the response itself → tactile and visual cues separated.

  • Platform & cues moved daily; rats quickly learn to swim to location indicated by S+.

  • Shows a cue can directly control a specific response pattern (swim direction) independent of the outcome.

Evidence for R-O Associations – Devaluation (Colwill & Rescorla 1985)
  • Training phase:

    • R1 (lever press) → Outcome 1 (sucrose solution).

    • R2 (chain pull) → Outcome 2 (food pellet).

  • Devaluation phase:

    • Free access to Outcome 1 paired with LiCl-induced illness; Outcome 2 paired with saline.

  • Extinction test (both responses available, no outcomes):

    • Sharp drop in R1 (devalued outcome) vs. robust R2.

  • Conclusion: animals encode which response produces which outcome and adjust after value change.

Evidence for S-O Associations – Complex Recombination (Colwill & Rescorla 1988)
  • Discrimination Training:

    • Stim 1 (noise) → R1 (nose poke) → Outcome 1 (food pellet).

    • Stim 2 (light) → R2 (handle pull) → Outcome 2 (sucrose).

  • Additional Response Training (no stimuli):

    • R3 → Outcome 1; R4 → Outcome 2.

  • Test: present Stim 1 or Stim 2 while giving choice between R3 and R4; no outcomes delivered.

  • Results: Higher response rate when stimulus and response shared the same outcome (Stim 1 evokes R3; Stim 2 evokes R4).

  • Provides compelling behavioural evidence for S-O links enabling flexible action selection.