Reinforcement Schedules III – Complex Schedules, Yoking & Differential Reinforcement
Lab Participation & Online Assessment
Worth 5% total: split into two parts of 2.5% each (this week & next).
Located in the Assessment section (under the lab-signup link).
1 Short-answer scenario (unlimited attempts).
1 Quiz (MCQ + ordering items; unlimited attempts).
Both remain open until Sunday 10 August → everyone has ≥ 2 weeks regardless of lab day.
Recap – Response Strength & Simple Schedules
Response strength = theoretical construct to quantify behaviour.
Historically indexed by response rate (slope in cumulative record).
Four simple schedules reviewed last lecture:
FI, FR, VI, VR – each shows a prototypical response pattern & reinforcement rate.
Empirical issue: need a fair way to compare ratio vs interval schedules while holding reinforcement rate constant.
Yoking: Equalising Reinforcement Rates
Yoking design: pair two subjects (or conditions) → “leader” & “follower”.
Leader’s earned reinforcer creates (yokes) the criterion for the follower.
Ensures identical reinforcement rate across schedules.
Classic Example – Catania, Matthews & Yohalem (1977)
Group 1
Leader: VR-25 (variable ratio, avg 25 responses/reinforcer).
Follower: yoked VI (each interval = the time the leader took to earn that reinforcer).
Result: cumulative record shows steeper slope (higher response rate) for VR leader; notches align vertically (simultaneous food delivery).
Group 2 (roles reversed)
Leader: VI-30 s.
Follower: created VR (ratio length = # responses leader emitted in each interval).
Result: VR follower still shows higher response rate; notches align horizontally (reinforcer after same # responses).
Conclusion: Even with identical reinforcement rates, VR > VI in response rate. ⇒ Response rate is schedule-specific, undermining it as a pure measure of “strength”.
Later replications with humans (Matthews et al., 1977) show same pattern.
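The yoking logic can be sketched in a few lines of Python. This is an illustrative simulation, not the lecture's procedure, and all names/parameters are hypothetical: the leader's inter-reinforcer times are simply copied over as the follower's VI intervals, so equal reinforcement rates are guaranteed by construction.

```python
import random

def leader_vr_intervals(n=50, vr_mean=25, resp_rate=2.0, seed=1):
    """Simulate a VR leader (hypothetical parameters): each reinforcer
    requires a random number of responses (mean ~ vr_mean); at
    resp_rate responses/s, that fixes the time between reinforcers."""
    rng = random.Random(seed)
    return [rng.randint(1, 2 * vr_mean - 1) / resp_rate
            for _ in range(n)]

def yoke_to_vi(leader_intervals):
    """Yoking: the follower's VI intervals are exactly the leader's
    inter-reinforcer times, so reinforcement rate is matched even
    though the follower responds at its own (typically lower) rate."""
    return list(leader_intervals)

ivs = leader_vr_intervals()
vi_intervals = yoke_to_vi(ivs)
```

Catania et al.'s point is that even with the interval sequence matched this way, the VR subject still responds faster.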
Theoretical Fallout
Researchers moved away from “response strength” as a single metric.
Alternative approaches:
Behavioural momentum theory (Nevin & Grace 2000).
Modelling response distributions (molecular vs molar analyses).
Practical takeaway: response rate itself remains valuable; schedules shape it systematically.
Differential Reinforcement Schedules (“Pacing” Schedules)
Common Definition
Differential reinforcement = reinforce some topographies/rates, withhold for others.
1 DRL – Differential Reinforcement of Low Rate
Criterion: a response only reinforced if inter-response time (IRT) > fixed value.
Example: DRL-15 s
Respond → start timer.
Next response after ≥ 15 s ⇒ food.
Early response ⇒ timer resets (no food).
Uses: reduce but not eliminate behaviour (e.g., slow down speaking rate).
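A minimal sketch of the DRL-15 s contingency (the function name, and the assumption that the first response merely starts the timer, are mine):

```python
def drl_reinforced(response_times, irt_min=15.0):
    """Return, per response, whether it meets the DRL criterion:
    reinforced only if >= irt_min seconds have passed since the
    previous response. Every response restarts the timer either way."""
    outcomes, last = [], None
    for t in response_times:
        outcomes.append(last is not None and (t - last) >= irt_min)
        last = t  # early responses reset the clock and earn nothing
    return outcomes

# 10 s after the first response is too soon; 16 s after the reset is not
print(drl_reinforced([0.0, 10.0, 26.0]))  # → [False, False, True]
```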
2 DRH – Differential Reinforcement of High Rate
Reinforce short IRTs, or a minimum # of responses within time t.
Example criteria:
DRH 0.4 s ⇒ food if second response within 0.4 s of previous.
“Emit ≥ 5 responses in 2 s”.
Uses: build rapid or fluent responding (e.g., typing speed).
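The “≥ 5 responses in 2 s” variant can be sketched as a sliding-window check (illustrative names and parameters):

```python
def drh_met(response_times, n_required=5, window=2.0):
    """True if any n_required consecutive responses fall within
    `window` seconds of each other (a DRH burst criterion)."""
    ts = sorted(response_times)
    return any(ts[i + n_required - 1] - ts[i] <= window
               for i in range(len(ts) - n_required + 1))

print(drh_met([0.0, 0.3, 0.7, 1.1, 1.6]))  # → True  (5 responses in 1.6 s)
print(drh_met([0.0, 1.0, 2.0, 3.0, 4.0]))  # → False (too spread out)
```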
3 DRO – Differential Reinforcement of Other Behaviour (Omissions)
Reinforcer delivered only if target response has not occurred for set time.
Example: DRO-30 s
If no screaming for 30 s in supermarket ⇒ promise ice-cream.
Target response resets timer.
Think of it as extinction + scheduled rewards for not doing behaviour.
Often implemented intermittently (not every interval, not every shopping trip).
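A sketch of the DRO-30 s timer (hypothetical session length; assumes the timer also restarts after each reinforcer delivery):

```python
def dro_deliveries(target_times, session_len=120.0, dro=30.0):
    """Return delivery times of reinforcers under DRO: one delivery
    each time `dro` seconds elapse with no target response; any
    target response resets the timer."""
    deliveries, clock = [], 0.0
    for t in sorted(target_times) + [session_len]:
        while clock + dro <= t:   # full DRO intervals elapsed before t
            clock += dro
            deliveries.append(clock)
        clock = t                 # target response resets the timer
    return deliveries

# Screams at 10 s and 50 s: ice-cream at 40 s, then 80 s and 110 s
print(dro_deliveries([10.0, 50.0]))  # → [40.0, 80.0, 110.0]
```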
Quick Comparison
DRL & DRH: target response must still occur (slow vs fast).
DRO: target response must be completely absent (omission training).
Collectively called pacing schedules: shape speed/spacing of behaviour.
Combining Schedules
1 Tandem Schedules
Two+ schedules in succession without external signals.
E.g. Tandem VI-60 → DRH 0.3 s.
After an average of 60 s elapses, reinforcement is delivered only for a burst response (IRT ≤ 0.3 s).
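Sketch of the tandem contingency (for simplicity the VI component is treated as a fixed 60-s wait; names are illustrative):

```python
def tandem_vi_drh(response_times, interval=60.0, irt_max=0.3):
    """First component: the (here fixed) 60-s interval must elapse.
    Second component: the next response is reinforced only if its
    IRT <= irt_max. No stimulus signals the component change."""
    prev = None
    for t in response_times:
        if t >= interval and prev is not None and (t - prev) <= irt_max:
            return t          # time of the reinforced response
        prev = t
    return None               # criterion never met

# Isolated responses after 60 s earn nothing; a quick pair does.
print(tandem_vi_drh([10.0, 59.0, 61.0, 61.2]))  # → 61.2 (IRT = 0.2 s)
```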
2 Chain Schedules (Signalled)
Successive schedules with unique discriminative stimuli (SDs) per link.
Completion of Link 1 (initial link) produces SD for Link 2, etc.
Final link (terminal link) leads to primary reinforcer.
Entry into next link functions as conditioned (secondary) reinforcer.
Practical value: teaching behavioural sequences (e.g., shoelace tying).
Example: dog-agility course, each obstacle cues the next.
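The chain logic, sketched with made-up SDs and FR links: completing a link's ratio produces the next link's SD, and completing the terminal link delivers the primary reinforcer.

```python
def run_chain(links, n_responses):
    """links: list of (sd, fr_requirement) pairs. Returns the stimulus
    events produced by n_responses responses on the chain."""
    events, i, count = [], 0, 0
    for _ in range(n_responses):
        count += 1
        if count >= links[i][1]:          # current link's ratio completed
            count, i = 0, i + 1
            if i == len(links):           # terminal link finished
                events.append("primary reinforcer")
                i = 0                     # chain starts over
            else:                         # SD doubles as conditioned reinforcer
                events.append("SD: " + links[i][0])
    return events

print(run_chain([("red key", 2), ("green key", 3)], 5))
# → ['SD: green key', 'primary reinforcer']
```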
3 Multiple (Mult) Schedules
Two+ signalled schedules alternate in a session; transition controlled by experimenter (not subject).
Each component ends after time or # reinforcers; separated by inter-component interval (ICI).
Mixed schedule = same as multiple but no SDs (unsignalled).
Concurrent Schedules (Choice)
Two+ independent schedules available simultaneously on separate manipulanda.
Subject freely allocates behaviour.
Example: Conc VI-10 s (left lever) vs VI-20 s (right).
Optimal strategy: concentrate more on richer VI-10 but periodically sample VI-20.
Real-life analogue: “work on assignment” vs “doom-scroll phone”.
Foundation for research on matching law, choice, impulsivity (covered next week).
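A rough simulation of the Conc VI-10 s / VI-20 s example. All parameters are hypothetical: the subject responds twice per second and allocates about two-thirds of responses to the richer left key.

```python
import random

def concurrent_vi(session=600.0, vi_left=10.0, vi_right=20.0,
                  p_left=0.67, resp_rate=2.0, seed=2):
    """Two independent VI timers run simultaneously; each response goes
    to the left key with probability p_left. A reinforcer, once armed,
    waits until the subject next responds on that side."""
    rng = random.Random(seed)
    mean = {"L": vi_left, "R": vi_right}
    armed = {"L": False, "R": False}
    next_arm = {k: rng.expovariate(1 / m) for k, m in mean.items()}
    earned = {"L": 0, "R": 0}
    t = 0.0
    while t < session:
        t += 1 / resp_rate                       # next response
        for k in ("L", "R"):
            if not armed[k] and t >= next_arm[k]:
                armed[k] = True                  # reinforcer set up
        side = "L" if rng.random() < p_left else "R"
        if armed[side]:                          # collect if available
            earned[side] += 1
            armed[side] = False
            next_arm[side] = t + rng.expovariate(1 / mean[side])
    return earned
```

Because the two timers run independently, occasionally sampling the leaner right key still pays, which is the point made above.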
Concurrent-Chain Schedules
Adds a choice phase (initial link) followed by exclusive outcome phase (terminal link).
Choose Left or Right key during initial link → locked in.
Terminal links may differ in schedule or reinforcer magnitude.
Lets researchers test pre-commitment, delay discounting, sub-optimal choice, etc.
E.g. choose between: smaller-sooner (FR 5; 1 pellet) vs larger-later (FR 100; 4 pellets).
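Ignoring delay for the moment, the two terminal links can be compared on pellets per response (a toy calculation; delay discounting, covered next week, changes this picture):

```python
# (FR requirement, pellets) for each hypothetical terminal link
terminal_links = {"smaller-sooner": (5, 1), "larger-later": (100, 4)}
rates = {name: pellets / fr
         for name, (fr, pellets) in terminal_links.items()}
print(rates)  # smaller-sooner yields 0.2 pellets/response vs 0.04
```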
Secondary (Conditioned) Reinforcement in Chains
Primary reinforcers: phylogenetically important (food, water, sex, warmth).
Secondary (conditioned) reinforcers: acquire value via pairing with primary (money, points, entry to next link).
In chains, the SD for Link N+1 can double as a conditioned reinforcer for completing Link N.
Examples & Metaphors Used in Lecture
Casino/Pokies VR example:
You play the slots (VR schedule) and text your friend after each win.
The friend merely checks their phone occasionally (VI schedule).
Demonstrates VR (player) vs VI (observer) under yoking logic.
Supermarket shouting → DRO contingency: “No shouting for 30 s = ice-cream”.
Collect supermarket stickers = DRH (collect X stickers before promo ends).
Dog-agility course & corgi video = chain schedule, each obstacle signals next.
Rats in lab: forthcoming concurrent-chain experiment on delay discounting (small-soon vs large-late food).
Numerical / Timing References
Yoking study schedules: VR 25, VI 30 s.
DRH example: IRT ≤ 0.4 s.
DRL example: IRT ≥ 15 s.
DRO example: no target response for 30 s.
Tandem example: VI 60 s → DRH 0.3 s.
Key Take-Home Points / Study Checklist
Understand yoking and why VR still > VI in response rate despite equal reinforcement.
Be able to state the response criteria for:
DRL, DRH, DRO.
Distinguish & diagram:
Tandem vs Chain (unsignalled vs signalled succession).
Multiple vs Mixed (signalled vs unsignalled alternation).
Concurrent vs Concurrent-Chain (simultaneous vs choice → outcome).
Know terms: initial link, terminal link, conditioned reinforcer.
Recognise practical applications (behaviour reduction, skill fluency, training sequences).
Good luck with the online lab tasks (due 10 Aug) and the upcoming class test! Direct any questions to the Q-and-A forum or lecturer email.