Reinforcement Schedules III – Complex Schedules, Yoking & Differential Reinforcement
Lab Participation & Online Assessment
Worth 5% total: split into two parts of 2.5% each (this week & next).
Located in the Assessment section (under the lab-signup link).
1 Short-answer scenario (unlimited attempts).
1 Quiz (MCQ + ordering items; unlimited attempts).
Both remain open until Sunday 10 August → everyone has ≥ 2 weeks regardless of lab day.
Recap – Response Strength & Simple Schedules
Response strength = theoretical construct to quantify behaviour.
Historically indexed by response rate (slope in cumulative record).
Four simple schedules reviewed last lecture:
FI, FR, VI, VR – each shows a prototypical response pattern & reinforcement rate.
Empirical issue: need a fair way to compare ratio vs interval schedules while holding reinforcement rate constant.
Yoking: Equalising Reinforcement Rates
Yoking design: pair two subjects (or conditions) → “leader” & “follower”.
Leader’s earned reinforcer creates (yokes) the criterion for the follower.
Ensures identical reinforcement rate across schedules.
Classic Example – Catania, Matthews & Yohalem (1977)
Group 1
Leader: VR-25 (variable ratio, avg 25 responses/reinforcer).
Follower: yoked VI (each interval = the time the leader took to earn that reinforcer).
Result: cumulative record shows steeper slope (higher response rate) for VR leader; notches align vertically (simultaneous food delivery).
Group 2 (roles reversed)
Leader: VI-30 s.
Follower: created VR (ratio length = # responses leader emitted in each interval).
Result: VR follower still shows higher response rate; notches align horizontally (reinforcer after same # responses).
Conclusion: Even with identical reinforcement rates, VR > VI in response rate. ⇒ Response rate is schedule-specific, undermining it as a pure measure of “strength”.
Later replications with humans (Matthews et al., 1977) show same pattern.
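The yoking logic can be sketched in a few lines of Python. This is an illustrative simulation, not the lecture's procedure, and all names/parameters are hypothetical: the leader's inter-reinforcer times are simply copied over as the follower's VI intervals, so equal reinforcement rates are guaranteed by construction.

```python
import random

def leader_vr_intervals(n=50, vr_mean=25, resp_rate=2.0, seed=1):
    """Simulate a VR leader (hypothetical parameters): each reinforcer
    requires a random number of responses (mean ~ vr_mean); at
    resp_rate responses/s, that fixes the time between reinforcers."""
    rng = random.Random(seed)
    return [rng.randint(1, 2 * vr_mean - 1) / resp_rate
            for _ in range(n)]

def yoke_to_vi(leader_intervals):
    """Yoking: the follower's VI intervals are exactly the leader's
    inter-reinforcer times, so reinforcement rate is matched even
    though the follower responds at its own (typically lower) rate."""
    return list(leader_intervals)

ivs = leader_vr_intervals()
vi_intervals = yoke_to_vi(ivs)
```

Catania et al.'s point is that even with the interval sequence matched this way, the VR subject still responds faster.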
Theoretical Fallout
Researchers moved away from “response strength” as a single metric.
Alternative approaches:
Behavioural momentum theory (Nevin & Grace 2000).
Modelling response distributions (molecular vs molar analyses).
Practical takeaway: response rate itself remains valuable; schedules shape it systematically.
Differential Reinforcement Schedules (“Pacing” Schedules)
Common Definition
Differential reinforcement = reinforce some topographies/rates, withhold for others.
1 DRL – Differential Reinforcement of Low Rate
Criterion: a response only reinforced if inter-response time (IRT) > fixed value.
Example: DRL-15 s
Respond → start timer.
Next response after ≥ 15 s ⇒ food.
Early response ⇒ timer resets (no food).
Uses: reduce but not eliminate behaviour (e.g., slow down speaking rate).
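A minimal sketch of the DRL-15 s contingency (the function name, and the assumption that the first response merely starts the timer, are mine):

```python
def drl_reinforced(response_times, irt_min=15.0):
    """Return, per response, whether it meets the DRL criterion:
    reinforced only if >= irt_min seconds have passed since the
    previous response. Every response restarts the timer either way."""
    outcomes, last = [], None
    for t in response_times:
        outcomes.append(last is not None and (t - last) >= irt_min)
        last = t  # early responses reset the clock and earn nothing
    return outcomes

# 10 s after the first response is too soon; 16 s after the reset is not
print(drl_reinforced([0.0, 10.0, 26.0]))  # → [False, False, True]
```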
2 DRH – Differential Reinforcement of High Rate
Reinforce short IRTs, or a minimum # of responses within time t.
Example criteria:
DRH 0.4 s ⇒ food if second response within 0.4 s of previous.
“Emit ≥ 5 responses in 2 s”.
Uses: build rapid or fluent responding (e.g., typing speed).
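The “≥ 5 responses in 2 s” variant can be sketched as a sliding-window check (illustrative names and parameters):

```python
def drh_met(response_times, n_required=5, window=2.0):
    """True if any n_required consecutive responses fall within
    `window` seconds of each other (a DRH burst criterion)."""
    ts = sorted(response_times)
    return any(ts[i + n_required - 1] - ts[i] <= window
               for i in range(len(ts) - n_required + 1))

print(drh_met([0.0, 0.3, 0.7, 1.1, 1.6]))  # → True  (5 responses in 1.6 s)
print(drh_met([0.0, 1.0, 2.0, 3.0, 4.0]))  # → False (too spread out)
```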
3 DRO – Differential Reinforcement of Other Behaviour (Omissions)
Reinforcer delivered only if target response has not occurred for set time.
Example: DRO-30 s
If no screaming for 30 s in supermarket ⇒ promise ice-cream.
Target response resets timer.
Think of it as extinction + scheduled rewards for not doing behaviour.
Often implemented intermittently (not every interval, not every shopping trip).
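A sketch of the DRO-30 s timer (hypothetical session length; assumes the timer also restarts after each reinforcer delivery):

```python
def dro_deliveries(target_times, session_len=120.0, dro=30.0):
    """Return delivery times of reinforcers under DRO: one delivery
    each time `dro` seconds elapse with no target response; any
    target response resets the timer."""
    deliveries, clock = [], 0.0
    for t in sorted(target_times) + [session_len]:
        while clock + dro <= t:   # full DRO intervals elapsed before t
            clock += dro
            deliveries.append(clock)
        clock = t                 # target response resets the timer
    return deliveries

# Screams at 10 s and 50 s: ice-cream at 40 s, then 80 s and 110 s
print(dro_deliveries([10.0, 50.0]))  # → [40.0, 80.0, 110.0]
```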
Quick Comparison
DRL & DRH: target response must still occur (slow vs fast).
DRO: target response must be completely absent (omission training).
Collectively called pacing schedules: shape speed/spacing of behaviour.
Combining Schedules
1 Tandem Schedules
Two+ schedules in succession without external signals.
E.g. Tandem VI-60 → DRH 0.3 s.
After an average of 60 s elapses, reinforcement is delivered only for a burst response (IRT ≤ 0.3 s).
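Sketch of the tandem contingency (for simplicity the VI component is treated as a fixed 60-s wait; names are illustrative):

```python
def tandem_vi_drh(response_times, interval=60.0, irt_max=0.3):
    """First component: the (here fixed) 60-s interval must elapse.
    Second component: the next response is reinforced only if its
    IRT <= irt_max. No stimulus signals the component change."""
    prev = None
    for t in response_times:
        if t >= interval and prev is not None and (t - prev) <= irt_max:
            return t          # time of the reinforced response
        prev = t
    return None               # criterion never met

# Isolated responses after 60 s earn nothing; a quick pair does.
print(tandem_vi_drh([10.0, 59.0, 61.0, 61.2]))  # → 61.2 (IRT = 0.2 s)
```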
2 Chain Schedules (Signalled)
Successive schedules with unique discriminative stimuli (SDs) per link.
Completion of Link 1 (initial link) produces SD for Link 2, etc.
Final link (terminal link) leads to primary reinforcer.
Entry into next link functions as conditioned (secondary) reinforcer.
Practical value: teaching behavioural sequences (e.g., shoelace tying).
Example: dog-agility course, each obstacle cues the next.
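The chain logic, sketched with made-up SDs and FR links: completing a link's ratio produces the next link's SD, and completing the terminal link delivers the primary reinforcer.

```python
def run_chain(links, n_responses):
    """links: list of (sd, fr_requirement) pairs. Returns the stimulus
    events produced by n_responses responses on the chain."""
    events, i, count = [], 0, 0
    for _ in range(n_responses):
        count += 1
        if count >= links[i][1]:          # current link's ratio completed
            count, i = 0, i + 1
            if i == len(links):           # terminal link finished
                events.append("primary reinforcer")
                i = 0                     # chain starts over
            else:                         # SD doubles as conditioned reinforcer
                events.append("SD: " + links[i][0])
    return events

print(run_chain([("red key", 2), ("green key", 3)], 5))
# → ['SD: green key', 'primary reinforcer']
```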
3 Multiple (Mult) Schedules
Two+ signalled schedules alternate in a session; transition controlled by experimenter (not subject).
Each component ends after time or # reinforcers; separated by inter-component interval (ICI).
Mixed schedule = same as multiple but no SDs (unsignalled).
Concurrent Schedules (Choice)
Two+ independent schedules available simultaneously on separate manipulanda.
Subject freely allocates behaviour.
Example: Conc VI-10 s (left lever) vs VI-20 s (right).
Optimal strategy: concentrate more on richer VI-10 but periodically sample VI-20.
Real-life analogue: “work on assignment” vs “doom-scroll phone”.
Foundation for research on matching law, choice, impulsivity (covered next week).
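A rough simulation of the Conc VI-10 s / VI-20 s example. All parameters are hypothetical: the subject responds twice per second and allocates about two-thirds of responses to the richer left key.

```python
import random

def concurrent_vi(session=600.0, vi_left=10.0, vi_right=20.0,
                  p_left=0.67, resp_rate=2.0, seed=2):
    """Two independent VI timers run simultaneously; each response goes
    to the left key with probability p_left. A reinforcer, once armed,
    waits until the subject next responds on that side."""
    rng = random.Random(seed)
    mean = {"L": vi_left, "R": vi_right}
    armed = {"L": False, "R": False}
    next_arm = {k: rng.expovariate(1 / m) for k, m in mean.items()}
    earned = {"L": 0, "R": 0}
    t = 0.0
    while t < session:
        t += 1 / resp_rate                       # next response
        for k in ("L", "R"):
            if not armed[k] and t >= next_arm[k]:
                armed[k] = True                  # reinforcer set up
        side = "L" if rng.random() < p_left else "R"
        if armed[side]:                          # collect if available
            earned[side] += 1
            armed[side] = False
            next_arm[side] = t + rng.expovariate(1 / mean[side])
    return earned
```

Because the two timers run independently, occasionally sampling the leaner right key still pays, which is the point made above.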
Concurrent-Chain Schedules
Adds a choice phase (initial link) followed by exclusive outcome phase (terminal link).
Choose Left or Right key during initial link → locked in.
Terminal links may differ in schedule or reinforcer magnitude.
Lets researchers test pre-commitment, delay discounting, sub-optimal choice, etc.
E.g. choose between: smaller-sooner (FR 5; 1 pellet) vs larger-later (FR 100; 4 pellets).
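Ignoring delay for the moment, the two terminal links can be compared on pellets per response (a toy calculation; delay discounting, covered next week, changes this picture):

```python
# (FR requirement, pellets) for each hypothetical terminal link
terminal_links = {"smaller-sooner": (5, 1), "larger-later": (100, 4)}
rates = {name: pellets / fr
         for name, (fr, pellets) in terminal_links.items()}
print(rates)  # smaller-sooner yields 0.2 pellets/response vs 0.04
```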
Secondary (Conditioned) Reinforcement in Chains
Primary reinforcers: phylogenetically important (food, water, sex, warmth).
Secondary (conditioned) reinforcers: acquire value via pairing with primary (money, points, entry to next link).
In chains, the SD for Link N+1 can double as a conditioned reinforcer for completing Link N.
Examples & Metaphors Used in Lecture
Casino/Pokies VR example:
You play the slots (VR schedule) and text your friend after each win.
The friend merely checks their phone occasionally (VI schedule).
Demonstrates VR (player) vs VI (observer) under yoking logic.
Supermarket shouting → DRO contingency: “No shouting for 30 s = ice-cream”.
Collect supermarket stickers = DRH (collect X stickers before promo ends).
Dog-agility course & corgi video = chain schedule, each obstacle signals next.
Rats in lab: forthcoming concurrent-chain experiment on delay discounting (small-soon vs large-late food).
Numerical / Timing References
Yoking study schedules: VR 25, VI 30 s.
DRH example: IRT ≤ 0.4 s.
DRL example: IRT ≥ 15 s.
DRO example: no target response for 30 s.
Tandem example: VI 60 s → DRH 0.3 s.
Key Take-Home Points / Study Checklist
Understand yoking and why VR still > VI in response rate despite equal reinforcement.
Be able to state the response criteria for:
DRL, DRH, DRO.
Distinguish & diagram:
Tandem vs Chain (unsignalled vs signalled succession).
Multiple vs Mixed (signalled vs unsignalled alternation).
Concurrent vs Concurrent-Chain (simultaneous vs choice → outcome).
Know terms: initial link, terminal link, conditioned reinforcer.
Recognise practical applications (behaviour reduction, skill fluency, training sequences).
Good luck with the online lab tasks (due 10 Aug) and the upcoming class test! Direct any questions to the Q-and-A forum or lecturer email.