Reinforcement Schedules III – Complex Schedules, Yoking & Differential Reinforcement

Lab Participation & Online Assessment

  • Worth 5% total: split into two parts of 2.5% each (this week & next).

  • Located in the Assessment section (under the lab-signup link).

    • 1 Short-answer scenario (unlimited attempts).

    • 1 Quiz (MCQ + ordering items; unlimited attempts).

  • Both remain open until Sunday 10 August → everyone has ≥ 2 weeks regardless of lab day.


Recap – Response Strength & Simple Schedules

  • Response strength = theoretical construct to quantify behaviour.

    • Historically indexed by response rate (slope in cumulative record).

  • Four simple schedules reviewed last lecture:

    • FI, FR, VI, VR – each shows a prototypical response pattern & reinforcement rate.

  • Empirical issue: need a fair way to compare ratio vs interval schedules while holding reinforcement rate constant.


Yoking: Equalising Reinforcement Rates

  • Yoking design: pair two subjects (or conditions) → “leader” & “follower”.

    • Leader’s earned reinforcer creates (yokes) the criterion for the follower.

    • Ensures identical reinforcement rate across schedules.
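The yoking rule can be sketched in a few lines of Python — a minimal, hypothetical illustration (`yoked_intervals` is my own name, not part of any apparatus): each gap between the leader's successive reinforcers becomes one interval of the follower's yoked VI schedule, so the two subjects' reinforcement rates match by construction.

```python
# Hypothetical sketch of yoking: convert the leader's reinforcer times
# (seconds from session start) into the follower's VI interval lengths.

def yoked_intervals(leader_reinforcer_times):
    """Turn the leader's reinforcer times into VI intervals for the follower."""
    intervals = []
    prev = 0.0
    for t in leader_reinforcer_times:
        intervals.append(t - prev)  # follower waits this long before its
        prev = t                    # next response can be reinforced
    return intervals
```

For example, if the leader earned food at 12 s, 30 s and 33 s, the follower's programmed intervals would be 12 s, 18 s and 3 s.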

Classic Example – Catania, Matthews & Yohalem (1977)
  1. Group 1

    • Leader: VR-25 (variable ratio, avg 25 responses/reinforcer).

    • Follower: yoked VI (each interval length = the time the leader took to earn that reinforcer).

    • Result: cumulative record shows steeper slope (higher response rate) for VR leader; notches align vertically (simultaneous food delivery).

  2. Group 2 (roles reversed)

    • Leader: VI-30 s.

    • Follower: yoked VR (ratio length = # responses the leader emitted in each interval).

    • Result: VR follower still shows higher response rate; notches align horizontally (reinforcer after same # responses).

  • Conclusion: Even with identical reinforcement rates, VR > VI in response rate. ⇒ Response rate is schedule-specific, undermining it as a pure measure of “strength”.

  • Later replications with humans (Matthews et al., 1977) show the same pattern.

Theoretical Fallout
  • Researchers moved away from “response strength” as a single metric.

  • Alternative approaches:

    • Behavioural momentum theory (Nevin & Grace 2000).

    • Modelling response distributions (molecular vs molar analyses).

  • Practical takeaway: response rate itself remains valuable; schedules shape it systematically.


Differential Reinforcement Schedules (“Pacing” Schedules)

Common Definition
  • Differential reinforcement = reinforce some topographies/rates, withhold for others.

1 DRL – Differential Reinforcement of Low Rate
  • Criterion: a response is reinforced only if the inter-response time (IRT) meets a fixed minimum value.

  • Example: DRL-15 s

    • Respond → start timer.

    • Next response after ≥ 15 s ⇒ food.

    • Early response ⇒ timer resets (no food).

  • Uses: reduce but not eliminate behaviour (e.g., slow down speaking rate).
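The DRL-15 s rule above can be sketched as a small Python function (illustrative only; `run_drl` and its inputs are hypothetical, not lecture code):

```python
# Hypothetical DRL sketch: a response earns food only if at least
# `criterion` seconds have passed since the previous response; an early
# response earns nothing and silently restarts the timer.

def run_drl(response_times, criterion=15.0):
    """Return the times of reinforced responses."""
    reinforced = []
    last_response = None
    for t in response_times:
        if last_response is not None and t - last_response >= criterion:
            reinforced.append(t)
        last_response = t  # every response, early or not, resets the timer
    return reinforced
```

With responses at 0, 10, 26 and 40 s, only the response at 26 s is reinforced: the response at 10 s reset the timer, and 40 s comes only 14 s after 26 s.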

2 DRH – Differential Reinforcement of High Rate
  • Reinforce short IRTs, or a minimum number of responses within a time window t.

  • Example criteria:

    • DRH 0.4 s ⇒ food if the second response occurs within 0.4 s of the previous.

    • “Emit ≥ 5 responses in 2 s”.

  • Uses: build rapid or fluent responding (e.g., typing speed).
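Both DRH criteria above can be expressed as simple predicates — a hedged sketch with hypothetical helper names (`drh_irt`, `drh_burst`):

```python
# Hypothetical sketches of the two DRH criteria from the notes.

def drh_irt(prev_time, curr_time, max_irt=0.4):
    """Criterion 1: reinforce a response whose IRT is no longer than max_irt."""
    return curr_time - prev_time <= max_irt

def drh_burst(response_times, n=5, window=2.0):
    """Criterion 2: True if any n consecutive responses fall within `window` s."""
    times = sorted(response_times)
    return any(times[i + n - 1] - times[i] <= window
               for i in range(len(times) - n + 1))
```

So a response 0.3 s after the previous one meets the first criterion, and five responses spaced 0.5 s apart meet the second, while one response per second does not.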

3 DRO – Differential Reinforcement of Other Behaviour (Omissions)
  • Reinforcer delivered only if target response has not occurred for set time.

  • Example: DRO-30 s

    • If no screaming for 30 s in supermarket ⇒ promise ice-cream.

    • Target response resets timer.

  • Think of it as extinction + scheduled rewards for not doing behaviour.

  • Often implemented intermittently (not every interval, not every shopping trip).
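The DRO-30 s contingency can be sketched as a session loop (a hypothetical helper, `dro_session`, assuming the clock restarts after each target response and after each delivery):

```python
# Hypothetical DRO sketch: a reinforcer is delivered whenever `interval`
# seconds elapse with no target response; each target response restarts
# the clock, as does each delivery.

def dro_session(target_times, session_end, interval=30.0):
    """Return the times at which reinforcers are delivered."""
    deliveries = []
    clock_start = 0.0
    events = sorted(target_times)
    i = 0
    while clock_start + interval <= session_end:
        due = clock_start + interval
        if i < len(events) and events[i] < due:
            clock_start = events[i]   # target response resets the timer
            i += 1
        else:
            deliveries.append(due)    # interval completed response-free
            clock_start = due
    return deliveries
```

For example, with target responses (screams) at 10 s and 75 s in a 100 s session, reinforcers are delivered at 40 s and 70 s: the response at 10 s delays the first delivery, and the response at 75 s cancels what would have been a third.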

Quick Comparison
  • DRL & DRH: target response must still occur (slow vs fast).

  • DRO: target response must be completely absent (omission training).

  • Collectively called pacing schedules: shape speed/spacing of behaviour.


Combining Schedules

1 Tandem Schedules
  • Two+ schedules in succession without external signals.

    • E.g. Tandem VI-60 → DRH 0.3 s.

    • After an average of 60 s elapses, reinforcement is delivered only for a burst response (IRT ≤ 0.3 s).
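The tandem example can be sketched as a hypothetical Python function (`tandem_vi_drh`); for simplicity the variable interval is collapsed into a single `vi_elapsed_at` time:

```python
# Hypothetical tandem VI → DRH sketch: find the first response that both
# (a) occurs after the interval has timed out and (b) follows the previous
# response by no more than `max_irt` seconds. No stimulus signals the
# switch between the two components.

def tandem_vi_drh(response_times, vi_elapsed_at, max_irt=0.3):
    """Return the time of the first reinforced response, or None."""
    prev = None
    for t in response_times:
        if prev is not None and t >= vi_elapsed_at and t - prev <= max_irt:
            return t  # burst response after the interval elapsed
        prev = t
    return None
```

With responses at 10, 55, 61, 61.2 and 70 s and the interval timing out at 58 s, the response at 61.2 s is reinforced: it is the first post-interval response emitted within 0.3 s of its predecessor.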

2 Chain Schedules (Signalled)
  • Successive schedules with unique discriminative stimuli (SDs) per link.

    • Completion of Link 1 (initial link) produces SD for Link 2, etc.

    • Final link (terminal link) leads to primary reinforcer.

  • Entry into next link functions as conditioned (secondary) reinforcer.

  • Practical value: teaching behavioural sequences (e.g., shoelace tying).

    • Example: dog-agility course, each obstacle cues the next.

3 Multiple (Mult) Schedules
  • Two+ signalled schedules alternate in a session; transition controlled by experimenter (not subject).

    • Each component ends after time or # reinforcers; separated by inter-component interval (ICI).

  • Mixed schedule = same as multiple but no SDs (unsignalled).


Concurrent Schedules (Choice)

  • Two+ independent schedules available simultaneously on separate manipulanda.

    • Subject freely allocates behaviour.

  • Example: Conc VI-10 s (left lever) vs VI-20 s (right).

    • Optimal strategy: concentrate more on richer VI-10 but periodically sample VI-20.

  • Real-life analogue: “work on assignment” vs “doom-scroll phone”.

  • Foundation for research on matching law, choice, impulsivity (covered next week).
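Why sampling the leaner lever pays can be shown with a toy, deterministic stand-in for one VI lever (a hypothetical helper, `reinforcers_collected`; real VIs vary around their mean, here the set-up times are fixed at the 10 s / 20 s means): once a reinforcer has set up, it simply waits until the next response collects it.

```python
# Toy model of one VI lever with a fixed set-up time: a reinforcer becomes
# available `setup` s after the last collection and is collected by the
# next response ("visit") to that lever.

def reinforcers_collected(setup, visit_times):
    """Count reinforcers collected on one lever."""
    collected = 0
    ready_at = setup                 # first reinforcer sets up after `setup` s
    for t in sorted(visit_times):
        if t >= ready_at:
            collected += 1
            ready_at = t + setup     # next one starts setting up now
    return collected
```

Over a 100 s session, responding every 2 s on the 10 s lever collects 9 reinforcers, while just four visits to the 20 s lever (at 0, 25, 50 and 75 s) add 3 more — hence the optimal strategy of mostly working the richer schedule but periodically sampling the leaner one.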

Concurrent-Chain Schedules
  • Adds a choice phase (initial link) followed by an exclusive outcome phase (terminal link).

    • Choose Left or Right key during initial link → locked in.

    • Terminal links may differ in schedule or reinforcer magnitude.

  • Lets researchers test pre-commitment, delay discounting, sub-optimal choice, etc.

    • E.g. choose between: smaller-sooner (FR 5; 1 pellet) vs larger-later (FR 100; 4 pellets).


Secondary (Conditioned) Reinforcement in Chains

  • Primary reinforcers: phylogenetically important (food, water, sex, warmth).

  • Secondary (conditioned) reinforcers: acquire value via pairing with primary (money, points, entry to next link).

  • In chains, the SD for Link N+1 can double as a conditioned reinforcer for completing Link N.


Examples & Metaphors Used in Lecture

  • Casino/Pokies VR example:

    • You play slots (VR schedule), text friend after each win.

    • Friend merely checks phone occasionally (VI schedule).

    • Demonstrates VR (player) vs VI (observer) under yoking logic.

  • Supermarket shouting → DRO contingency: “No shouting for 30 s = ice-cream”.

  • Collect supermarket stickers = DRH (collect X stickers before promo ends).

  • Dog-agility course & corgi video = chain schedule, each obstacle signals next.

  • Rats in lab: forthcoming concurrent-chain experiment on delay discounting (small-soon vs large-late food).


Numerical / Timing References

  • Yoking study schedules: VR-25, VI-30 s.

  • DRH example: IRT ≤ 0.4 s.

  • DRL example: IRT ≥ 15 s.

  • DRO example: no target response for 30 s.

  • Tandem example: VI-60 s → DRH 0.3 s.


Key Take-Home Points / Study Checklist

  • Understand yoking and why VR still > VI in response rate despite equal reinforcement.

  • Be able to state the response criteria for:

    • DRL, DRH, DRO.

  • Distinguish & diagram:

    • Tandem vs Chain (unsignalled vs signalled succession).

    • Multiple vs Mixed (signalled vs unsignalled alternation).

    • Concurrent vs Concurrent-Chain (simultaneous vs choice → outcome).

  • Know terms: initial link, terminal link, conditioned reinforcer.

  • Recognise practical applications (behaviour reduction, skill fluency, training sequences).

Good luck with the online lab tasks (due 10 Aug) and the upcoming class test! Direct any questions to the Q-and-A forum or lecturer email.