WEEK 4 LECTURE: OPERANT CONDITIONING & REINFORCEMENT

What Is Operant Conditioning? (Slides 7, 9–11)

  • Learning of a new association between a VOLUNTARY behaviour and its CONSEQUENCES.

  • Behaviour is modified (selected or discarded) according to consequences.

  • Learner actively “operates” on the environment to achieve goals.

  • Consequence types:
    • Reward (pleasant outcome) ⇒ behaviour strengthened.
    • Punishment or unpleasant outcome ⇒ behaviour weakened.

  • OC focuses on goal-directed, voluntary actions rather than reflexes.

Historical Foundations

Edward Thorndike & the Law of Effect (Slides 12–15)

  • Puzzle-box experiments with cats:
    • Cats placed in a box containing levers/strings; food outside.
    • Measured escape time across trials.
    • Observed progressive decrease in escape latency.

  • Results:
    • Ineffective responses (scratching, biting bars) decreased.
    • Effective response (pull rope, press lever) increased.

  • Law of Effect:
    • Behaviours followed by satisfying outcomes become more likely.
    • Behaviours producing no effect or discomfort become less likely.
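
The Law of Effect can be sketched as a simple strength update. This is only an illustration: the multiplicative rule, the `gain` and `decay` values, and the starting strengths are assumptions made for the example, not Thorndike's own formulation or data.

```python
def law_of_effect(weights, effective, trials=20, gain=1.5, decay=0.8):
    """Return the probability of the effective response after each trial.

    weights: dict mapping response name -> starting strength (hypothetical values).
    Each trial, the effective response is strengthened (satisfying outcome)
    and every other response is weakened (no effect).
    """
    weights = dict(weights)  # copy so the caller's dict is left untouched
    history = []
    for _ in range(trials):
        for response in weights:
            if response == effective:
                weights[response] *= gain   # satisfying outcome: more likely
            else:
                weights[response] *= decay  # no effect: less likely
        total = sum(weights.values())
        history.append(weights[effective] / total)
    return history


# Hypothetical puzzle-box repertoire, all responses equally likely at first.
start = {"scratch": 1.0, "bite bars": 1.0, "pull rope": 1.0}
p_effective = law_of_effect(start, effective="pull rope")
print(round(p_effective[0], 2), round(p_effective[-1], 2))
```

As the effective response comes to dominate, the animal wastes less time on ineffective ones, which is one way to read the falling escape latency across trials.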

B.F. Skinner (1904-1990) & the Skinner Box (Slides 16–17)

  • Expanded Thorndike’s ideas; often regarded as the father of modern Behaviourism.

  • Skinner Box apparatus:
    • Lever/peck-key for subject (rat/pigeon).
    • Food-pellet dispenser (positive reinforcer).
    • Lights/speaker as discriminative stimuli.
    • Electric grid for punishers if required.

  • Allowed precise measurement of response rates and programmed contingencies.

The ABC Model – Three-Term Contingency (Slide 18)

  • Antecedent (A) ⇒ Behaviour (B) ⇒ Consequence (C).
    • Example: “TURN” light on ⇒ pigeon turns ⇒ food pellet delivered.
    • Example: Teacher question ⇒ student answers ⇒ praise.

  • Emphasises the full chain: situation cues, response, outcome.

Operant vs Classical Conditioning (Slide 19)

  • OC: learning A–B–C relation; response is voluntary.

  • CC: learning S–S relation (CS–US); response is involuntary/reflexive.

  • CC can become part of OC when a conditioned stimulus serves as an antecedent cue within an operant contingency.

Acquisition, Extinction, Spontaneous Recovery (Slide 20)

  • Acquisition: period during which response strength grows because it is reinforced.

  • Extinction: $\text{Behaviour} \xrightarrow{\text{no reinforcement}} \text{Decrease in response strength}$.

  • Spontaneous Recovery: temporary re-appearance of an extinguished behaviour after a pause.

  • Graph (slide 20) shows classic rise, extinction, pause, spontaneous recovery, further extinction.
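
The shape of that curve can be reproduced with a rough simulation. This is a sketch under stated assumptions: the update rule (strength moves a fixed fraction toward 1 when reinforced and toward 0 when not), the `rate`, and the `recovery_fraction` are illustrative choices, not values from the slides.

```python
def simulate_response_strength(acq_trials=15, ext_trials=15, re_ext_trials=10,
                               rate=0.3, recovery_fraction=0.4):
    """Trace response strength (0..1) through the phases shown on the graph.

    rate and recovery_fraction are illustrative assumptions, not fitted values.
    Returns a list of (phase, strength) pairs, one per trial.
    """
    strength = 0.0
    trace = []

    # Acquisition: each reinforced response moves strength toward 1.
    for _ in range(acq_trials):
        strength += rate * (1.0 - strength)
        trace.append(("acquisition", strength))
    peak = strength

    # Extinction: unreinforced responses let strength decay toward 0.
    for _ in range(ext_trials):
        strength -= rate * strength
        trace.append(("extinction", strength))

    # Pause, then spontaneous recovery: a partial rebound below the peak.
    strength = recovery_fraction * peak
    trace.append(("spontaneous recovery", strength))

    # Further extinction: the recovered response fades again.
    for _ in range(re_ext_trials):
        strength -= rate * strength
        trace.append(("further extinction", strength))
    return trace


for phase, value in simulate_response_strength()[::5]:
    print(f"{phase:22s} {value:.3f}")
```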

Behaviour Shaping (Slides 21–23)

  • Definition: reinforcing successive approximations toward a desired target behaviour.

  • Procedure:
    • Identify baseline behaviour.
    • Reinforce any response vaguely resembling target.
    • Gradually withhold reinforcement until closer approximation emitted, then reinforce.
    • Continue narrowing until only target behaviour earns reward.

  • Applications/Examples:
    • Circus elephant balancing, sea-lion tricks, child cleaning room, lion using toilet (cartoon example).
    • Enables creation of entirely new behaviours not currently in repertoire.
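
The shaping procedure can be sketched on a one-dimensional behaviour scale. The numeric scale, the alternating `variations`, and the rule that a response must beat the last reinforced one are simplifying assumptions for illustration only.

```python
from itertools import cycle


def shape(baseline, target, step=1.0, tolerance=0.5, max_trials=100):
    """Reinforce successive approximations on a 1-D behaviour scale.

    The learner's responses vary slightly around its current habit. A response
    is reinforced only if it is closer to the target than the last reinforced
    one, so the criterion narrows after every reward.
    Returns the list of reinforced responses, in order.
    """
    habit = baseline                      # identify baseline behaviour
    criterion = abs(target - baseline)    # distance a response must beat
    reinforced = []
    variations = cycle([-step, +step])    # deterministic stand-in for natural variability

    for _ in range(max_trials):
        response = habit + next(variations)
        distance = abs(target - response)
        if distance < criterion:          # closer approximation emitted: reinforce
            reinforced.append(response)
            habit = response              # reinforced response becomes the new habit
            criterion = distance          # anything less close now goes unrewarded
        if criterion <= tolerance:        # only near-target behaviour earns reward
            break
    return reinforced


print(shape(baseline=0.0, target=5.0))
```

Each reinforced response lies one step closer to the target than the previous one, which is the "successive approximations" idea in miniature.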

Consequences of Behaviour (Slides 23–26)

  • Two broad functions:
    • Reinforcement: increases probability ($P(B) \uparrow$).
    • Punishment: decreases probability ($P(B) \downarrow$).

Positive vs Negative (Slide 25)

  • Positive ( + ): ADDING a stimulus.

  • Negative ( – ): REMOVING a stimulus.

  • Matrix:
    • Positive Reinforcement – add pleasant stimulus.
    • Negative Reinforcement – remove unpleasant stimulus.
    • Positive Punishment – add unpleasant stimulus (not emphasised in slides but implicit).
    • Negative Punishment – remove pleasant stimulus.

Reinforcement Identification Questions (Slide 26)

  • What behaviour is strengthened?

  • Was a stimulus added or removed?

  • Was the stimulus pleasant or unpleasant?

  • Therefore classify: positive or negative reinforcement.

Worked Examples
  1. Gold star for packing toys (Slide 27)
    • Behaviour: packing toys.
    • Added pleasant star ⇒ Positive Reinforcement.

  2. Headache relieved after painkiller (Slide 28)
    • Behaviour: taking painkiller.
    • Removed unpleasant headache ⇒ Negative Reinforcement.

  3. Martha’s cooperative play (Slide 30)
    • Behaviour: playing with peers.
    • Added praise (pleasant) ⇒ Positive Reinforcement.

  4. Silent class – teacher answers own question (Slide 31)
    • Behaviour: students’ silence.
    • Teacher removes demand for answer (removal of aversive attention) ⇒ Negative Reinforcement for being quiet.

  5. Seat-belt beep (Slide 32)
    • Behaviour: fastening seat-belt.
    • Removes aversive beeping noise ⇒ Negative Reinforcement.

  • Key reminder: the controlling stimulus (gold stars, headache, beeping) may originate externally or internally, but the behaviour is voluntary.
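
The identification questions and the 2×2 matrix can be written as a small lookup. The function name `classify` and the string labels are hypothetical, chosen only for this sketch; the examples list restates the five worked examples above.

```python
def classify(stimulus_change, valence):
    """Classify a consequence from two answers to the identification questions.

    stimulus_change: "added" or "removed"
    valence: "pleasant" or "unpleasant"
    """
    if stimulus_change not in ("added", "removed"):
        raise ValueError("stimulus_change must be 'added' or 'removed'")
    if valence not in ("pleasant", "unpleasant"):
        raise ValueError("valence must be 'pleasant' or 'unpleasant'")

    # Positive = a stimulus is added; Negative = a stimulus is removed.
    sign = "Positive" if stimulus_change == "added" else "Negative"
    # Adding something pleasant or removing something unpleasant strengthens
    # behaviour; the other two combinations weaken it.
    strengthens = (stimulus_change == "added") == (valence == "pleasant")
    kind = "Reinforcement" if strengthens else "Punishment"
    return f"{sign} {kind}"


examples = [
    ("packing toys", "added", "pleasant"),           # gold star
    ("taking painkiller", "removed", "unpleasant"),  # headache
    ("playing with peers", "added", "pleasant"),     # praise
    ("staying silent", "removed", "unpleasant"),     # demand for an answer
    ("fastening seat-belt", "removed", "unpleasant"),  # beeping
]
for behaviour, change, valence in examples:
    print(f"{behaviour}: {classify(change, valence)}")
```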

Primary vs Secondary Reinforcers (Slide 33)

  • Primary Reinforcer: innately satisfying stimulus (food, drink, sex), or removal of an innately aversive one (shock, pain).

  • Secondary (Conditioned) Reinforcer: gains value via association with primary (money, tokens, grades, praise).
    • Typically established through Classical Conditioning where the secondary stimulus predicts a primary reinforcer.

Stimulus Generalisation & Discrimination (Slide 34)

  • Generalisation: after reinforcement, organism attempts similar behaviours or emits behaviour in new contexts.
    • Promotes adaptive exploration.

  • Discriminative Stimulus ($S^D$): signals that a particular consequence is available for a specific behaviour.
    • Traffic-light metaphor: green light cues “drive”, red light cues “stop”.

  • Discrimination Learning: organism learns to respond differently under different antecedent conditions.

Determinants of Conditioning (Slide 35)

  1. Timing (Contiguity)
    • Shorter delay between behaviour and consequence ⇒ stronger learning.

  2. Predictability (Contingency)
    • Consistent pairing of behaviour and outcome strengthens association.
    • Captured quantitatively by conditional probability: $P(C \mid B)$ and $P(C \mid \lnot B)$.

  3. Magnitude
    • Larger rewards/punishers usually exert greater influence, though subject to diminishing returns.
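
The two conditional probabilities under Predictability can be estimated from trial counts. The counts below are made up for illustration, and summarising contingency as the difference between the two probabilities is a common convention assumed here, not something stated on the slide.

```python
def contingency(c_and_b, b_total, c_and_not_b, not_b_total):
    """Return (P(C|B), P(C|not B), difference) from trial counts.

    c_and_b:      trials where the behaviour occurred and the consequence followed
    b_total:      all trials where the behaviour occurred
    c_and_not_b:  trials without the behaviour where the consequence still occurred
    not_b_total:  all trials without the behaviour
    """
    if b_total <= 0 or not_b_total <= 0:
        raise ValueError("need at least one trial with and one without the behaviour")
    p_c_given_b = c_and_b / b_total
    p_c_given_not_b = c_and_not_b / not_b_total
    return p_c_given_b, p_c_given_not_b, p_c_given_b - p_c_given_not_b


# Hypothetical counts: food follows 18 of 20 lever presses,
# but also arrives in 2 of 20 periods with no press.
p_b, p_not_b, difference = contingency(18, 20, 2, 20)
print(f"P(C|B) = {p_b:.2f}, P(C|not B) = {p_not_b:.2f}, difference = {difference:.2f}")
```

A difference near 1 means the consequence depends strongly on the behaviour; a difference near 0 means the consequence arrives just as often without it, so there is little to learn.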

Schedules of Reinforcement (Slides 36–39)

  • Continuous Reinforcement (CRF): every correct response reinforced.
    • Optimal for rapid acquisition.

  • Partial/Intermittent Schedules: reinforcement delivered only some of the time.
    • Ratio vs Interval; Fixed vs Variable (details in reading, pp. 256–258).
    • Produce greater resistance to extinction.

  • Extinction: occurs when reinforcement is withheld long enough.

  • Practical training advice (Slide 36):
    • Start with CRF while teaching (“sit” for dog).
    • Switch to intermittent schedule once behaviour mastered.
    • Eventually fade to minimal reinforcement to maintain behaviour economically.
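
The schedule types can be sketched as two small classes. This is a minimal sketch, not taken from the slides or the reading: the class names, the uniform random draws for the variable schedules, and the discrete time units are all assumptions made for illustration.

```python
import random


class RatioSchedule:
    """Reinforce after a set number of responses (fixed, or varying around a mean)."""

    def __init__(self, ratio, variable=False, rng=None):
        self.ratio = ratio
        self.variable = variable
        self.rng = rng if rng is not None else random.Random(0)
        self.count = 0
        self.required = self._next_requirement()

    def _next_requirement(self):
        if self.variable:
            # Assumed distribution: uniform on 1..(2*ratio - 1), whose mean is ratio.
            return self.rng.randint(1, 2 * self.ratio - 1)
        return self.ratio

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response made after a set time has elapsed."""

    def __init__(self, interval, variable=False, rng=None):
        self.interval = interval
        self.variable = variable
        self.rng = rng if rng is not None else random.Random(0)
        self.available_at = self._next_interval()

    def _next_interval(self):
        if self.variable:
            # Assumed distribution: uniform on 0..2*interval, whose mean is interval.
            return self.rng.uniform(0, 2 * self.interval)
        return self.interval

    def respond(self, now):
        """Register a response at time `now`; return True if it is reinforced."""
        if now >= self.available_at:
            self.available_at = now + self._next_interval()
            return True
        return False


crf = RatioSchedule(1)     # continuous reinforcement is a fixed ratio of 1
fr3 = RatioSchedule(3)     # fixed ratio: every third response
fi5 = IntervalSchedule(5)  # fixed interval: first response after 5 time units
print([crf.respond() for _ in range(6)])
print([fr3.respond() for _ in range(6)])
print([fi5.respond(t) for t in range(12)])
```

Following the training advice above, a trainer would start with `RatioSchedule(1)` and later swap in a schedule with a larger ratio or interval once the behaviour is reliable.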

Ethical & Practical Implications

  • Effective behaviour change (education, parenting, animal training, clinical settings).

  • Ethical considerations:
    • Use of punishment vs reinforcement; potential welfare issues with aversive stimuli (shocks in Skinner box).
    • Informed consent when applying OC principles to humans.

  • Cultural acknowledgement underscores responsible, inclusive practice.

Connections & Real-World Relevance

  • Links to Classical Conditioning (secondary reinforcers form via CC).

  • Thorndike’s and Skinner’s foundational work influences modern Applied Behaviour Analysis (ABA), token economies, and operant-based therapies.

  • Marketing, workplace incentives, game design all utilise reinforcement schedules.

Numerical / Statistical References

  • Thorndike graph: escape time (seconds) plotted against number of trials – shows negatively accelerating curve.

  • Response-probability notation examples:
    • $P(B \mid R) > P(B \mid \text{no R})$, where $R$ = reinforcement.
    • Extinction criterion: $\lim_{n \to \infty} P(B_n) = 0$ under zero reinforcement.