WEEK 4 LECTURE: OPERANT CONDITIONING & REINFORCEMENT
What Is Operant Conditioning? (Slides 7, 9–11)
Learning of a new association between a VOLUNTARY behaviour and its CONSEQUENCES.
Behaviour is modified (selected or discarded) according to consequences.
Learner actively “operates” on the environment to achieve goals.
Consequence types:
• Reward (pleasant outcome) ⇒ behaviour strengthened.
• Punishment or unpleasant outcome ⇒ behaviour weakened.
OC focuses on goal-directed, voluntary actions rather than reflexes.
Historical Foundations
Edward Thorndike & the Law of Effect (Slides 12–15)
Puzzle-box experiments with cats:
• Cats placed in a box containing levers/strings; food outside.
• Measured escape time across trials.
• Observed progressive decrease in escape latency.
Results:
• Ineffective responses (scratching, biting bars) decreased.
• Effective response (pull rope, press lever) increased.
Law of Effect:
• Behaviours followed by satisfying outcomes become more likely.
• Behaviours producing no effect or discomfort become less likely.
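The Law of Effect can be sketched as a learner that picks among puzzle-box responses in proportion to their current strengths. The response that opens the box is strengthened, and responses with no effect are weakened. This is a minimal sketch that assumes a multiplicative update rule. The response names, the `strengthen`/`weaken` factors and the starting strengths are hypothetical, not Thorndike's data.

```python
import random

# Hypothetical response repertoire for a cat in the puzzle box.
RESPONSES = ["scratch", "bite_bars", "meow", "pull_rope"]
EFFECTIVE = "pull_rope"  # assumption: the only response that opens the door


def run_trial(strengths, rng, strengthen=1.5, weaken=0.8):
    """Run one puzzle-box trial; return the number of responses before escape.

    Responses are sampled in proportion to their current strength.
    A response that produces no effect is weakened each time it occurs;
    the effective response (escape + food) is strengthened.
    """
    attempts = 0
    while True:
        attempts += 1
        weights = [strengths[r] for r in RESPONSES]
        response = rng.choices(RESPONSES, weights=weights)[0]
        if response == EFFECTIVE:
            strengths[response] *= strengthen  # satisfying outcome -> more likely
            return attempts
        strengths[response] *= weaken  # no effect -> less likely


def simulate(n_trials=30, seed=0):
    """Return (responses needed on each trial, final response strengths)."""
    rng = random.Random(seed)
    strengths = {r: 1.0 for r in RESPONSES}
    attempts_per_trial = [run_trial(strengths, rng) for _ in range(n_trials)]
    return attempts_per_trial, strengths


attempts, strengths = simulate()
print(attempts[:5], attempts[-5:])  # early trials usually need more responses
```

In this sketch, the falling number of responses per trial stands in for the falling escape time on the Thorndike graph.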
B.F. Skinner (1904–1990) & the Skinner Box (Slides 16–17)
Expanded Thorndike’s ideas; father of modern Behaviourism.
Skinner Box apparatus:
• Lever/peck-key for subject (rat/pigeon).
• Food-pellet dispenser (positive reinforcer).
• Lights/speaker as discriminative stimuli.
• Electric grid for punishers if required.
The box allowed precise measurement of response rates and of programmed contingencies.
The ABC Model – Three-Term Contingency (Slide 18)
Antecedent (A) ⇒ Behaviour (B) ⇒ Consequence (C).
• Example: “TURN” light on ⇒ pigeon turns ⇒ food pellet delivered.
• Example: Teacher question ⇒ student answers ⇒ praise.
Emphasises the full chain: situation cues, response, outcome.
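The three-term contingency can be modelled as a small record with one field per term. This is a sketch only. The `Contingency` class and its `describe` method are hypothetical names, and the two instances encode the slide examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """One A-B-C (three-term) contingency."""
    antecedent: str   # situational cue present before the response
    behaviour: str    # the voluntary (operant) response
    consequence: str  # what follows the response

    def describe(self) -> str:
        return f"{self.antecedent} => {self.behaviour} => {self.consequence}"


# The two slide examples encoded as contingencies.
pigeon = Contingency('"TURN" light on', "pigeon turns", "food pellet delivered")
classroom = Contingency("teacher question", "student answers", "praise")

print(pigeon.describe())
```

Keeping the antecedent as an explicit field makes clear that the same behaviour can belong to different contingencies under different cues.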
Operant vs Classical Conditioning (Slide 19)
OC: learning A–B–C relation; response is voluntary.
CC: learning S–S relation (CS–US); response is involuntary/reflexive.
CC can become part of OC when a conditioned stimulus serves as an antecedent cue within an operant contingency.
Acquisition, Extinction, Spontaneous Recovery (Slide 20)
Acquisition: period during which response strength grows because it is reinforced.
Extinction: gradual weakening and eventual disappearance of the response once it is no longer reinforced.
Spontaneous Recovery: temporary re-appearance of an extinguished behaviour after a pause.
Graph (Slide 20) shows the classic pattern: rise during acquisition, decline during extinction, a pause, spontaneous recovery, then further extinction.
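The shape of the Slide 20 graph can be reproduced with a simple linear-operator rule. On each trial, response strength moves a fixed fraction toward 1 if the response is reinforced and toward 0 if it is not (the same form as the Rescorla–Wagner update). This is a minimal sketch under two assumptions. The learning `rate` is a hypothetical value. Spontaneous recovery is modelled as a hypothetical rebound that restores a fixed `fraction` of the strength lost during extinction; it is not a mechanism from the slides.

```python
def update(strength, reinforced, rate=0.3):
    """Move response strength a fraction `rate` toward 1 (reinforced) or 0 (not)."""
    target = 1.0 if reinforced else 0.0
    return strength + rate * (target - strength)


def session(strength, n_trials, reinforced, rate=0.3):
    """Run n_trials with or without reinforcement; return strength after each trial."""
    curve = []
    for _ in range(n_trials):
        strength = update(strength, reinforced, rate)
        curve.append(strength)
    return curve


def spontaneous_recovery(extinguished, pre_extinction, fraction=0.5):
    """Hypothetical rebound after a rest pause: regain part of the lost strength."""
    return extinguished + fraction * (pre_extinction - extinguished)


acquisition = session(0.0, 15, reinforced=True)                   # rise
extinction = session(acquisition[-1], 10, reinforced=False)       # decline
recovered = spontaneous_recovery(extinction[-1], acquisition[-1])  # after pause
re_extinction = session(recovered, 10, reinforced=False)          # further extinction
```

Because each update covers a fixed fraction of the remaining distance, both the acquisition rise and the extinction decline flatten out over trials.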
Behaviour Shaping (Slides 21–23)
Definition: reinforcing successive approximations toward a desired target behaviour.
Procedure:
• Identify baseline behaviour.
• Reinforce any response vaguely resembling target.
• Withhold reinforcement for earlier approximations until a closer one is emitted, then reinforce that closer approximation.
• Continue narrowing until only the target behaviour earns reward.
Applications/Examples:
• Circus elephant balancing, sea-lion tricks, child cleaning room, lion using toilet (cartoon example).
• Enables creation of entirely new behaviours not currently in repertoire.
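The shaping procedure can be sketched on a hypothetical numeric scale of how closely a response resembles the target, for example how far a sea lion lifts a ball. Each round, the learner varies slightly around its current behaviour. Only a variant strictly closer to the target than the last reinforced approximation is reinforced, so the criterion narrows step by step. The `shape` function, the integer scale and the fixed `step` of variability are all assumptions made for illustration.

```python
def shape(baseline, target, step=1, max_rounds=100):
    """Return the sequence of reinforced approximations, starting at baseline.

    Each round the learner emits two variants (current - step, current + step).
    A variant is reinforced only if it is strictly closer to the target than
    the current behaviour; the reinforced variant becomes the new current
    behaviour, so the criterion tightens until the target itself is reached.
    """
    current = baseline
    reinforced = [current]
    for _ in range(max_rounds):
        if current == target:
            break  # target behaviour reached
        criterion = abs(target - current)  # must beat the last approximation
        closer = [v for v in (current - step, current + step)
                  if abs(target - v) < criterion]
        if not closer:
            break  # no variant qualifies: variability too coarse for this target
        current = closer[0]
        reinforced.append(current)
    return reinforced


print(shape(0, 10, step=2))
```

The early stop when no variant qualifies mirrors a practical point: if the learner's variability is too coarse, no closer approximation is ever emitted, and the trainer has nothing new to reinforce.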
Consequences of Behaviour (Slides 23–26)
Two broad functions:
• Reinforcement: increases the probability of the behaviour (↑).
• Punishment: decreases the probability of the behaviour (↓).
Positive vs Negative (Slide 25)
Positive ( + ): ADDING a stimulus.
Negative ( – ): REMOVING a stimulus.
Matrix:
• Positive Reinforcement – add pleasant stimulus.
• Negative Reinforcement – remove unpleasant stimulus.
• Positive Punishment – add unpleasant stimulus (not emphasised in slides but implicit).
• Negative Punishment – remove pleasant stimulus.
Reinforcement Identification Questions (Slide 26)
What behaviour is strengthened?
Was a stimulus added or removed?
Was the stimulus pleasant or unpleasant?
Therefore classify: positive or negative reinforcement.
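The identification questions amount to a lookup in the 2x2 matrix of Slide 25. Adding something pleasant or removing something unpleasant is reinforcement; the other two cells are punishment. This is a minimal sketch, and the function and parameter names are hypothetical.

```python
def classify_consequence(added: bool, pleasant: bool) -> str:
    """Name the consequence from the 2x2 matrix.

    added:    True if a stimulus was added (+), False if it was removed (-)
    pleasant: True if that stimulus is pleasant, False if unpleasant
    """
    sign = "Positive" if added else "Negative"
    # Add pleasant / remove unpleasant -> behaviour strengthened (reinforcement);
    # add unpleasant / remove pleasant -> behaviour weakened (punishment).
    kind = "Reinforcement" if added == pleasant else "Punishment"
    return f"{sign} {kind}"


# Gold star for packing toys: pleasant stimulus added.
print(classify_consequence(added=True, pleasant=True))
# Seat-belt beep stops: unpleasant stimulus removed.
print(classify_consequence(added=False, pleasant=False))
```

The sign depends only on adding versus removing. Whether the outcome counts as reinforcement depends on whether those two answers "match", which is why negative reinforcement still strengthens behaviour.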
Worked Examples
Gold star for packing toys (Slide 27)
• Behaviour: packing toys.
• Added pleasant star ⇒ Positive Reinforcement.
Headache relieved after painkiller (Slide 28)
• Behaviour: taking painkiller.
• Removed unpleasant headache ⇒ Negative Reinforcement.
Martha’s cooperative play (Slide 30)
• Behaviour: playing with peers.
• Added praise (pleasant) ⇒ Positive Reinforcement.
Silent class – teacher answers own question (Slide 31)
• Behaviour: students’ silence.
• Teacher removes the demand for an answer (removal of aversive attention) ⇒ Negative Reinforcement for being quiet.
Seat-belt beep (Slide 32)
• Behaviour: fastening seat-belt.
• Removes aversive beeping noise ⇒ Negative Reinforcement.
Key reminder: the controlling stimulus (gold stars, headache, beeping) may originate externally or internally, but the behaviour is voluntary.
Primary vs Secondary Reinforcers (Slide 33)
Primary Reinforcer: innately satisfying (food, drink, sex), or the removal of an innately aversive stimulus (shock, pain).
Secondary (Conditioned) Reinforcer: gains value via association with primary (money, tokens, grades, praise).
• Typically established through Classical Conditioning where the secondary stimulus predicts a primary reinforcer.
Stimulus Generalisation & Discrimination (Slide 34)
Generalisation: after reinforcement, organism attempts similar behaviours or emits behaviour in new contexts.
• Promotes adaptive exploration.
Discriminative Stimulus (S^D): signals that a particular consequence is available for a specific behaviour.
• Traffic-light metaphor: green light cues “drive”, red light cues “stop”.
Discrimination Learning: organism learns to respond differently under different antecedent conditions.
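Discrimination learning can be sketched as keeping a running estimate of how often responding was reinforced under each antecedent. The learner responds only where that estimate is high. This is a minimal sketch under several assumptions. There is a single response (say, a pigeon's key peck) under coloured key lights. The class name and the 0.5 `threshold` are hypothetical. An untried antecedent is treated as if responding would pay off, a crude stand-in for generalisation to new contexts.

```python
from collections import defaultdict


class DiscriminationLearner:
    """Tracks how often responding was reinforced under each antecedent."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.reinforced = defaultdict(int)  # antecedent -> reinforced responses
        self.responses = defaultdict(int)   # antecedent -> total responses

    def record(self, antecedent, was_reinforced):
        self.responses[antecedent] += 1
        if was_reinforced:
            self.reinforced[antecedent] += 1

    def p_reinforcement(self, antecedent):
        n = self.responses[antecedent]
        # Untried antecedent: assume responding pays off (crude generalisation).
        return self.reinforced[antecedent] / n if n else 1.0

    def responds(self, antecedent):
        return self.p_reinforcement(antecedent) > self.threshold


learner = DiscriminationLearner()
for _ in range(10):
    learner.record("green", True)   # S^D: responding is reinforced
    learner.record("red", False)    # responding is never reinforced
print(learner.responds("green"), learner.responds("red"))
```

After training, responding is controlled by the antecedent rather than emitted everywhere, which is the behavioural signature of discrimination.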
Determinants of Conditioning (Slide 35)
Timing (Contiguity)
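• Shorter delay between behaviour and consequence ⇒ stronger learning.
Predictability (Contingency)
• Consistent pairing of behaviour and outcome strengthens association.
• Captured quantitatively by two conditional probabilities: P(C | B), the probability of the consequence given the behaviour, and P(C | no B), its probability when the behaviour does not occur; the further P(C | B) exceeds P(C | no B), the stronger the contingency.
Magnitude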
• Larger rewards/punishers usually exert greater influence, though subject to diminishing returns.
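Contingency can be quantified by comparing P(C | B), the probability of the consequence when the behaviour occurs, with P(C | no B), its probability when the behaviour does not occur. Both can be estimated from trial counts. This is a minimal sketch assuming every trial is tallied into a 2x2 table. The function and argument names are hypothetical. The difference it returns is often written ΔP in the associative-learning literature.

```python
def contingency(b_and_c, b_no_c, nob_and_c, nob_no_c):
    """Return (P(C|B), P(C|no B), delta_p) from a 2x2 table of trial counts.

    b_and_c:   behaviour occurred and the consequence followed
    b_no_c:    behaviour occurred, no consequence
    nob_and_c: behaviour absent, consequence occurred anyway
    nob_no_c:  behaviour absent, no consequence
    """
    p_c_given_b = b_and_c / (b_and_c + b_no_c)
    p_c_given_not_b = nob_and_c / (nob_and_c + nob_no_c)
    return p_c_given_b, p_c_given_not_b, p_c_given_b - p_c_given_not_b


# Consistent pairing: food follows 18 of 20 presses but only 2 of 20 non-press periods.
print(contingency(18, 2, 2, 18))
# No contingency: food is equally likely with or without the press.
print(contingency(10, 10, 10, 10))
```

A positive difference means the behaviour actually predicts the consequence. A difference of zero means the consequence arrives regardless, so there is nothing for the learner to associate with the behaviour.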
Schedules of Reinforcement (Slides 36–39)
Continuous Reinforcement (CRF): every correct response reinforced.
• Optimal for rapid acquisition.
Partial/Intermittent Schedules: reinforcement delivered only some of the time.
• Ratio vs Interval; Fixed vs Variable (details in reading pp. 256–258).
• Produce greater resistance to extinction.
Extinction: occurs when reinforcement is withheld long enough.
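The schedules differ in what unlocks the next reinforcer: either a count of responses (ratio) or elapsed time (interval), and either fixed or varying. The sketch below makes each schedule a callable that receives the time of a response and returns whether that response is reinforced. This is a minimal sketch and the function names are hypothetical. The variable-ratio version is simplified to a random ratio, in which each response pays off with probability 1/n. The variable-interval delays are drawn from an exponential distribution, which is an assumption.

```python
import random


def fixed_ratio(n):
    """FR-n: reinforce every n-th response (FR-1 is continuous reinforcement)."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """VR-n simplified as a random ratio: each response pays off with prob 1/mean_n."""
    def respond(t):
        return rng.random() < 1 / mean_n
    return respond


def fixed_interval(interval):
    """FI-t: reinforce the first response made at least t units after the last reinforcer."""
    available_at = interval

    def respond(t):
        nonlocal available_at
        if t >= available_at:
            available_at = t + interval  # timer restarts at reinforcement
            return True
        return False
    return respond


def variable_interval(mean_interval, rng):
    """VI-t: like FI, but each interval is drawn at random with mean t (assumed exponential)."""
    available_at = rng.expovariate(1 / mean_interval)

    def respond(t):
        nonlocal available_at
        if t >= available_at:
            available_at = t + rng.expovariate(1 / mean_interval)
            return True
        return False
    return respond


def count_reinforcers(schedule, response_times):
    """Number of reinforcers earned by responding at the given times."""
    return sum(schedule(t) for t in response_times)


print(count_reinforcers(fixed_ratio(5), range(20)))
print(count_reinforcers(variable_ratio(5, random.Random(1)), range(1000)))
```

Under ratio schedules, faster responding earns more reinforcers. Under interval schedules, extra responses during the interval earn nothing, which is one reason the two families produce different response patterns.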
Practical training advice (Slide 36):
• Start with CRF while teaching (“sit” for dog).
• Switch to intermittent schedule once behaviour mastered.
• Eventually fade to minimal reinforcement to maintain behaviour economically.
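The training advice can be written as a simple thinning plan. Training starts at CRF (ratio 1), and the schedule moves to a leaner ratio after a run of consecutive correct responses. This is a sketch under stated assumptions. The `thinning_plan` name, the mastery run of 5, the ratio ladder (1, 2, 3, 5) and the rule that an error resets the run are all hypothetical choices, not values from the slides.

```python
def thinning_plan(outcomes, mastery_run=5, ratios=(1, 2, 3, 5)):
    """Return the ratio requirement in force on each trial.

    outcomes: sequence of booleans, True for a correct response.
    Start at CRF (ratios[0] == 1); after `mastery_run` consecutive correct
    responses, move to the next, leaner ratio. An error resets the run count.
    """
    level, run, plan = 0, 0, []
    for correct in outcomes:
        plan.append(ratios[level])  # requirement in force on this trial
        run = run + 1 if correct else 0
        if run == mastery_run and level < len(ratios) - 1:
            level += 1  # behaviour mastered at this level: thin the schedule
            run = 0
    return plan


print(thinning_plan([True] * 15))
```

Thinning only after a run of correct responses follows the slide's order: reinforce every time while teaching, then go intermittent once the behaviour is mastered.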
Ethical & Practical Implications
OC principles support effective behaviour change in education, parenting, animal training and clinical settings.
Ethical considerations:
• Use of punishment vs reinforcement; potential welfare issues with aversive stimuli (shocks in Skinner box).
• Informed consent when applying OC principles to humans.
Cultural acknowledgement underscores responsible, inclusive practice.
Connections & Real-World Relevance
Links to Classical Conditioning (secondary reinforcers form via CC).
Thorndike’s and Skinner’s foundational work influences modern Applied Behaviour Analysis (ABA), token economies, and operant-based therapies.
Marketing, workplace incentives, game design all utilise reinforcement schedules.
Numerical / Statistical References
Thorndike graph: escape time (seconds) plotted against number of trials – shows negatively accelerating curve.
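One way to make "negatively accelerating" concrete is an exponential decline toward a floor. This is a hypothetical model for illustration. The symbols T_1 (first-trial escape time), T_∞ (asymptotic escape time) and k (rate) are assumptions, not values fitted to Thorndike's data.

```latex
T(n) = T_\infty + (T_1 - T_\infty)\, e^{-k(n-1)}, \qquad k > 0,\; T_1 > T_\infty

\Delta T(n) = T(n) - T(n+1) = (T_1 - T_\infty)\, e^{-k(n-1)} \left(1 - e^{-k}\right)

\frac{\Delta T(n+1)}{\Delta T(n)} = e^{-k} < 1
```

Each trial's improvement is a constant fraction e^{-k} of the previous one. The curve therefore drops steeply over the first trials and then levels off.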
Response-probability notation examples:
• P(B | R) > P(B | no R), where R = reinforcement.
• Extinction criterion: P(B) falls back toward its pre-conditioning baseline under zero reinforcement.