WEEK 4 LECTURE: OPERANT CONDITIONING & REINFORCEMENT

Operant conditioning (OC): learning of a new association between a VOLUNTARY behaviour and its CONSEQUENCES.
Behaviour is modified (selected or discarded) according to consequences.
Learner actively “operates” on the environment to achieve goals.
Consequence types:
• Reward (pleasant outcome) ⇒ behaviour strengthened.
• Punishment or unpleasant outcome ⇒ behaviour weakened.
OC focuses on goal-directed, voluntary actions rather than reflexes.

Thorndike's puzzle-box experiments with cats:
• Cats placed in a box containing levers/strings; food outside.
• Measured escape time across trials.
• Observed progressive decrease in escape latency.
Results:
• Ineffective responses (scratching, biting bars) decreased.
• Effective response (pull rope, press lever) increased.
Law of Effect:
• Behaviours followed by satisfying outcomes become more likely.
• Behaviours producing no effect or discomfort become less likely.
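
A minimal Python sketch of the Law of Effect, offered as an illustration rather than as part of the lecture. A simulated cat samples responses in proportion to their current strength until the effective one occurs. The function name `run_puzzle_box`, the response labels, and the `gain` and `decay` values are all assumptions chosen for illustration, not Thorndike's data.

```python
import random


def run_puzzle_box(trials=30, seed=0, gain=0.5, decay=0.9):
    """Simulate repeated puzzle-box trials (hypothetical parameters)."""
    rng = random.Random(seed)
    # Every response starts out equally strong.
    strengths = {"scratch": 1.0, "bite_bars": 1.0, "pull_rope": 1.0}
    effective = "pull_rope"  # the only response that opens the box
    latencies = []  # attempts needed to escape on each trial

    for _ in range(trials):
        attempts = 0
        while True:
            attempts += 1
            names = list(strengths)
            weights = [strengths[name] for name in names]
            response = rng.choices(names, weights=weights)[0]
            if response == effective:
                # Satisfying outcome: the response becomes more likely.
                strengths[effective] += gain
                break
            # No effect: the response becomes less likely.
            strengths[response] *= decay
        latencies.append(attempts)

    p_effective = strengths[effective] / sum(strengths.values())
    return latencies, p_effective
```

Under these assumptions the probability of the effective response can only rise from its starting value of 1/3, which is the qualitative pattern in the results above; the exact latencies depend on the random seed.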

B. F. Skinner expanded Thorndike's ideas; widely regarded as a founder of modern (radical) Behaviourism.
Skinner Box apparatus:
• Lever/peck-key for subject (rat/pigeon).
• Food-pellet dispenser (positive reinforcer).
• Lights/speaker as discriminative stimuli.
• Electric grid for punishers if required.
Allowed precise measurement of response rates and programmed contingencies.

A–B–C contingency: Antecedent (A) ⇒ Behaviour (B) ⇒ Consequence (C).
• Example: “TURN” light on ⇒ pigeon turns ⇒ food pellet delivered.
• Example: Teacher question ⇒ student answers ⇒ praise.
Emphasises the full chain: situation cues, response, outcome.

Operant conditioning (OC): learning the A–B–C relation; response is voluntary.
Classical conditioning (CC): learning an S–S relation (CS–US); response is involuntary/reflexive.
CC can become part of OC when a conditioned stimulus serves as an antecedent cue within an operant contingency.

Acquisition: period during which response strength grows because the response is reinforced.
Extinction: $\text{Behaviour} \xrightarrow{\text{no reinforcement}} \text{decrease in response strength}$.
Spontaneous Recovery: temporary re-appearance of an extinguished behaviour after a pause.
Graph (slide 20) shows classic rise, extinction, pause, spontaneous recovery, further extinction.
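
The rise, extinction, and recovery pattern can be sketched with a simple linear update rule. This is an illustrative assumption, not the model behind the slide: `update_strength`, `simulate_sessions`, the learning `rate`, and especially `recovery_fraction` (how much lost strength returns after the pause) are hypothetical names and values.

```python
def update_strength(strength, reinforced, rate=0.3):
    """Move response strength toward 1 if reinforced, toward 0 if not."""
    target = 1.0 if reinforced else 0.0
    return strength + rate * (target - strength)


def simulate_sessions(acquisition=10, extinction=10, recovery_fraction=0.4):
    """Return response strength after each step of the three phases."""
    strength = 0.0
    curve = []

    # Acquisition: every response is reinforced.
    for _ in range(acquisition):
        strength = update_strength(strength, reinforced=True)
        curve.append(strength)
    peak = strength

    # Extinction: reinforcement is withheld.
    for _ in range(extinction):
        strength = update_strength(strength, reinforced=False)
        curve.append(strength)

    # Pause: assume part of the lost strength returns (spontaneous recovery).
    strength = strength + recovery_fraction * (peak - strength)
    curve.append(strength)

    # Further extinction: still no reinforcement.
    for _ in range(extinction):
        strength = update_strength(strength, reinforced=False)
        curve.append(strength)

    return curve
```

Plotting the returned list against its index gives a curve of the same general shape as the one described above: a rise, a fall, a partial rebound below the original peak, and a second fall.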

Shaping: reinforcing successive approximations toward a desired target behaviour.
Procedure:
• Identify baseline behaviour.
• Reinforce any response that even roughly resembles the target.
• Then withhold reinforcement until a closer approximation is emitted, and reinforce that.
• Continue narrowing until only target behaviour earns reward.
Applications/Examples:
• Circus elephant balancing, sea-lion tricks, child cleaning room, lion using toilet (cartoon example).
• Enables creation of entirely new behaviours not currently in repertoire.
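
The shaping procedure can be sketched as a loop in which the reinforcement criterion moves a little ahead of the learner each time it is met. This is a toy model under stated assumptions: behaviour is reduced to a single number, a reinforced response simply becomes the learner's new typical response, and the names `shape`, `typical`, and `criterion` are hypothetical.

```python
import random


def shape(target=10.0, step=1.0, trials=500, shaping=True, seed=1):
    """Return (typical response, number of reinforcements) after training."""
    rng = random.Random(seed)
    typical = 0.0  # baseline behaviour, far from the target
    # With shaping, start by rewarding a rough approximation;
    # without it, only the full target behaviour earns a reward.
    criterion = step if shaping else target
    reinforcements = 0

    for _ in range(trials):
        # Each response varies a little around the current typical response.
        response = typical + rng.uniform(-2.0, 2.0)
        if response >= criterion:
            reinforcements += 1
            typical = response  # assumption: reinforced response becomes typical
            if shaping:
                # Narrow the criterion: require a closer approximation next time.
                criterion = min(target, response + step)

    return typical, reinforcements
```

With `shaping=False` the learner never happens to emit the target from baseline, so it is never reinforced and stays at 0.0. That contrast illustrates the last bullet above: shaping can build a behaviour that is not yet in the repertoire.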

Consequences serve two broad functions:
• Reinforcement: increases the probability of the behaviour ($P(B) \uparrow$).
• Punishment: decreases the probability of the behaviour ($P(B) \downarrow$).

Positive ( + ): ADDING a stimulus.
Negative ( – ): REMOVING a stimulus.
Matrix:
• Positive Reinforcement – add pleasant stimulus.
• Negative Reinforcement – remove unpleasant stimulus.
• Positive Punishment – add unpleasant stimulus (not emphasised in slides but implicit).
• Negative Punishment – remove pleasant stimulus.
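
The matrix can be read as two independent questions: was a stimulus added or removed, and did the behaviour become more or less likely? A small sketch (the function name `classify_consequence` and its string arguments are illustrative choices):

```python
def classify_consequence(stimulus_change, behaviour_change):
    """Name the cell of the 2x2 matrix.

    stimulus_change: "added" or "removed"
    behaviour_change: "increases" or "decreases"
    """
    sign = {"added": "Positive", "removed": "Negative"}[stimulus_change]
    function = {
        "increases": "Reinforcement",
        "decreases": "Punishment",
    }[behaviour_change]
    return f"{sign} {function}"
```

For example, `classify_consequence("removed", "increases")` returns `"Negative Reinforcement"`. Note that the pleasantness of the stimulus is not an input: the label follows from the operation and its effect on the behaviour.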

Gold star for packing toys (Slide 27)
• Behaviour: packing toys.
• Added pleasant star ⇒ Positive Reinforcement.
Headache relieved after painkiller (Slide 28)
• Behaviour: taking painkiller.
• Removed unpleasant headache ⇒ Negative Reinforcement.
Martha’s cooperative play (Slide 30)
• Behaviour: playing with peers.
• Added praise (pleasant) ⇒ Positive Reinforcement.
Silent class – teacher answers own question (Slide 31)
• Behaviour: students’ silence.
• Teacher answers own question, removing the aversive demand to respond ⇒ Negative Reinforcement of staying silent.
Seat-belt beep (Slide 32)
• Behaviour: fastening seat-belt.
• Removes aversive beeping noise ⇒ Negative Reinforcement.

Key reminder: the controlling stimulus may originate externally (gold star, beeping) or internally (headache), but in every case the behaviour itself is voluntary.

Primary Reinforcer: innately satisfying (food, drink, sex); or removal of an innately aversive stimulus (shock, pain).
Secondary (Conditioned) Reinforcer: gains value via association with a primary reinforcer (money, tokens, grades, praise).
• Typically established through Classical Conditioning where the secondary stimulus predicts a primary reinforcer.

Generalisation: after reinforcement, organism attempts similar behaviours or emits behaviour in new contexts.
• Promotes adaptive exploration.
Discriminative Stimulus ( $S^D$ ): signals that a particular consequence is available for a specific behaviour.
• Traffic-light metaphor: green light cues “drive”, red light cues “stop”.
Discrimination Learning: organism learns to respond differently under different antecedent conditions.

Timing (Contiguity)
• Shorter delay between behaviour and consequence ⇒ stronger learning.
Predictability (Contingency)
• Consistent pairing of behaviour and outcome strengthens association.
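• Captured quantitatively by comparing the conditional probabilities $P(C \mid B)$ and $P(C \mid \lnot B)$: the behaviour predicts the consequence only when the first exceeds the second.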
Magnitude
• Larger rewards/punishers usually exert greater influence, though subject to diminishing returns.
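
The contingency point above can be made concrete by estimating both conditional probabilities from trial records and taking their difference, a quantity often written ΔP. This sketch is not from the lecture; `delta_p` and the record format are assumptions.

```python
def delta_p(trials):
    """Estimate P(C|B) - P(C|not B).

    trials: iterable of (behaviour_occurred, consequence_occurred) booleans.
    """
    with_b = [c for b, c in trials if b]
    without_b = [c for b, c in trials if not b]
    if not with_b or not without_b:
        raise ValueError("need trials both with and without the behaviour")
    p_c_given_b = sum(with_b) / len(with_b)
    p_c_given_not_b = sum(without_b) / len(without_b)
    return p_c_given_b - p_c_given_not_b
```

As a made-up example, if food follows 8 of 10 lever presses but also arrives on 1 of 10 occasions without a press, the estimate is 0.8 - 0.1, about 0.7. A value near 0 would mean the consequence arrives regardless of the behaviour, so there is little for the organism to learn.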

Continuous Reinforcement (CRF): every correct response reinforced.
• Optimal for rapid acquisition.
Partial/Intermittent Schedules: reinforcement delivered only some of the time.
• Ratio vs Interval; Fixed vs Variable (details in reading pp. 256–258).
• Produce greater resistance to extinction than CRF.
Extinction: occurs when reinforcement is withheld long enough.
Practical training advice (Slide 36):
• Start with CRF while teaching (“sit” for dog).
• Switch to intermittent schedule once behaviour mastered.
• Eventually fade to minimal reinforcement to maintain behaviour economically.
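
A hedged sketch of how ratio schedules decide which responses earn reinforcement. Interval schedules are omitted, and the variable-ratio version is approximated by reinforcing each response with probability 1/mean, which is a simplifying assumption. The names `fixed_ratio` and `variable_ratio` are illustrative.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response; n=1 is continuous reinforcement (CRF)."""
    count = 0

    def schedule():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return schedule


def variable_ratio(mean_n, seed=0):
    """Reinforce unpredictably, on average once per mean_n responses."""
    rng = random.Random(seed)

    def schedule():
        return rng.random() < 1.0 / mean_n

    return schedule
```

Each call to the returned `schedule()` represents one correct response and reports whether it is reinforced. The training advice above then corresponds to starting with `fixed_ratio(1)` and later swapping in a leaner schedule such as `fixed_ratio(3)` or `variable_ratio(4)`.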

Applications: effective behaviour change in education, parenting, animal training, and clinical settings.
Ethical considerations:
• Use of punishment vs reinforcement; potential welfare issues with aversive stimuli (shocks in Skinner box).
• Informed consent when applying OC principles to humans.
Cultural acknowledgement underscores responsible, inclusive practice.

Links to Classical Conditioning (secondary reinforcers form via CC).
Thorndike’s and Skinner’s foundational work influences modern Applied Behaviour Analysis (ABA), token economies, and operant-based therapies.
Marketing, workplace incentives, game design all utilise reinforcement schedules.

Thorndike graph: escape time (seconds) plotted against number of trials – shows negatively accelerating curve.
Response-probability notation examples:
• $P(B \mid R) > P(B \mid \text{no } R)$ where $R$ = reinforcement.
• Extinction criterion: $\lim_{n \to \infty} P(B_n) = 0$ under zero reinforcement.