Instrumental Conditioning – Comprehensive Study Notes
Thorndike’s Law of Effect
- Human and non-human animals tend to repeat actions that result in favourable consequences.
- Tendency to repeat actions is strengthened when followed by a favourable outcome (e.g., studying for tests leads to better marks).
- Tendency to avoid or stop actions that result in punishment (e.g., skipping tutorials after receiving a Technical Fail).
Operant Conditioning (Instrumental Conditioning)
- Operant conditioning: learning the associations between responses and consequences.
- Also called instrumental conditioning.
- In operant conditioning, the organism produces a response (voluntary, emitted rather than elicited).
- The response is reinforced or punished (collectively known as outcomes).
- You operate on the environment to get what you want and to avoid what you do not want.
- Your action is instrumental in obtaining desired outcomes.
Positive and Negative (definitions)
- Positive and negative do not refer to the valence (liking or disliking) of the event.
- Positive means something is added.
- Negative means something is removed.
Reinforcement and Punishment
- Reinforcement: a reinforcer is an event following a response that strengthens the tendency to make that response (i.e., reinforces the behaviour).
- “The only defining characteristic of a reinforcer is that it reinforces” (Skinner, 1953).
- Punishment: a punisher is an event following a response that weakens the tendency to make that response (i.e., the behaviour is punished).
Putting it all together: how outcomes influence behaviour
- Behaviour increases with reinforcement; behaviour decreases with punishment.
- Positive Reinforcement: add a reward to increase behaviour.
- Negative Reinforcement: remove a negative consequence to increase behaviour.
- Positive Punishment: add a negative consequence to decrease behaviour.
- Negative Punishment: remove a positive consequence to decrease behaviour.
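The two labels combine mechanically, which a small sketch can make concrete (the function name and string labels here are invented for illustration):

```python
def classify_contingency(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant contingency from its two defining dimensions.

    stimulus_change: "added" (positive) or "removed" (negative)
    behaviour_change: "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = "Positive" if stimulus_change == "added" else "Negative"
    effect = "Reinforcement" if behaviour_change == "increases" else "Punishment"
    return f"{sign} {effect}"

# The four cells of the grid:
print(classify_contingency("added", "increases"))    # Positive Reinforcement
print(classify_contingency("removed", "increases"))  # Negative Reinforcement
print(classify_contingency("added", "decreases"))    # Positive Punishment
print(classify_contingency("removed", "decreases"))  # Negative Punishment
```

The point of the sketch is that "positive/negative" describes only the stimulus change and "reinforcement/punishment" only the effect on behaviour; the two axes are independent.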
Types of Reinforcement and Punishment (with examples)
- Positive Reinforcement: add a reward to increase a behaviour.
- Example: A child brushes teeth and receives a sticker; sticker added → increases future teeth brushing.
- Diagram: Behaviour/Response → Outcome → Effect on Behaviour
- Positive Punishment: add a negative consequence to decrease a behaviour.
- Example: A student asks a question and is heavily criticized by the teacher; criticism added → decreases future questioning.
- Negative Reinforcement: increase a behaviour to remove a negative outcome.
- Example: Acne treated with spot cream; acne reduces → increases future use of the cream.
- Negative Punishment: remove a positive consequence to decrease a behaviour.
- Example: A child swears at parents; phone privileges removed → decreases future swearing.
Shaping
- When an animal/human doesn’t perform the desired behaviour, you reinforce closer and closer approximations to the target behaviour.
- This gradual reinforcement helps acquire complex behaviours.
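Shaping can be caricatured in code, assuming a hypothetical one-dimensional "behaviour" that drifts toward a target whenever a closer approximation is reinforced (the learner model below is invented for illustration, not taken from the slides):

```python
import random

def shape(target: float, trials: int, seed: int = 0) -> float:
    """Reinforce successively closer approximations to `target`.

    The 'animal' emits variable responses around its current typical
    behaviour; any response closer to the target than before is
    reinforced and becomes the new typical behaviour.
    """
    rng = random.Random(seed)
    behaviour = 0.0  # starting behaviour, far from the target
    for _ in range(trials):
        response = behaviour + rng.gauss(0, 1)        # emitted responses vary
        if abs(response - target) < abs(behaviour - target):
            behaviour = response                      # reinforced approximation sticks
    return behaviour

# Behaviour starts at 0 but is gradually shaped toward the target of 10.
final = shape(target=10.0, trials=500)
```

Because only closer approximations are ever reinforced, the distance to the target shrinks over trials, mirroring how successive approximations build a behaviour the animal would never emit spontaneously.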
Skinner Shaping: A Pigeon to Turn Around
- Visual example from B.F. Skinner illustrating shaping with a pigeon learning to turn around through successive approximations and reinforcements.
Problems with Punishment
- Learners may simply learn to detect the punisher: the undesirable behaviour continues whenever the punisher is absent.
- Punishment can inhibit all behaviour and reduce opportunity for learning alternative behaviours.
- Could induce fear or dislike of punisher (Pavlovian conditioning).
- Subject may copy punisher (observational learning).
- If punishment is effective, the reduction of the undesirable behaviour rewards the person delivering the punishment, which can reinforce the use of (potentially violent) punishment.
Reinforcement Schedules (timing and frequency rules)
- A rule determining the timing and frequency of reinforcements for a behaviour.
- Key dimensions:
- Fixed vs Variable: Fixed means predictable; Variable means unpredictable.
- Ratio vs Interval:
- Ratio: based on number of responses.
- Interval: based on the passage of time.
- Continuous vs Intermittent:
- Continuous: every response is reinforced/punished.
- Intermittent: only a subset of responses are reinforced/punished.
Putting it all together: Schedules Grid (basic concepts)
- Fixed vs Variable and Ratio vs Interval combine to determine response patterns and persistence.
Rates of Responding with Different Reinforcement Schedules
- The contingency between response and outcome greatly affects rate/persistence of responding.
- Ratio schedules tend to elicit faster responding than interval schedules.
- Animals learn that the reinforcer depends on the number of responses rather than the passage of time.
- Fixed schedules tend to produce pauses in responding (post-reinforcement pauses).
- Variable schedules tend to produce steadier responding: because reinforcement is unpredictable, there is no safe point at which to pause.
Fixed-Ratio (FR) – Rates of Responding
- Reinforcement after a set number of responses.
- Example: FR(5) — reinforcement after every 5th response.
- Typically yields a high response rate with a post-reinforcement pause after each reinforcement.
Fixed-Interval (FI) – Rates of Responding
- First response after a fixed time interval is reinforced.
- Example: FI(30s) — first response after 30 seconds is reinforced.
- Produces a scalloped response pattern: a pause after each reinforcement, then an accelerating rate as the end of the interval approaches.
Variable-Ratio (VR) – Rates of Responding
- Reinforcement after an average number of responses.
- Example: VR(n) — roughly every n responses reinforced, but varies unpredictably.
- Produces a very high, steady rate of responding.
Variable-Interval (VI) – Rates of Responding
- First response after an unpredictable time interval is reinforced.
- Example: VI(t) — the first response after an average of t seconds is reinforced, but the interval length varies trial to trial.
- Produces a moderate, steady response rate.
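The four rules can be sketched as simple predicates deciding whether a given response earns reinforcement; the class names, parameters, and the uniform/randomised interval choices below are assumptions for illustration:

```python
import random

class FixedRatio:
    """FR(n): reinforcement after every n-th response."""
    def __init__(self, n: int):
        self.n, self.count = n, 0

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0            # counter resets after reinforcement
            return True
        return False

class VariableRatio:
    """VR(n): reinforcement after a run of responses averaging n."""
    def __init__(self, n: int, seed: int = 0):
        self.n, self.count = n, 0
        self.rng = random.Random(seed)
        self.required = self.rng.randint(1, 2 * n - 1)

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(1, 2 * self.n - 1)  # next run length is unpredictable
            return True
        return False

class FixedInterval:
    """FI(t): first response at least t seconds after the last reinforcement."""
    def __init__(self, t: float):
        self.t, self.last = t, 0.0

    def respond(self, now: float) -> bool:
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:
    """VI(t): like FI, but the required wait varies around an average of t."""
    def __init__(self, t: float, seed: int = 0):
        self.t, self.last = t, 0.0
        self.rng = random.Random(seed)
        self.wait = self.rng.uniform(0, 2 * t)

    def respond(self, now: float) -> bool:
        if now - self.last >= self.wait:
            self.last = now
            self.wait = self.rng.uniform(0, 2 * self.t)  # next wait is unpredictable
            return True
        return False

# FR(5): exactly every 5th response is reinforced.
fr = FixedRatio(5)
pattern = [fr.respond() for _ in range(10)]
# pattern == [False]*4 + [True] + [False]*4 + [True]

# FI(30): responding early achieves nothing; the first response past 30s pays off.
fi = FixedInterval(30.0)
early, on_time = fi.respond(now=10.0), fi.respond(now=31.0)
# early is False, on_time is True
```

The sketch shows why the response patterns differ: on ratio schedules every response moves the animal closer to reinforcement, whereas on interval schedules responses emitted before the interval elapses are wasted, so a pause costs nothing.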
Skinner Box
- A laboratory apparatus used to study operant conditioning.
- Examples:
- Positive Reinforcement: Rat presses lever to receive food.
- Negative Reinforcement: Rat presses lever to stop electric foot shock.
- Positive Punishment: Rat presses second lever and receives foot shock, discouraging pressing that lever.
- Negative Punishment: Rat presses a lever and access to food is removed (loss of a positive opportunity), discouraging pressing that lever.
Drive Reduction Theory (Hull)
- Motivation arises from biological needs that create drives.
- Behaviour is aimed at reducing these drives, restoring homeostasis.
- Key concepts:
- Needs: Biological requirements (e.g., food, water, warmth).
- Drives: Internal state of tension or arousal triggered by unmet needs (e.g., hunger, thirst).
- Drive Reduction: Behaviour is reinforced when it reduces a drive (e.g., eating reduces hunger).
Motivation and Learning: Contingencies and their role
- Pavlovian conditioning involves learning the contingency between a biologically relevant stimulus (e.g., food/pain) and a neutral stimulus (e.g., a bell).
- Implication: you can predict when something good or bad will happen.
- Operant conditioning involves learning the contingency between enacting a behaviour (e.g., saying “please”) and a motivationally relevant outcome (e.g., getting a cookie or not getting a cookie).
- Implication: you can control whether you get something good or bad.
- Important question raised: do contingencies have to exist to learn? (e.g., everyday navigation: riding a bus and learning the layout of a city without explicit reinforcers.)
Tolman (1948) – Experiment on latent learning and cognitive maps
- Design: Three groups of rats wandered a maze for 10 trials over 10 days.
- Group 1: Reinforced with food when reaching the end.
- Groups 2 and 3: No reinforcement.
- Outcome measures: number of errors across training days (more errors = poorer learning).
- Results:
- The food-rewarded group reduced errors more rapidly over time.
- On day 11, one of the previously non-rewarded groups received food.
- This group quickly learned to reach the end of the maze with few errors, whereas the continuously unrewarded group remained erratic.
- Conclusion: latent learning occurred — learning that was not immediately observable became evident once a reward was introduced.
- Cognitive Maps: Tolman proposed that rats formed internal cognitive maps of the maze.
- Latent learning: learning that can occur without obvious reinforcement and becomes apparent only when there is a reason to demonstrate it.
Instrumental Conditioning Summary (Key Takeaways)
- Instrumental (operant) conditioning involves making a voluntary response that leads to an outcome.
- Different types are labelled by:
- Whether something is added or subtracted – Positive/Negative.
- Whether the behaviour/response increases or decreases – Reinforcement/Punishment.
- Shaping can be used when animals need to learn complex or new behaviours.
- Reinforcement schedules are rules guiding delivery of rewards/punishments, labelled by:
- Whether the rule is Fixed or Variable (consistency).
- Whether rewards depend on the number of responses (Ratio) or the passage of time (Interval).
- Latent learning refers to learning that can occur without a clear motivator and may only become observable later when reinforced.
- Real-world relevance and ethical considerations: use of reinforcement, punishment, and shaping in education, behaviour modification, animal training, and clinical settings requires consideration of effects on motivation, stress, and avoidance behaviours.
Mathematical notations and schedule references
- Fixed-Ratio: FR(n) (reinforcement after every n responses).
- Variable-Ratio: VR(n) (reinforcement after an average of n responses, but unpredictable).
- Fixed-Interval: FI(t) (reinforcement after the first response following a fixed time interval; e.g., FI(30s)).
- Variable-Interval: VI(t) (reinforcement after the first response following an unpredictable time interval; e.g., VI(30s) on average).
- Post-reinforcement pause is a common feature in FR schedules.
- The overarching idea is that the schedule type influences rate of responding, persistence, and pattern of responding over time.