Instrumental Conditioning – Comprehensive Study Notes
Thorndike’s Law of Effect
- Human and non-human animals tend to repeat actions that result in favourable consequences.
- Tendency to repeat actions is strengthened when followed by a favourable outcome (e.g., studying for tests leads to better marks).
- Tendency to avoid or stop actions that result in punishment (e.g., skipping tutorials after receiving a Technical Fail).
Operant Conditioning (Instrumental Conditioning)
- Operant conditioning: learning the associations between responses and consequences.
- Also called instrumental conditioning.
- In operant conditioning, the organism produces a response (voluntary, emitted rather than elicited).
- The response is reinforced or punished (collectively known as outcomes).
- You operate on the environment to get what you want and to avoid what you do not want.
- Your action is instrumental in obtaining desired outcomes.
Positive and Negative (definitions)
- Positive and negative do not refer to the valence (liking or disliking) of the event.
- Positive means something is added.
- Negative means something is removed.
Reinforcement and Punishment
- Reinforcement: a reinforcer is an event following a response that strengthens the tendency to make that response (i.e., reinforces the behaviour).
- “The only defining characteristic of a reinforcer is that it reinforces” (Skinner, 1953).
- Punishment: a punisher is an event following a response that weakens the tendency to make that response (i.e., the behaviour is punished).
Putting it all together: how outcomes influence behaviour
- Behaviour increases with reinforcement; behaviour decreases with punishment.
- Positive Reinforcement: add a reward to increase behaviour.
- Negative Reinforcement: remove a negative consequence to increase behaviour.
- Positive Punishment: add a negative consequence to decrease behaviour.
- Negative Punishment: remove a positive consequence to decrease behaviour.
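The two labels combine mechanically, which a small sketch can make concrete (the function name and string labels here are invented for illustration):

```python
def classify_contingency(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant contingency from its two defining dimensions.

    stimulus_change: "added" (positive) or "removed" (negative)
    behaviour_change: "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = "Positive" if stimulus_change == "added" else "Negative"
    effect = "Reinforcement" if behaviour_change == "increases" else "Punishment"
    return f"{sign} {effect}"

# The four cells of the grid:
print(classify_contingency("added", "increases"))    # Positive Reinforcement
print(classify_contingency("removed", "increases"))  # Negative Reinforcement
print(classify_contingency("added", "decreases"))    # Positive Punishment
print(classify_contingency("removed", "decreases"))  # Negative Punishment
```

The point of the sketch is that "positive/negative" describes only the stimulus change and "reinforcement/punishment" only the effect on behaviour; the two axes are independent.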
Types of Reinforcement and Punishment (with examples)
- Positive Reinforcement: add a reward to increase a behaviour.
- Example: A child brushes teeth and receives a sticker; sticker added → increases future teeth brushing.
- Diagram: Behaviour/Response → Outcome → Effect on Behaviour
- Positive Punishment: add a negative consequence to decrease a behaviour.
- Example: A student asks a question and is heavily criticized by the teacher; criticism added → decreases future questioning.
- Negative Reinforcement: increase a behaviour to remove a negative outcome.
- Example: Acne treated with spot cream; acne reduces → increases future use of the cream.
- Negative Punishment: remove a positive consequence to decrease a behaviour.
- Example: A child swears at parents; phone privileges removed → decreases future swearing.
Shaping
- When an animal/human doesn’t perform the desired behaviour, you reinforce closer and closer approximations to the target behaviour.
- This gradual reinforcement helps acquire complex behaviours.
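Shaping can be caricatured in code, assuming a hypothetical one-dimensional "behaviour" that drifts toward a target whenever a closer approximation is reinforced (the learner model below is invented for illustration, not taken from the slides):

```python
import random

def shape(target: float, trials: int, seed: int = 0) -> float:
    """Reinforce successively closer approximations to `target`.

    The 'animal' emits variable responses around its current typical
    behaviour; any response closer to the target than before is
    reinforced and becomes the new typical behaviour.
    """
    rng = random.Random(seed)
    behaviour = 0.0  # starting behaviour, far from the target
    for _ in range(trials):
        response = behaviour + rng.gauss(0, 1)        # emitted responses vary
        if abs(response - target) < abs(behaviour - target):
            behaviour = response                      # reinforced approximation sticks
    return behaviour

# Behaviour starts at 0 but is gradually shaped toward the target of 10.
final = shape(target=10.0, trials=500)
```

Because only closer approximations are ever reinforced, the distance to the target shrinks over trials, mirroring how successive approximations build a behaviour the animal would never emit spontaneously.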
Skinner Shaping: A Pigeon to Turn Around
- Visual example from B.F. Skinner illustrating shaping with a pigeon learning to turn around through successive approximations and reinforcements.
Problems with Punishment
- Learners may simply learn to detect the punisher: the undesirable behaviour continues whenever the punisher is absent.
- Punishment can inhibit all behaviour and reduce opportunity for learning alternative behaviours.
- Could induce fear or dislike of punisher (Pavlovian conditioning).
- Subject may copy punisher (observational learning).
- If punishment is effective, the reduction of the undesirable behaviour rewards the person delivering the punishment, which can reinforce the use of (potentially violent) punishment.
Reinforcement Schedules (timing and frequency rules)
- A rule determining the timing and frequency of reinforcements for a behaviour.
- Key dimensions:
- Fixed vs Variable: Fixed means predictable; Variable means unpredictable.
- Ratio vs Interval:
- Ratio: based on number of responses.
- Interval: based on the passage of time.
- Continuous vs Intermittent:
- Continuous: every response is reinforced/punished.
- Intermittent: only a subset of responses are reinforced/punished.
Putting it all together: Schedules Grid (basic concepts)
- Fixed vs Variable and Ratio vs Interval combine to determine response patterns and persistence.
Rates of Responding with Different Reinforcement Schedules
- The contingency between response and outcome greatly affects rate/persistence of responding.
- Ratio schedules tend to elicit faster responding than interval schedules.
- Animals learn that the reinforcer depends on the number of responses rather than the passage of time.
- Fixed schedules tend to produce pauses in responding (post-reinforcement pauses).
- Variable schedules tend to produce steadier responding: because reinforcement is unpredictable, there is no safe point at which to pause.
Fixed-Ratio (FR) – Rates of Responding
- Reinforcement after a set number of responses.
- Example: FR(5) — reinforcement after every 5th response.
- Typically yields a high response rate with a post-reinforcement pause after each reinforcement.
Fixed-Interval (FI) – Rates of Responding
- First response after a fixed time interval is reinforced.
- Example: FI(30s) — first response after 30 seconds is reinforced.
- Produces a scalloped response pattern: a pause after each reinforcement, then an accelerating rate as the end of the interval approaches.
Variable-Ratio (VR) – Rates of Responding
- Reinforcement after an average number of responses.
- Example: VR(n) — roughly every n responses reinforced, but varies unpredictably.
- Produces a very high, steady rate of responding.
Variable-Interval (VI) – Rates of Responding
- First response after an unpredictable time interval is reinforced.
- Example: VI(t) — the first response after an average of t seconds is reinforced, but the interval length varies trial to trial.
- Produces a moderate, steady response rate.
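The four rules can be sketched as simple predicates deciding whether a given response earns reinforcement; the class names, parameters, and the uniform/randomised interval choices below are assumptions for illustration:

```python
import random

class FixedRatio:
    """FR(n): reinforcement after every n-th response."""
    def __init__(self, n: int):
        self.n, self.count = n, 0

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0            # counter resets after reinforcement
            return True
        return False

class VariableRatio:
    """VR(n): reinforcement after a run of responses averaging n."""
    def __init__(self, n: int, seed: int = 0):
        self.n, self.count = n, 0
        self.rng = random.Random(seed)
        self.required = self.rng.randint(1, 2 * n - 1)

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(1, 2 * self.n - 1)  # next run length is unpredictable
            return True
        return False

class FixedInterval:
    """FI(t): first response at least t seconds after the last reinforcement."""
    def __init__(self, t: float):
        self.t, self.last = t, 0.0

    def respond(self, now: float) -> bool:
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:
    """VI(t): like FI, but the required wait varies around an average of t."""
    def __init__(self, t: float, seed: int = 0):
        self.t, self.last = t, 0.0
        self.rng = random.Random(seed)
        self.wait = self.rng.uniform(0, 2 * t)

    def respond(self, now: float) -> bool:
        if now - self.last >= self.wait:
            self.last = now
            self.wait = self.rng.uniform(0, 2 * self.t)  # next wait is unpredictable
            return True
        return False

# FR(5): exactly every 5th response is reinforced.
fr = FixedRatio(5)
pattern = [fr.respond() for _ in range(10)]
# pattern == [False]*4 + [True] + [False]*4 + [True]

# FI(30): responding early achieves nothing; the first response past 30s pays off.
fi = FixedInterval(30.0)
early, on_time = fi.respond(now=10.0), fi.respond(now=31.0)
# early is False, on_time is True
```

The sketch shows why the response patterns differ: on ratio schedules every response moves the animal closer to reinforcement, whereas on interval schedules responses emitted before the interval elapses are wasted, so a pause costs nothing.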
Skinner Box
- A laboratory apparatus used to study operant conditioning.
- Examples:
- Positive Reinforcement: Rat presses lever to receive food.
- Negative Reinforcement: Rat presses lever to stop electric foot shock.
- Positive Punishment: Rat presses second lever and receives foot shock, discouraging pressing that lever.
- Negative Punishment: Rat presses a lever and access to food is removed (loss of a positive opportunity), discouraging pressing that lever.
Drive Reduction Theory (Hull)
- Motivation arises from biological needs that create drives.
- Behaviour is aimed at reducing these drives, restoring homeostasis.
- Key concepts:
- Needs: Biological requirements (e.g., food, water, warmth).
- Drives: Internal state of tension or arousal triggered by unmet needs (e.g., hunger, thirst).
- Drive Reduction: Behaviour is reinforced when it reduces a drive (e.g., eating reduces hunger).
Motivation and Learning: Contingencies and their role
- Pavlovian conditioning involves learning the contingency between a biologically relevant stimulus (e.g., food/pain) and a neutral stimulus (e.g., a bell).
- Implication: you can predict when something good or bad will happen.
- Operant conditioning involves learning the contingency between enacting a behaviour (e.g., saying “please”) and a motivationally relevant outcome (e.g., getting a cookie or not getting a cookie).
- Implication: you can control whether you get something good or bad.
- Important question raised: do contingencies have to exist to learn? (e.g., everyday navigation: riding a bus and learning the layout of a city without explicit reinforcers.)
Tolman (1948) – Experiment on latent learning and cognitive maps
- Design: Three groups of rats wandered a maze for 10 trials over 10 days.
- Group 1: Reinforced with food when reaching the end.
- Groups 2 and 3: No reinforcement.
- Outcome measures: number of errors across training days (more errors = poorer learning).
- Results:
- The food-rewarded group reduced errors more rapidly over time.
- On day 11, one of the previously non-rewarded groups received food.
- This group quickly learned to reach the end of the maze with few errors, whereas the continuously unrewarded group remained erratic.
- Conclusion: latent learning occurred — learning that was not immediately observable became evident once a reward was introduced.
- Cognitive Maps: Tolman proposed that rats formed internal cognitive maps of the maze.
- Latent learning: learning that can occur without obvious reinforcement and becomes apparent only when there is a reason to demonstrate it.
Instrumental Conditioning Summary (Key Takeaways)
- Instrumental (operant) conditioning involves making a voluntary response that leads to an outcome.
- Different types are labelled by:
- Whether something is added or subtracted – Positive/Negative.
- Whether the behaviour/response increases or decreases – Reinforcement/Punishment.
- Shaping can be used when animals need to learn complex or new behaviours.
- Reinforcement schedules are rules guiding delivery of rewards/punishments, labelled by:
- Whether the rule is Fixed or Variable (consistency).
- Whether rewards depend on the number of responses (Ratio) or the passage of time (Interval).
- Latent learning refers to learning that can occur without a clear motivator and may only become observable later when reinforced.
- Real-world relevance and ethical considerations: use of reinforcement, punishment, and shaping in education, behaviour modification, animal training, and clinical settings requires consideration of effects on motivation, stress, and avoidance behaviours.
Mathematical notations and schedule references
- Fixed-Ratio: FR(n) (reinforcement after every n responses).
- Variable-Ratio: VR(n) (reinforcement after an average of n responses, but unpredictable).
- Fixed-Interval: FI(t) (reinforcement after the first response following a fixed time interval; e.g., FI(30s)).
- Variable-Interval: VI(t) (reinforcement after the first response following an unpredictable time interval; e.g., VI(30s) on average).
- Post-reinforcement pause is a common feature in FR schedules.
- The overarching idea is that the schedule type influences rate of responding, persistence, and pattern of responding over time.