Instrumental Conditioning – Comprehensive Study Notes

Thorndike’s Law of Effect

  • Humans and non-human animals tend to repeat actions that result in favourable consequences.
  • The tendency to repeat an action is strengthened when it is followed by a favourable outcome (e.g., studying for tests leads to better marks).
  • The tendency to perform an action is weakened (the action is avoided or stopped) when it results in punishment (e.g., skipping tutorials after receiving a Technical Fail).

Operant Conditioning (Instrumental Conditioning)

  • Operant conditioning: learning the associations between responses and consequences.
  • Also called instrumental conditioning.
  • In operant conditioning, the organism produces a response (voluntary, emitted rather than elicited).
  • The response is reinforced or punished (collectively known as outcomes).
  • You operate on the environment to get what you want and to avoid what you do not want.
  • Your action is instrumental in obtaining desired outcomes.

Positive and Negative (definitions)

  • Positive and negative do not refer to the valence (liking or disliking) of the event.
  • Positive means something is added.
  • Negative means something is removed.

Reinforcement and Punishment

  • Reinforcement: a reinforcer is an event following a response that strengthens the tendency to make that response (i.e., reinforces the behaviour).
  • “The only defining characteristic of a reinforcer is that it reinforces” (Skinner, 1953).
  • Punishment: a punisher is an event following a response that weakens the tendency to make that response (i.e., the behaviour is punished).

Putting it all together: how outcomes influence behaviour

  • Behaviour increases with reinforcement; behaviour decreases with punishment.
  • Positive Reinforcement: add a reward to increase behaviour.
  • Negative Reinforcement: remove a negative consequence to increase behaviour.
  • Positive Punishment: add a negative consequence to decrease behaviour.
  • Negative Punishment: remove a positive consequence to decrease behaviour.
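The four contingencies above reduce to a two-dimensional lookup. As a minimal sketch (the function and label strings are my own, not from the notes): "positive/negative" names what happens to a stimulus (added/removed), while "reinforcement/punishment" names the effect on behaviour (increases/decreases).

```python
# Minimal sketch of the operant contingency grid (names are illustrative):
# one axis is what happens to the stimulus, the other is the effect on behaviour.

CONTINGENCIES = {
    ("added", "increases"): "positive reinforcement",
    ("removed", "increases"): "negative reinforcement",
    ("added", "decreases"): "positive punishment",
    ("removed", "decreases"): "negative punishment",
}

def classify(stimulus_change, behaviour_change):
    """Label an operant contingency from its two defining dimensions."""
    return CONTINGENCIES[(stimulus_change, behaviour_change)]

print(classify("added", "increases"))    # positive reinforcement (e.g., sticker for brushing)
print(classify("removed", "decreases"))  # negative punishment (e.g., phone taken away)
```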

Types of Reinforcement and Punishment (with examples)

  • Positive Reinforcement: add a reward to increase a behaviour.
    • Example: A child brushes teeth and receives a sticker; sticker added → increases future teeth brushing.
    • Diagram: brushes teeth (behaviour/response) → sticker added (outcome) → teeth brushing increases (effect on behaviour).
  • Positive Punishment: add a negative consequence to decrease a behaviour.
    • Example: A student asks a question and is heavily criticized by the teacher; criticism added → decreases future questioning.
    • Diagram: asks a question (behaviour/response) → criticism added (outcome) → questioning decreases (effect on behaviour).
  • Negative Reinforcement: increase a behaviour to remove a negative outcome.
    • Example: Acne is treated with spot cream; the acne clears → increases future use of the cream.
    • Diagram: applies spot cream (behaviour/response) → acne removed (outcome) → cream use increases (effect on behaviour).
  • Negative Punishment: remove a positive consequence to decrease a behaviour.
    • Example: A child swears at parents; phone privileges are removed → decreases future swearing.
    • Diagram: swears at parents (behaviour/response) → phone privileges removed (outcome) → swearing decreases (effect on behaviour).

Shaping

  • When an animal/human doesn’t perform the desired behaviour, you reinforce closer and closer approximations to the target behaviour.
  • This gradual reinforcement helps acquire complex behaviours.
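Successive approximation can be caricatured as a selection rule: reinforce only responses that land closer to the target than the animal's current typical response. The sketch below is my own toy model (the target angle, variability, and learning rate are invented for illustration), loosely evoking a pigeon being shaped to turn around.

```python
import random

# Toy sketch of shaping by successive approximations (all numbers invented):
# behaviour varies around the animal's current tendency; only responses that
# are closer to the target than the tendency itself are reinforced, and each
# reinforced response pulls the tendency toward it.

def shape(target=180.0, trials=300, seed=2):
    rng = random.Random(seed)
    tendency = 0.0                            # typical response, e.g., turn angle in degrees
    for _ in range(trials):
        response = rng.gauss(tendency, 20.0)  # behaviour varies around the tendency
        if abs(response - target) < abs(tendency - target):
            # a closer approximation: reinforce it, so it recurs more often
            tendency += 0.5 * (response - tendency)
    return tendency

print(f"typical response after shaping: {shape():.0f} degrees (target 180)")
```

Because each reinforced update moves the tendency to the midpoint between itself and a strictly closer response, the typical response can only get closer to the target, which is the essence of reinforcing "closer and closer approximations".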

Skinner Shaping: A Pigeon to Turn Around

  • Visual example from B.F. Skinner illustrating shaping with a pigeon learning to turn around through successive approximations and reinforcements.

Problems with Punishment

  • Learners may simply learn to detect the punisher: the undesirable behaviour still occurs whenever the punisher is absent.
  • Punishment can inhibit all behaviour and reduce opportunity for learning alternative behaviours.
  • Could induce fear or dislike of punisher (Pavlovian conditioning).
  • Subject may copy punisher (observational learning).
  • If punishment is effective, the person administering it is rewarded by the reduction in the undesirable behaviour, which can in turn reinforce their use of punishment (including violent behaviour).

Reinforcement Schedules (timing and frequency rules)

  • A rule determining the timing and frequency of reinforcements for a behaviour.
  • Key dimensions:
    • Fixed vs Variable: fixed means predictable; variable means unpredictable.
    • Ratio vs Interval:
      • Ratio: based on the number of responses.
      • Interval: based on the passage of time.
    • Continuous vs Intermittent:
      • Continuous: every response is reinforced/punished.
      • Intermittent: only a subset of responses is reinforced/punished.

Putting it all together: Schedules Grid (basic concepts)

  • Fixed vs Variable and Ratio vs Interval combine to determine response patterns and persistence.

Rates of Responding with Different Reinforcement Schedules

  • The contingency between response and outcome greatly affects rate/persistence of responding.
  • Ratio schedules tend to elicit faster responding than interval schedules.
  • Animals learn that the reinforcer depends on the number of responses rather than on time, so responding faster brings reinforcement sooner.
  • Fixed schedules tend to produce pauses in responding (post-reinforcement pauses).
  • Variable schedules tend to produce steadier responding: because the next reinforcement is unpredictable, responding stays consistent over time.

Fixed-Ratio (FR) – Rates of Responding

  • Reinforcement after a set number of responses.
  • Example: FR(5) — reinforcement after every 5th response.
  • Typically yields a high response rate with a post-reinforcement pause after each reinforcement.

Fixed-Interval (FI) – Rates of Responding

  • First response after a fixed time interval is reinforced.
  • Example: FI(30s) — first response after 30 seconds is reinforced.
  • Produces a scalloped response pattern: a pause after each reinforcement, then an increasing rate as the end of the interval approaches.

Variable-Ratio (VR) – Rates of Responding

  • Reinforcement after an average number of responses.
  • Example: VR(n) — on average, every nth response is reinforced, but the exact number varies unpredictably.
  • Produces a very high, steady rate of responding.

Variable-Interval (VI) – Rates of Responding

  • First response after an unpredictable time interval is reinforced.
  • Example: VI(t) — the first response after an average of t seconds is reinforced, but the interval length varies from trial to trial.
  • Produces a moderate, steady response rate.
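The four schedules differ only in the rule that decides whether a given response is reinforced. The sketch below is an illustrative implementation (class names and parameters are my own, not from the notes); VR is modelled as a 1/n chance per response and VI with exponentially distributed waits, both common simplifications.

```python
import random

# Each schedule answers one question for a response made at time t (seconds):
# "is this response reinforced?"

class FR:  # fixed ratio: every nth response is reinforced
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VR:  # variable ratio: on average every nth response is reinforced
    def __init__(self, n, seed=0):
        self.n, self.rng = n, random.Random(seed)
    def respond(self, t):
        return self.rng.random() < 1.0 / self.n

class FI:  # fixed interval: first response after t_fix seconds is reinforced
    def __init__(self, t_fix):
        self.t_fix, self.last = t_fix, 0.0
    def respond(self, t):
        if t - self.last >= self.t_fix:
            self.last = t
            return True
        return False

class VI:  # variable interval: first response after an unpredictable wait
    def __init__(self, t_avg, seed=0):
        self.rng = random.Random(seed)
        self.t_avg = t_avg
        self.ready_at = self.rng.expovariate(1.0 / t_avg)
    def respond(self, t):
        if t >= self.ready_at:
            self.ready_at = t + self.rng.expovariate(1.0 / self.t_avg)
            return True
        return False

# One response per second for a minute on each schedule:
for name, schedule in [("FR(5)", FR(5)), ("VR(5)", VR(5)),
                       ("FI(10s)", FI(10)), ("VI(10s)", VI(10))]:
    reinforced = sum(schedule.respond(t) for t in range(1, 61))
    print(f"{name}: {reinforced} reinforcements in 60 responses")
```

Note how the ratio schedules pay off for responding faster (more responses per minute means more reinforcements), whereas the interval schedules cap the reinforcement rate no matter how fast the organism responds — which is why ratio schedules elicit faster responding.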

Skinner Box

  • A laboratory apparatus used to study operant conditioning.
  • Examples:
    • Positive Reinforcement: Rat presses lever to receive food.
    • Negative Reinforcement: Rat presses lever to stop electric foot shock.
    • Positive Punishment: Rat presses second lever and receives foot shock, discouraging pressing that lever.
    • Negative Punishment: Rat presses a second lever and access to food is taken away (loss of a positive consequence), discouraging pressing that lever.

Drive Reduction Theory (Hull)

  • Motivation arises from biological needs that create drives.
  • Behaviour is aimed at reducing these drives, restoring homeostasis.
  • Key concepts:
    • Needs: Biological requirements (e.g., food, water, warmth).
    • Drives: Internal state of tension or arousal triggered by unmet needs (e.g., hunger, thirst).
    • Drive Reduction: Behaviour is reinforced when it reduces a drive (e.g., eating reduces hunger).
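Hull's need → drive → behaviour → drive-reduction cycle can be caricatured in a few lines. The toy loop below is my own illustration (the threshold and step sizes are invented): hunger builds while the need is unmet, and eating recurs each time the drive crosses a threshold.

```python
# Toy sketch of drive reduction (threshold and step sizes are invented):
# an unmet need builds the hunger drive; eating reduces the drive back to
# baseline, and that drive reduction is what reinforces eating.

def simulate(hours=12, threshold=0.7):
    hunger, meals = 0.0, 0
    for _ in range(hours):
        hunger = min(1.0, hunger + 0.2)  # the unmet need raises the drive
        if hunger >= threshold:          # a high drive motivates behaviour
            hunger = 0.0                 # eating restores homeostasis
            meals += 1                   # drive reduction reinforces the behaviour
    return meals

print(simulate())  # → 3: the drive crosses threshold every four hours
```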

Motivation and Learning: Contingencies and their role

  • Pavlovian conditioning involves learning the contingency between a biologically-relevant stimulus (e.g., food/pain) and a neutral stimulus (e.g., a bell).
    • Implication: you can predict when something good or bad will happen.
  • Operant conditioning involves learning the contingency between enacting a behaviour (e.g., saying “please”) and a motivationally-relevant outcome (e.g., getting a cookie or not getting a cookie).
    • Implication: you can control whether you get something good or bad.
  • Important question raised: do contingencies have to exist to learn? (e.g., everyday navigation: riding a bus and learning the layout of a city without explicit reinforcers.)

Tolman (1948) – Experiment on latent learning and cognitive maps

  • Design: Three groups of rats wandered a maze for 10 trials over 10 days.
    • Group 1: Reinforced with food when reaching the end.
    • Groups 2 and 3: No reinforcement.
  • Outcome measures: number of errors across training days (more errors = poorer learning).
  • Results:
    • The food-rewarded group reduced errors more rapidly over time.
    • On day 11, one of the previously non-rewarded groups received food.
    • This group quickly learned to reach the end of the maze with few errors, whereas the continuously unrewarded group remained erratic.
  • Conclusion: latent learning occurred — learning that was not immediately observable became evident once a reward was introduced.
  • Cognitive Maps: Tolman proposed that rats formed internal cognitive maps of the maze.
  • Latent learning: learning that can occur without obvious reinforcement and becomes apparent only when there is a reason to demonstrate it.

Instrumental Conditioning Summary (Key Takeaways)

  • Instrumental (operant) conditioning involves making a voluntary response that leads to an outcome.
  • Different types are labelled by:
    • Whether something is added or subtracted – Positive/Negative.
    • Whether the behaviour/response increases or decreases – Reinforcement/Punishment.
  • Shaping can be used when animals need to learn complex or new behaviours.
  • Reinforcement schedules are rules guiding delivery of rewards/punishments, labelled by:
    • Whether the rule is Fixed or Variable (consistency).
    • Whether rewards depend on the number of responses (Ratio) or the passage of time (Interval).
  • Latent learning refers to learning that can occur without a clear motivator and may only become observable later when reinforced.
  • Real-world relevance and ethical considerations: use of reinforcement, punishment, and shaping in education, behavior modification, animal training, and clinical settings requires consideration of effects on motivation, stress, and avoidance behaviors.

Mathematical notations and schedule references

  • Fixed-Ratio: FR(n) (reinforcement after every n responses).
  • Variable-Ratio: VR(n) (reinforcement after an average of n responses, but unpredictable).
  • Fixed-Interval: FI(t) (reinforcement for the first response following a fixed time interval; e.g., FI(30 s)).
  • Variable-Interval: VI(t) (reinforcement for the first response following an unpredictable time interval; e.g., VI(30 s) on average).
  • Post-reinforcement pause is a common feature in FR schedules.
  • The overarching idea is that the schedule type influences rate of responding, persistence, and pattern of responding over time.