Instrumental Conditioning: Comprehensive Study Notes (Lecture 2)

Instrumental Conditioning

  • Instrumental conditioning (also called operant conditioning) is learning the associations between a voluntary response and its consequences (outcomes).
  • The learner produces a response (emitted, not elicited) and the response is reinforced or punished, altering future behavior.
  • You operate on the environment to obtain rewards or avoid punishments; your action is instrumental in obtaining the outcome.
  • Key idea: learning is about how actions lead to consequences, shaping future propensity to respond.

Thorndike’s Law of Effect

  • Core idea: actions followed by favourable consequences tend to be repeated; actions followed by unfavourable consequences (e.g., punishment) tend not to be repeated.
  • Everyday illustration:
    • Example of repetition: I study for tests because I tend to get better marks.
    • Example of avoidance: I do not skip tutorials after having previously received a Technical Fail.
  • Implication: the environment contingently strengthens or weakens actions based on their outcomes.

Operant Conditioning: Core Concepts

  • Also called instrumental conditioning.
  • Distinction from classical (Pavlovian) conditioning:
    • Operant conditioning involves a voluntary, emitted response.
    • The response is followed by consequences that reinforce or punish (outcomes).
  • Learning involves the organism operating on the environment to obtain rewards and to avoid punishments.
  • Important terminology:
    • Reinforcement: an outcome that strengthens the tendency to emit the response.
    • Punishment: an outcome that weakens the tendency to emit the response.
    • Positive/Negative are not about good/bad; they refer to whether something is added or removed.

Positive and Negative (Not Valence)

  • Positive means something is added to the environment following a response.
  • Negative means something is removed from the environment following a response.

Reinforcement and Punishment (Definitions)

  • Reinforcement: a reinforcer is an event following a response that strengthens the tendency to make that response.
    • Skinner (1953): “The only defining characteristic of a reinforcer is that it reinforces.”
  • Punishment: an event following a response that weakens the tendency to make that response.

Putting it all together: The Response Matrix

  • Reinforcement tends to increase the instrumental response; Punishment tends to decrease it.
  • When reinforcing:
    • Positive reinforcement: add a reward to increase behaviour.
    • Negative reinforcement: remove a negative consequence to increase behaviour.
  • When punishing:
    • Positive punishment: add a negative consequence to decrease behaviour.
    • Negative punishment: remove a positive consequence to decrease behaviour.
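Because the matrix reduces to two binary dimensions, it can be sketched as a small lookup table. The function name and string labels below are illustrative, not standard terminology:

```python
# Illustrative sketch: classify an operant contingency from the 2x2 matrix.
# First key: whether a stimulus is "added" or "removed" (Positive/Negative).
# Second key: whether the behaviour "increases" or "decreases"
# (Reinforcement/Punishment).

MATRIX = {
    ("added", "increases"): "positive reinforcement",
    ("removed", "increases"): "negative reinforcement",
    ("added", "decreases"): "positive punishment",
    ("removed", "decreases"): "negative punishment",
}

def classify(stimulus_change: str, behaviour_change: str) -> str:
    """Return the operant-conditioning label for one cell of the matrix."""
    return MATRIX[(stimulus_change, behaviour_change)]

# A sticker is added and tooth brushing increases:
print(classify("added", "increases"))    # positive reinforcement
# Phone privileges are removed and swearing decreases:
print(classify("removed", "decreases"))  # negative punishment
```

The point of the lookup is that the label never depends on whether the stimulus is pleasant or unpleasant, only on added-vs-removed and on the behavioural effect.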

Positive Reinforcement (example)

  • Example: A child brushes their teeth and receives a sticker.
    • Sticker added → POSITIVE.
    • The child is more likely to brush teeth in the future.
    • Outcome: the behaviour of tooth brushing increases → REINFORCEMENT.

Positive Punishment (example)

  • Example: A student asks a question and is heavily criticized by the teacher.
    • Criticism added → POSITIVE.
    • The student is less likely to ask questions in the future.
    • Outcome: asking questions decreases → PUNISHMENT.

Negative Reinforcement (example)

  • Example: You use a spot cream or patch; painful acne goes away.
    • Acne reduced/removed → NEGATIVE.
    • You’re more likely to use spot cream/patch next time you have painful acne.
    • Outcome: use of spot treatment increases → REINFORCEMENT.

Negative Punishment (example)

  • Example: A child swears at his parents; phone privileges are taken away.
    • Phone privileges removed → NEGATIVE.
    • He’s less likely to swear at his parents in the future.
    • Outcome: swearing decreases → PUNISHMENT.

Shaping

  • When an animal does not perform the desired behaviour yet, you reinforce progressively closer approximations to the target behaviour.
  • This technique builds complex behaviours by rewarding successive approximations.
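As a toy illustration (not a model of any real training procedure), shaping can be sketched as a loop that reinforces any response meeting the current criterion and then tightens that criterion toward the target. All names and parameter values below are invented for illustration:

```python
import random

# Toy sketch of shaping by successive approximations. An "animal" emits
# responses on a 0-100 scale (100 = the target behaviour). The trainer
# reinforces any response at or above the current criterion, then raises
# the criterion, so ever-closer approximations are required over time.
# All parameter values here are arbitrary illustrations.

random.seed(0)

def shape(target=100.0, step=10.0, trials=1000):
    criterion = 10.0   # start by rewarding even crude approximations
    best = 0.0         # the best behaviour reinforced so far
    for _ in range(trials):
        # new responses cluster around the best reinforced behaviour
        response = min(target, max(0.0, random.gauss(best, 15.0)))
        if response >= criterion:                      # good enough: reinforce
            best = max(best, response)                 # behaviour drifts upward
            criterion = min(target, criterion + step)  # raise the bar
    return best

print(shape())  # typically ends near the target once the criterion tightens
```

The key design choice mirrors the definition above: the animal never has to produce the full target behaviour from scratch; each reinforced approximation shifts what it emits, which in turn lets the criterion rise.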

B.F. Skinner's Shaping Demonstration: Teaching a Pigeon to Turn Around

  • Classic demonstration of shaping: conditioning a pigeon to perform a new, complex action by reinforcing incremental steps toward the goal.

Problems with Punishment

  • Learners may simply avoid punishment rather than learning the desired behaviour (avoidance).
  • Punishment can inhibit all behaviour, not just the undesired one.
  • It is important to reinforce an alternative (adopt a competing, desirable behaviour).
  • Punishment can create dislike or fear of the punisher (Pavlovian conditioning).
  • Observational learning: learners may copy the punisher.
  • If punishment works, the punisher is indirectly rewarded: the drop in the undesired behaviour reinforces the punisher's use of punishment (including violence).

Reinforcement Schedules: The Rules of Timing and Frequency

  • A rule determining the timing and frequency of reinforcements for a behaviour.
  • Core dimensions:
    • Fixed vs Variable: a fixed schedule's requirement is constant and predictable; a variable schedule's requirement fluctuates unpredictably around an average.
    • Ratio vs Interval: ratio schedules depend on the number of responses; interval schedules depend on the passage of time.
    • Continuous vs Intermittent: a continuous schedule reinforces/punishes every response; an intermittent schedule reinforces/punishes only a subset of responses.

Putting it all together: Schedule Matrix (conceptual)

  • The interaction of Ratio/Interval with Fixed/Variable determines the pattern of responding and persistence under reinforcement/punishment.
  • Example labels you may see:
    • Fixed Ratio (FR)
    • Fixed Interval (FI)
    • Variable Ratio (VR)
    • Variable Interval (VI)
    • Each leads to characteristic response rates and pausing patterns.
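The four labels can be made concrete as small "should this response be reinforced?" rules. A hedged sketch: the parameter values (5 responses, 30 seconds) mirror the FR-5, FI-30s, VR-5, and VI-30s examples in these notes, while the class names and structure are illustrative, not any standard API:

```python
import random

class FixedRatio:
    """FR-n: reinforce every n-th response."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR-n: reinforce after a varying number of responses averaging n."""
    def __init__(self, n):
        self.n, self.count = n, 0
        self._draw()

    def _draw(self):
        self.required = random.randint(1, 2 * self.n - 1)  # mean of n

    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self._draw()
            return True
        return False

class FixedInterval:
    """FI-t: reinforce the first response made after t seconds elapse."""
    def __init__(self, t):
        self.t, self.last = t, 0.0

    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:
    """VI-t: like FI, but the interval varies around an average of t."""
    def __init__(self, t):
        self.t, self.last = t, 0.0
        self._draw()

    def _draw(self):
        self.wait = random.uniform(0, 2 * self.t)  # mean of t

    def respond(self, now):
        if now - self.last >= self.wait:
            self.last = now
            self._draw()
            return True
        return False

# FR-5: exactly every 5th response is reinforced.
fr = FixedRatio(5)
print(sum(fr.respond() for _ in range(20)))  # 4 reinforcements

# FI-30s: only the first response after each 30 s boundary pays off.
fi = FixedInterval(30)
print([fi.respond(t) for t in (10, 35, 40, 70)])  # [False, True, False, True]
```

Note how the ratio classes ignore time entirely and the interval classes ignore the response count: that structural difference is what produces the characteristic response-rate patterns described in the sections that follow.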

Rates of Responding with Different Reinforcement Schedules

  • The contingency between response and outcome greatly affects rate and persistence of responding.
  • Key generalizations:
    • Ratio schedules tend to produce faster responding than interval schedules.
    • Organisms learn that the reinforcer depends on the number of responses (ratio) rather than the passage of time (interval).
    • Fixed schedules tend to produce pauses in responding; variable schedules tend to produce steadier responding.
    • The predictability of fixed schedules lets organisms anticipate reinforcement and pause briefly after receiving it.
    • The unpredictable nature of variable schedules promotes consistent responding.

Fixed-Ratio (FR) – Rates of Responding

  • Reinforcement after a set number of responses.
  • Example: Reinforcement after every 5th response → FR-5.
  • Result: high response rate, with a brief post-reinforcement pause after each reward.

Fixed-Interval (FI) – Rates of Responding

  • The first response after a fixed time interval is reinforced.
  • Example: First response after 30 seconds is reinforced → FI-30s.
  • Result: scalloped response pattern: pauses, then increasing rate as time to reinforcement approaches.

Variable-Ratio (VR) – Rates of Responding

  • Reinforcement after an average number of responses, varying unpredictably.
  • Example: Reinforcement roughly every 5th response, but the exact count varies → VR-5.
  • Result: very high, steady rate of responding.

Variable-Interval (VI) – Rates of Responding

  • First response after an unpredictable time interval is reinforced.
  • Example: First response after an average of 30 seconds is reinforced, but the interval length varies → VI-30s.
  • Result: moderate, steady response rate.

Skinner Box: Experimental Setup for Operant Conditioning

  • A device used to study operant conditioning in animals (rats, pigeons).
  • Reinforcement examples:
    • Positive Reinforcement: Rat learns to press a lever to receive food.
    • Negative Reinforcement: Rat learns to press a lever to stop receiving electric foot shocks.
    • Positive Punishment: Rat learns not to press a second lever because it receives a foot shock.
    • Negative Punishment: Rat presses a lever and food is taken away, so it learns to stop pressing that lever.

Drive Reduction Theory (Hull)

  • Motivation arises from biological needs that create drives.
  • Behaviour is aimed at reducing these drives, thereby restoring homeostasis.
  • Key ideas:
    • Needs: biological requirements (e.g., food, water, warmth).
    • Drives: internal states of tension or arousal triggered by unmet needs (e.g., hunger, thirst).
    • Drive reduction reinforces behaviour because it reduces the drive.

Motivation and Learning: Connecting Conditioning Types

  • Pavlovian (classical) conditioning: learning about contingencies between biologically relevant stimuli (e.g., food, pain) and neutral stimuli (e.g., a bell).
    • The learner learns to predict good or bad events.
    • Quote from lecture: “I can predict when something good or bad will happen to me.”
  • Instrumental (operant) conditioning: learning the contingency between enacting a behaviour (e.g., saying “please”) and a motivationally relevant outcome (e.g., getting a cookie or not getting one).
    • The learner learns to control outcomes through actions.
  • Question raised: must explicit contingencies exist for learning to occur? Everyday examples, such as riding a bus and gradually learning a city's layout, suggest that implicit learning can happen without explicit reward/punishment contingencies.

Tolman (1948): Cognitive Maps and Latent Learning

  • Experimental design: three groups of rats ran maze trials daily over 10 days.
    • One group received food at the end of the maze (reinforced).
    • The other two groups received no food reward during these first 10 days.
  • Findings across training days:
    • Measured by number of errors; more errors = poorer learning.
    • The group reinforced with food showed faster reduction in errors over time.
  • On day 11:
    • The previously non-reinforced group was given a food reward for the first time.
    • This group quickly learned to reach the maze end with few errors once reinforcement was introduced.
    • The group that was never rewarded continued to navigate the maze erratically.
  • Cognitive maps and latent learning:
    • Tolman proposed that rats formed cognitive maps of the maze even without reinforcement.
    • Latent learning: learning that has occurred but is not yet observable until a motivation to demonstrate it is provided.

Instrumental Conditioning Summary

  • Instrumental (operant) conditioning involves making a voluntary response that leads to an outcome.
  • Classifications are based on two dimensions:
    • Whether something is added or subtracted (Positive/Negative).
    • Whether the behaviour is increased or decreased (Reinforcement/Punishment).
  • Shaping can be used when animals need to learn complex behaviours.
  • Reinforcement schedules are rules for delivering rewards/punishments and are categorized by:
    • Fixed vs Variable (consistency over time or responses).
    • Ratio vs Interval (dependence on number of responses vs time).
  • Latent learning refers to learning that occurs passively, without an obvious motivator at the time of learning.

Connections to Prior Knowledge and Real-World Relevance

  • Instrumental conditioning builds on Thorndike’s Law of Effect and Skinner’s operant theory, complemented by Hull’s drive reduction and Tolman’s cognitive maps.
  • Real-world applications include education (reinforcement strategies), animal training (shaping and schedules), and behavioural modification techniques.
  • Ethical implications of punishment emphasize avoiding avoidance behaviours, fear, or imitation of punitive models; prefer reinforcement-based strategies and shaping to promote desirable behaviours.
  • The interaction between Pavlovian cues and instrumental actions highlights how predictions and control over outcomes influence motivation and learning in everyday life.