Instrumental Conditioning: Comprehensive Study Notes (Lecture 2)
Instrumental Conditioning
- Instrumental conditioning (also called operant conditioning) is learning the associations between a voluntary response and its consequences (outcomes).
- The learner emits a response (it is produced voluntarily, not elicited by a stimulus); the response is then reinforced or punished, which alters future behaviour.
- You operate on the environment to obtain rewards or avoid punishments; your action is instrumental in obtaining the outcome.
- Key idea: learning is about how actions lead to consequences, shaping future propensity to respond.
Thorndike’s Law of Effect
- Core idea: actions followed by favourable consequences tend to be repeated; actions followed by punishment tend to be avoided.
- Everyday illustration:
- Example of repetition: I study for tests because I tend to get better marks.
- Example of avoidance: I do not skip tutorials after having previously received a Technical Fail.
- Implication: the environment contingently strengthens or weakens actions based on their outcomes.
Operant Conditioning: Core Concepts
- Also called instrumental conditioning.
- Distinction from classical (Pavlovian) conditioning:
- Operant conditioning involves a voluntary, emitted response.
- The response is followed by consequences that reinforce or punish (outcomes).
- Learning involves the organism operating on the environment to obtain rewards and to avoid punishments.
- Important terminology:
- Reinforcement: an outcome that strengthens the tendency to emit the response.
- Punishment: an outcome that weakens the tendency to emit the response.
- Positive/Negative are not about good/bad; they refer to whether something is added or removed.
Positive and Negative (Not Valence)
- Positive means something is added to the environment following a response.
- Negative means something is removed from the environment following a response.
Reinforcement and Punishment (Definitions)
- Reinforcement: a reinforcer is an event following a response that strengthens the tendency to make that response.
- Skinner (1953): “The only defining characteristic of a reinforcer is that it reinforces.”
- Punishment: an event following a response that weakens the tendency to make that response.
Putting it all together: The Response Matrix
- Reinforcement tends to increase the instrumental response; Punishment tends to decrease it.
- When reinforcing:
- Positive reinforcement: add a reward to increase behaviour.
- Negative reinforcement: remove a negative consequence to increase behaviour.
- When punishing:
- Positive punishment: add a negative consequence to decrease behaviour.
- Negative punishment: remove a positive consequence to decrease behaviour.
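The matrix above can be read as a two-way lookup. The sketch below is only an illustration of that reading; the function name and the string labels are made up for these notes and are not standard terminology beyond the four quadrant names.

```python
def classify_contingency(stimulus_change: str, behaviour_change: str) -> str:
    """Name the quadrant of the response matrix.

    stimulus_change: "added" or "removed" (what happens after the response)
    behaviour_change: "increases" or "decreases" (future tendency to respond)
    """
    # Positive/negative describes the stimulus change, not whether it is good or bad.
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement/punishment describes the effect on the behaviour.
    effect = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {effect}"


print(classify_contingency("added", "increases"))    # positive reinforcement
print(classify_contingency("removed", "increases"))  # negative reinforcement
print(classify_contingency("added", "decreases"))    # positive punishment
print(classify_contingency("removed", "decreases"))  # negative punishment
```

The two questions are answered independently: first ask what happened to the stimulus, then ask what happened to the behaviour.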
Positive Reinforcement (example)
- Example: A child brushes their teeth and receives a sticker.
- Sticker added → POSITIVE.
- The child is more likely to brush teeth in the future.
- Outcome: the behaviour of tooth brushing increases → REINFORCEMENT.
Positive Punishment (example)
- Example: A student asks a question and is heavily criticized by the teacher.
- Criticism added → POSITIVE.
- The student is less likely to ask questions in the future.
- Outcome: asking questions decreases → PUNISHMENT.
Negative Reinforcement (example)
- Example: You use a spot cream or patch; painful acne goes away.
- Acne reduced/removed → NEGATIVE.
- You’re more likely to use spot cream/patch next time you have painful acne.
- Outcome: use of spot treatment increases → REINFORCEMENT.
Negative Punishment (example)
- Example: A child swears at his parents; phone privileges are taken away.
- Phone privileges removed → NEGATIVE.
- He’s less likely to swear at his parents in the future.
- Outcome: swearing decreases → PUNISHMENT.
Shaping
- When an animal does not yet perform the desired behaviour, you reinforce progressively closer approximations to the target behaviour.
- This technique builds complex behaviours by rewarding successive approximations.
B.F. Skinner Shaping: A Pigeon to Turn Around
- Classic demonstration of shaping: conditioning a pigeon to perform a new, complex action by reinforcing incremental steps toward the goal.
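The reinforcement rule used in shaping can be sketched as a criterion that tightens after each reinforced response. This is a minimal illustration, not a model of the animal: the response values are hypothetical data (degrees of turn on successive attempts), and the function name, starting criterion, and step size are assumptions chosen for the example.

```python
def shape(responses, target, start_criterion, step):
    """Reinforce responses that meet the current criterion, then raise it.

    Returns a list of (response, criterion_at_the_time, reinforced) tuples.
    """
    criterion = start_criterion
    log = []
    for response in responses:
        reinforced = response >= criterion
        log.append((response, criterion, reinforced))
        if reinforced and criterion < target:
            # Demand a closer approximation next time, never beyond the target.
            criterion = min(target, criterion + step)
    return log


# Hypothetical data: how far (in degrees) the pigeon turns on each attempt.
turns = [30, 50, 40, 100, 120, 200, 360]
for response, criterion, reinforced in shape(turns, target=360, start_criterion=45, step=90):
    print(response, criterion, reinforced)
```

Note that a 40-degree turn would have been reinforced at the start but is not once the criterion has moved on; that is what makes the approximations "successive".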
Problems with Punishment
- Learners may simply avoid punishment rather than learning the desired behaviour (avoidance).
- Punishment can inhibit all behaviour, not just the undesired one.
- It is important to reinforce an alternative (adopt a competing, desirable behaviour).
- Punishment can create dislike or fear of the punisher (Pavlovian conditioning).
- Observational learning: learners may copy the punisher.
- If punishment works, the punisher's own behaviour is reinforced (e.g., using violence is rewarded by the reduction in the undesired behaviour), making the punisher more likely to punish again.
Reinforcement Schedules: The Rules of Timing and Frequency
- A rule determining the timing and frequency of reinforcements for a behaviour.
- Core dimensions:
- Fixed vs Variable: Fixed means the requirement is constant and predictable; Variable means it changes unpredictably around an average.
- Ratio vs Interval: Ratio is based on number of responses; Interval is based on the passage of time.
- Continuous vs Intermittent: Continuous means every response is reinforced/punished; Intermittent means only a subset of responses is reinforced/punished.
Putting it all together: Schedule Matrix (conceptual)
- The interaction of Ratio/Interval with Fixed/Variable determines the pattern of responding and persistence under reinforcement/punishment.
- Example labels you may see:
- Fixed Ratio (FR)
- Fixed Interval (FI)
- Variable Ratio (VR)
- Variable Interval (VI)
- Each leads to characteristic response rates and pausing patterns.
Rates of Responding with Different Reinforcement Schedules
- The contingency between response and outcome greatly affects rate and persistence of responding.
- Key generalizations:
- Ratio schedules tend to produce faster responding than interval schedules.
- On ratio schedules the reinforcer depends on the number of responses, so responding faster earns it sooner; on interval schedules it depends on the passage of time, so extra responses earn nothing extra.
- Fixed schedules tend to produce pauses in responding; variable schedules tend to produce steadier responding.
- The predictability of fixed schedules allows organisms to anticipate and occasionally pause responding.
- The unpredictable nature of variable schedules promotes consistent responding.
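A rough arithmetic sketch of why ratio schedules tend to support faster responding than interval schedules. It assumes evenly spaced responses and an interval timer that restarts at each reinforcer; both are simplifications, and the function names are hypothetical.

```python
def fr_reinforcers(total_responses: int, ratio: int) -> int:
    """Reinforcers earned on a fixed-ratio schedule: one per `ratio` responses."""
    return total_responses // ratio


def fi_reinforcers(gap_s: int, session_s: int, interval_s: int) -> int:
    """Reinforcers earned on a fixed-interval schedule.

    One response is made every `gap_s` seconds. A response is reinforced only
    if at least `interval_s` seconds have passed since the last reinforcer
    (or since the session start).
    """
    earned, last = 0, 0
    for t in range(gap_s, session_s + 1, gap_s):
        if t - last >= interval_s:
            earned += 1
            last = t
    return earned


session_s = 600  # a 10-minute session
for gap_s in (10, 1):  # slow responder vs fast responder
    responses = session_s // gap_s
    print(gap_s, fr_reinforcers(responses, 5), fi_reinforcers(gap_s, session_s, 30))
# prints:
# 10 12 20
# 1 120 20
```

Responding ten times faster earns ten times as many reinforcers on FR-5 but no more on FI-30s, which is one way to see why interval schedules give little incentive for rapid responding.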
Fixed-Ratio (FR) – Rates of Responding
- Reinforcement after a set number of responses.
- Example: Reinforcement after every 5th response → FR-5.
- Result: high response rate with a post-reinforcement pause after each reinforcement.
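The FR rule itself is just a counter. A minimal sketch, with a hypothetical class name:

```python
class FixedRatio:
    """Reinforce every n-th response."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0  # responses since the last reinforcer

    def respond(self) -> bool:
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


fr5 = FixedRatio(5)
outcomes = [fr5.respond() for _ in range(10)]
print(outcomes)  # only the 5th and 10th responses are reinforced
```

The sketch describes only the schedule; the post-reinforcement pause is a property of the animal's behaviour, not of the rule.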
Fixed-Interval (FI) – Rates of Responding
- The first response after a fixed time interval is reinforced.
- Example: First response after 30 seconds is reinforced → FI-30s.
- Result: scalloped response pattern: pauses, then increasing rate as time to reinforcement approaches.
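The FI rule can be sketched as a timer check. This version assumes the interval is timed from the previous reinforcer (or from the session start), which is one common way to run the schedule; the class name and the response times are hypothetical.

```python
class FixedInterval:
    """Reinforce the first response made after `interval` seconds have elapsed."""

    def __init__(self, interval: float):
        self.interval = interval
        self.last_reinforced = 0.0  # session starts at t = 0

    def respond(self, t: float) -> bool:
        """Register a response at time t (seconds); return True if reinforced."""
        if t - self.last_reinforced >= self.interval:
            self.last_reinforced = t  # assumption: timer restarts at the reinforcer
            return True
        return False


fi30 = FixedInterval(30)
times = [5, 20, 31, 40, 59, 62, 95]
print([fi30.respond(t) for t in times])
# [False, False, True, False, False, True, True]
```

Responses made early in the interval are never reinforced, which fits the low responding right after a reinforcer in the scalloped pattern.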
Variable-Ratio (VR) – Rates of Responding
- Reinforcement after an average number of responses, varying unpredictably.
- Example: Reinforcement after 5 responses on average, with the exact number varying from one reinforcer to the next → VR-5.
- Result: very high, steady rate of responding.
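A VR rule can be sketched by redrawing the response requirement after each reinforcer. The uniform distribution, the seed, and the class name below are assumptions made for illustration; real procedures may use other ways of generating the requirements.

```python
import random


class VariableRatio:
    """Reinforce after a varying number of responses that averages `mean`."""

    def __init__(self, mean: int, seed: int = 0):
        self.mean = mean
        self.rng = random.Random(seed)  # seeded so the example is repeatable
        self.count = 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: uniform on 1 .. 2*mean - 1, which has the desired average.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self) -> bool:
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


vr5 = VariableRatio(5)
reinforcers = sum(vr5.respond() for _ in range(10_000))
print(reinforcers / 10_000)  # close to 1/5 in the long run
```

Because the next requirement is unknown, any response might be the reinforced one, which fits the steady responding without pauses.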
Variable-Interval (VI) – Rates of Responding
- First response after an unpredictable time interval is reinforced.
- Example: First response after an average of 30 seconds is reinforced, but interval length varies → VI-30s.
- Result: moderate, steady response rate.
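A VI rule can be sketched the same way as a timer whose length is redrawn after each reinforcer. The uniform distribution, the seed, the one-response-per-second pattern, and the class name are all assumptions made for illustration.

```python
import random


class VariableInterval:
    """Reinforce the first response after a varying delay that averages `mean` s."""

    def __init__(self, mean: float, seed: int = 0):
        self.mean = mean
        self.rng = random.Random(seed)  # seeded so the example is repeatable
        self.available_at = self._draw()  # when the first reinforcer becomes available

    def _draw(self) -> float:
        # Assumption: uniform on 0 .. 2*mean seconds, which averages `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t: float) -> bool:
        """Register a response at time t (seconds); return True if reinforced."""
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


vi30 = VariableInterval(30)
# Hypothetical responder: one response per second for an hour.
reinforcers = sum(vi30.respond(t) for t in range(3600))
print(reinforcers)  # roughly 3600 / 30, i.e. about two per minute
```

Since the reinforcer could become available at any moment, steady checking pays off, but responding faster than the timer allows gains nothing.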
Skinner Box: Experimental Setup for Operant Conditioning
- A device used to study operant conditioning in animals (rats, pigeons).
- Reinforcement examples:
- Positive Reinforcement: Rat learns to press a lever to receive food.
- Negative Reinforcement: Rat learns to press a lever to stop receiving electric foot shocks.
- Positive Punishment: Rat learns not to press a second lever because it receives a foot shock.
- Negative Punishment: Rat learns to stop pressing a lever because pressing it takes away something desirable (e.g., access to food).
Drive Reduction Theory (Hull)
- Motivation arises from biological needs that create drives.
- Behaviour is aimed at reducing these drives, thereby restoring homeostasis.
- Key ideas:
- Needs: biological requirements (e.g., food, water, warmth).
- Drives: internal states of tension or arousal triggered by unmet needs (e.g., hunger, thirst).
- Drive reduction acts as the reinforcer: behaviour that reduces a drive (e.g., eating when hungry) is strengthened.
Motivation and Learning: Connecting Conditioning Types
- Pavlovian (classical) conditioning: learning about contingencies between biologically relevant stimuli (e.g., food, pain) and neutral stimuli (e.g., a bell).
- The learner learns to predict good or bad events.
- Quote from lecture: “I can predict when something good or bad will happen to me.”
- Instrumental (operant) conditioning: learning the contingency between enacting a behaviour (e.g., saying “please”) and a motivationally relevant outcome (e.g., getting a cookie or not getting one).
- The learner learns to control outcomes through actions.
- Question raised: must a contingency always exist for learning to occur? Example: everyday navigation, such as learning a city's layout while riding a bus, may involve implicit learning without any explicit reward/punishment contingency.
Tolman (1948): Cognitive Maps and Latent Learning
- Experimental design: 3 groups of rats conducted maze trials over 10 days.
- One group received food at the maze end (reinforced).
- Two groups received no reinforcement during these 10 days.
- Findings across training days:
- Measured by number of errors; more errors = poorer learning.
- The group reinforced with food showed faster reduction in errors over time.
- On day 11:
- One of the two previously non-reinforced groups was given a food reward for the first time.
- This group's errors dropped sharply once reinforcement was introduced, suggesting it had already learned the maze layout during the unrewarded trials.
- The group that remained unrewarded continued to make many errors.
- Cognitive maps and latent learning:
- Tolman proposed that rats formed cognitive maps of the maze even without reinforcement.
- Latent learning: learning that has occurred but is not expressed in behaviour until there is a motivation to demonstrate it.
Instrumental Conditioning Summary
- Instrumental (operant) conditioning involves making a voluntary response that leads to an outcome.
- Classifications are based on two dimensions:
- Whether something is added or subtracted (Positive/Negative).
- Whether the behaviour is increased or decreased (Reinforcement/Punishment).
- Shaping can be used when animals need to learn complex behaviours.
- Reinforcement schedules are rules for delivering rewards/punishments and are categorized by:
- Fixed vs Variable (consistency over time or responses).
- Ratio vs Interval (dependence on number of responses vs time).
- Latent learning refers to learning that occurs passively, without an obvious motivator at the time of learning.
Connections to Prior Knowledge and Real-World Relevance
- Instrumental conditioning builds on Thorndike’s Law of Effect and Skinner’s operant theory, complemented by Hull’s drive reduction and Tolman’s cognitive maps.
- Real-world applications include education (reinforcement strategies), animal training (shaping and schedules), and behavioural modification techniques.
- Ethical implications of punishment: it risks producing avoidance behaviours, fear of the punisher, and imitation of punitive models, so reinforcement-based strategies and shaping are preferred for promoting desirable behaviours.
- The interaction between Pavlovian cues and instrumental actions highlights how predictions and control over outcomes influence motivation and learning in everyday life.