Instrumental Conditioning Phenomena Part 1
Instrumental Conditioning: Core Definition & Context
Instrumental (a.k.a. operant) conditioning = learning a response–outcome (R–O) contingency.
Subject emits a response → environment delivers or removes an outcome.
Outcome can be appetitive (pleasant) or aversive (unpleasant).
Contrasted with classical (Pavlovian) conditioning:
Classical = learning that one stimulus predicts another (stimulus–stimulus relation).
Instrumental = response produces consequence (response–outcome relation).
Shared element: synaptic plasticity likely underlies both; molecular mechanisms (e.g., long-term potentiation) may be similar even though circuitry differs (e.g., cerebellum for eyeblink, amygdala for fear, broader corticostriatal circuits for instrumental tasks).
Real-World Illustration: "Rat Basketball"
Rats shaped to pick up small ball, drop through hoop, then run to feeder.
Demonstrates:
Shaping (successive approximations).
Positive reinforcement (food pellet after successful shot).
Complex action chains built from simple responses.
Apparatus: hoop + two feeding wells; easy DIY example for cognition labs.
Historical Foundation: Thorndike’s Puzzle Boxes (late 1890s)
Cats placed in enclosed box; fish outside = motivation.
Required to perform one (simple) or several (compound) responses to escape:
Step on treadle, pull string, lift latch, etc.
Measured escape latency across trials.
Latency decreased with experience → learning curve.
Conceptual breakthrough: Law of Effect (responses followed by satisfying outcomes become more probable).
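The Law of Effect can be caricatured as a simple probability-strengthening rule; a toy sketch, not Thorndike's data (the starting probability, learning rate, and the 1/p latency proxy are all illustrative assumptions):

```python
def expected_latencies(trials: int = 10, alpha: float = 0.3) -> list[float]:
    """Toy linear-operator model of the Law of Effect: each satisfying outcome
    strengthens the effective response's probability p, so the expected number
    of acts before escape (~1/p) falls across trials -- a learning curve."""
    p = 0.1  # assumed initial probability of emitting the effective response
    latencies = []
    for _ in range(trials):
        latencies.append(round(1 / p, 1))  # expected acts before escape
        p += alpha * (1 - p)               # success strengthens the response
    return latencies

print(expected_latencies())
# [10.0, 2.7, 1.8, 1.4, 1.3, 1.2, 1.1, 1.1, 1.1, 1.0]
```

The monotonically falling latencies mirror the qualitative trend in Thorndike's escape-latency curves.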
The Four Basic Instrumental Procedures
1. Positive Reinforcement
Response produces pleasant stimulus → response frequency increases.
Examples: lever press → food; basketball shot → pellet.
2. Omission Training (Negative Punishment / Time-Out)
Response removes a normally available pleasant stimulus → response frequency decreases.
Example: child hits sibling → loses playground privileges (time-out).
Comparative efficacy studies (Allan & Garcia):
Phase 1: Rats learn lever-press → food.
Phase 2 conditions:
Extinction: presses no longer reinforced.
Omission: lever press prevents food; withholding press for set interval delivers food.
Short-term: extinction suppresses faster.
Long-term (re-acquisition tests): omission shows greater durability, especially after extended baseline training.
3. Punishment (Positive Punishment)
Response produces aversive stimulus → response frequency decreases (temporarily).
Goodall paradigm:
Phase 1: lever press → food.
Phase 2:
Light ON: press → food + foot-shock.
Tone ON: yoked shocks delivered independent of pressing.
Suppression ratio initially low (strong suppression) but rebounds across sessions → tolerance / habituation to shock.
Side-effects: emergence of alternative strategies (e.g., lying on fur to insulate shocks, yoga-like postures) → behavior persists in modified form.
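Punishment data like these are conventionally scored with a suppression ratio; a minimal sketch, assuming the standard formula responses-during-stimulus / (responses-during-stimulus + baseline responses), on which 0 means complete suppression and 0.5 means no suppression (the response counts are hypothetical):

```python
def suppression_ratio(cs_responses: float, pre_cs_responses: float) -> float:
    """Conventional suppression ratio: 0.0 = complete suppression,
    0.5 = no suppression (responding unchanged by the stimulus)."""
    return cs_responses / (cs_responses + pre_cs_responses)

# Early punishment session: pressing nearly stops when the light is on.
print(suppression_ratio(2, 40))   # low ratio = strong suppression
# Later session: responding rebounds as tolerance to the shock develops.
print(suppression_ratio(30, 40))  # ratio climbing back toward 0.5
```

The session-by-session rise of this ratio is what the notes describe as the rebound toward baseline.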
4. Negative Reinforcement (Escape / Avoidance)
Response removes an aversive stimulus → response frequency increases.
Two common paradigms:
Escape: floor electrified → rat jumps barrier to turn off shock.
Active avoidance: warning cue (light) precedes shock; rat learns to respond during cue to prevent shock entirely.
Everyday example: headache (aversive) → take aspirin → pain removed → probability of pill-taking rises.
Summary Matrix
| Outcome After Response | Pleasant Produced | Pleasant Removed | Aversive Produced | Aversive Removed |
|---|---|---|---|---|
| Behavioral Effect | ↑ (Positive Rft) | ↓ (Omission) | ↓ (Punishment) | ↑ (Negative Rft) |
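The matrix reduces to a lookup from (outcome valence, produced/removed) to procedure and behavioral effect; a minimal sketch whose labels follow the table:

```python
def classify(outcome: str, change: str) -> tuple[str, str]:
    """Map an R-O contingency to its procedure name and behavioral effect.

    outcome: "pleasant" or "aversive"; change: "produced" or "removed".
    """
    table = {
        ("pleasant", "produced"): ("positive reinforcement", "response increases"),
        ("pleasant", "removed"):  ("omission training",      "response decreases"),
        ("aversive", "produced"): ("punishment",             "response decreases"),
        ("aversive", "removed"):  ("negative reinforcement", "response increases"),
    }
    return table[(outcome, change)]

# Everyday cases: buckling silences a seat-belt alarm; a late fee removes money.
print(classify("aversive", "removed"))   # -> negative reinforcement
print(classify("pleasant", "removed"))   # -> omission training
```

Useful for the classification drills suggested in the study-strategies section.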
What Is a Reinforcer?
The Circularity Problem
Common textbook definition: "stimulus that increases behavior" → circular; cannot predict a priori.
Neither mechanistic nor cognitive theories supply clear predictive rule.
Ethological / Premack Solution
Premack Principle (1959): Within an individual’s unconstrained behavioral repertoire, high-probability activities can reinforce low-probability activities when made contingent.
Baseline free-choice observation (rat, fixed-length session):
Wheel running more probable than drinking.
Make wheel access contingent on drinking → drinking rises.
State-dependency:
Water-deprived rats reverse the probabilities (drinking now more probable than running) → running can be reinforced by drinking.
Human extension (Premack & colleagues):
Children preferring pinball > gum: gum chewing rises when required to access pinball.
Reverse for gum-preferring children.
Quantitative test (Premack):
Measured the prior probabilities of six activities (drinking sucrose solutions at different concentrations, wheel running under different conditions, etc.).
Found a monotonic relation: the higher an activity’s baseline probability, the greater its reinforcing effectiveness.
Plot showed a near-linear rise; high-probability events drove more lever-pressing.
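Premack's logic can be sketched in two small functions: normalize free-baseline time allocations into probabilities, then predict that a higher-probability activity can reinforce a lower-probability one (the minute values are hypothetical, not Premack's data):

```python
def baseline_probabilities(durations: dict[str, float]) -> dict[str, float]:
    """Convert free-baseline time allocations into response probabilities."""
    total = sum(durations.values())
    return {activity: t / total for activity, t in durations.items()}

def can_reinforce(probs: dict[str, float], contingent: str, instrumental: str) -> bool:
    """Premack principle: a higher-probability activity can reinforce
    a lower-probability one when access to it is made contingent."""
    return probs[contingent] > probs[instrumental]

# Hypothetical baseline minutes for a non-deprived rat: running dominates,
# so wheel access made contingent on drinking should raise drinking.
probs = baseline_probabilities({"running": 12.0, "drinking": 4.0})
print(can_reinforce(probs, contingent="running", instrumental="drinking"))  # True
```

Under water deprivation the durations would reverse, flipping which activity can serve as the reinforcer, matching the state-dependency point above.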
Types of Responses & Measurement Paradigms
Discrete-Trial Procedures
Experimenter controls start/end of each trial; single response measured.
Examples:
Straight alley runway: latency to goal-box.
Morris water maze: swimming path to hidden platform (negative reinforcement exemplar).
Data: escape latency, path length per trial.
Free-Operant Procedures
Subject decides when/how often to respond; many responses per session.
Classic Skinner box: lever press, key peck.
Requires shaping: reinforcing successive approximations until target behavior emitted.
Everyday shaping demo: training cats to use toilet (raise litter height, shrink pan, etc.).
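Shaping by successive approximations is itself a simple algorithm: reinforce whatever meets the current criterion, then tighten the criterion toward the target. A sketch under assumed values (the response magnitudes and step size are arbitrary illustrations):

```python
def shape(responses: list[float], start: float = 1.0, step: float = 0.5):
    """Successive approximations: reinforce any response meeting the current
    criterion, then raise the criterion to demand a closer approximation."""
    criterion = start
    log = []
    for r in responses:  # e.g. lever-press force, or litter-pan height reached
        if r >= criterion:
            log.append((r, "reinforce"))
            criterion += step  # tighten the requirement for the next success
        else:
            log.append((r, "no reward"))
    return log

for r, outcome in shape([1.2, 0.8, 1.6, 2.1, 2.4, 3.2]):
    print(r, outcome)
```

Note that sub-criterion responses go unrewarded without resetting progress, which is why gradual steps matter in practice.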
Response Topography: Stereotypy vs Variability
Basic observation: instrumental training tends toward stereotyped patterns unless variability itself is reinforced.
Page & Neuringer:
Pigeons required to emit fixed-length peck sequences across two keys; repeating a recent sequence was not reinforced.
Pigeons earned reinforcement → learned to vary their sequences.
Schwartz dynamic grid task:
Light on matrix moved down (left peck) or right (right peck) to goal.
Each pigeon developed idiosyncratic but highly consistent path; between-subject variability high, within-subject low.
Conclusion: default is stereotypy; variability requires explicit contingency.
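The Page & Neuringer contingency amounts to a lag-style novelty check: reinforce a sequence only if it differs from recently emitted ones. A sketch, assuming a lag-N memory window (the specific lag value is an assumption, not from the notes):

```python
from collections import deque

def make_variability_check(lag: int):
    """Lag-N variability contingency: a peck sequence earns reinforcement
    only if it differs from the last `lag` sequences emitted."""
    recent: deque = deque(maxlen=lag)

    def check(sequence: str) -> bool:
        novel = sequence not in recent
        recent.append(sequence)  # remember it either way
        return novel

    return check

check = make_variability_check(lag=3)
print(check("LLRR"))  # True  (memory is empty)
print(check("LLRR"))  # False (repeats the previous sequence)
print(check("LRLR"))  # True
```

This makes the conclusion concrete: with no such check, stereotyped repetition is never penalized, so it dominates by default.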
Practical, Educational & Clinical Implications
Time-out (omission) often preferable to punishment: longer-term suppression without side-effects.
Punishment risks:
Tolerance to aversive stimulus.
Emergence of covert or alternative undesired behaviors.
Premack analysis aids individualized behavior plans: identify high-probability activities for each learner/client.
Connections to Larger Theories & Neuroscience
Instrumental learning relies on broader circuitry (e.g., basal ganglia, prefrontal cortex, dopaminergic prediction-error signals).
Molecular synaptic changes (e.g., CREB activation) likely parallel across operant & Pavlovian paradigms.
Conceptual overlap with reinforcement-learning models in AI: positive/negative reinforcement correspond to reward shaping; punishment ≈ negative reward.
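The RL correspondence can be made concrete with a one-step value update: positive reward strengthens a response tendency (reinforcement), negative reward weakens it (punishment). A minimal sketch (learning rate and reward magnitudes are arbitrary choices):

```python
def q_update(q: float, reward: float, alpha: float = 0.1) -> float:
    """One-step value update for a single response (no successor state):
    q <- q + alpha * (reward - q). The value q tracks the response's
    learned worth, analogous to its trained response strength."""
    return q + alpha * (reward - q)

q = 0.0
for _ in range(20):
    q = q_update(q, reward=1.0)    # lever press -> food (reinforcement)
print(round(q, 2))                 # value climbs toward 1.0
for _ in range(20):
    q = q_update(q, reward=-1.0)   # same press now shocked (punishment)
print(round(q, 2))                 # value driven back down
```

As with behavioral punishment, the suppression here is a competing adjustment, not an erasure: the update rule is symmetric, and the value recovers if positive outcomes resume.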
Key Numbers & Concepts At-a-Glance
Escape latency in Thorndike cats ↓ over successive trials (qualitative trend).
Allan & Garcia designs:
Baseline training duration varied across groups.
Durable suppression superior for omission after extended baseline training.
Goodall suppression ratio scale: 0 (complete suppression) to 0.5 (no suppression).
Premack baseline times (normal rat): running occupied more of the session than drinking.
Page & Neuringer success: a high proportion of trials reinforced via sequence novelty.
Ethical Considerations
Use of aversive stimuli (punishment, shock) raises welfare concerns; modern protocols favor mild intensities and alternative procedures.
Shaping and positive reinforcement align with minimally invasive, welfare-positive training (used in zoos, laboratories, education).
Study Strategies
Draw flowcharts mapping each procedure’s contingency → behavioral effect.
Practice classifying real-world examples (seat-belt alarm, late-fee fines, employee bonuses) into the four categories.
For graphs, rehearse reading axes: e.g., suppression ratio, responses/session, baseline days.
Anticipate exam questions contrasting omission vs extinction, or predicting reinforcer via Premack logic.