Instrumental Conditioning Phenomena Part 1
Instrumental Conditioning: Core Definition & Context
Instrumental (a.k.a. operant) conditioning = learning a response–outcome (R–O) contingency.
Subject emits a response → environment delivers or removes an outcome.
Outcome can be appetitive (pleasant) or aversive (unpleasant).
Contrasted with classical (Pavlovian) conditioning:
Classical = learning that one stimulus predicts another (stimulus–stimulus relation).
Instrumental = response produces consequence (response–outcome relation).
Shared element: synaptic plasticity likely underlies both; molecular mechanisms (e.g., long-term potentiation) may be similar even though circuitry differs (e.g., cerebellum for eyeblink, amygdala for fear, broader corticostriatal circuits for instrumental tasks).
Real-World Illustration: "Rat Basketball"
Rats shaped to pick up small ball, drop through hoop, then run to feeder.
Demonstrates:
Shaping (successive approximations).
Positive reinforcement (food pellet after successful shot).
Complex action chains built from simple responses.
Apparatus: hoop + two feeding wells; easy DIY example for cognition labs.
Historical Foundation: Thorndike’s Puzzle Boxes (late 1890s)
Cats placed in enclosed box; fish outside = motivation.
Required to perform one (simple) or several (compound) responses to escape:
Step on treadle, pull string, lift latch, etc.
Measured escape latency across trials.
Latency decreased with experience → learning curve.
Conceptual breakthrough: Law of Effect (responses followed by satisfying outcomes become more probable).
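The Law of Effect can be caricatured as a simple probability-strengthening rule; a toy sketch, not Thorndike's data (the starting probability, learning rate, and the 1/p latency proxy are all illustrative assumptions):

```python
def expected_latencies(trials: int = 10, alpha: float = 0.3) -> list[float]:
    """Toy linear-operator model of the Law of Effect: each satisfying outcome
    strengthens the effective response's probability p, so the expected number
    of acts before escape (~1/p) falls across trials -- a learning curve."""
    p = 0.1  # assumed initial probability of emitting the effective response
    latencies = []
    for _ in range(trials):
        latencies.append(round(1 / p, 1))  # expected acts before escape
        p += alpha * (1 - p)               # success strengthens the response
    return latencies

print(expected_latencies())
# [10.0, 2.7, 1.8, 1.4, 1.3, 1.2, 1.1, 1.1, 1.1, 1.0]
```

The monotonically falling latencies mirror the qualitative trend in Thorndike's escape-latency curves.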
The Four Basic Instrumental Procedures
1. Positive Reinforcement
Response produces pleasant stimulus → response frequency increases.
Examples: lever press → food; basketball shot → pellet.
2. Omission Training (Negative Punishment / Time-Out)
Response removes a normally available pleasant stimulus → response frequency decreases.
Example: child hits sibling → loses playground privileges (time-out).
Comparative efficacy studies (Allan & Garcia):
Phase 1: Rats learn lever-press → food.
Phase 2 conditions:
Extinction: presses no longer reinforced.
Omission: lever press prevents food; withholding press for set interval delivers food.
Short-term: extinction suppresses faster.
Long-term (re-acquisition tests): omission shows greater durability, especially after extended baseline training.
3. Punishment (Positive Punishment)
Response produces aversive stimulus → response frequency decreases (temporarily).
Goodall paradigm:
Phase 1: lever press → food.
Phase 2:
Light ON: press → food + foot-shock.
Tone ON: yoked shocks delivered independent of pressing.
Suppression ratio initially low (strong suppression) but rebounds across sessions → tolerance / habituation to shock.
Side-effects: emergence of alternative strategies (e.g., lying on fur to insulate shocks, yoga-like postures) → behavior persists in modified form.
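Punishment data like these are conventionally scored with a suppression ratio; a minimal sketch, assuming the standard formula responses-during-stimulus / (responses-during-stimulus + baseline responses), on which 0 means complete suppression and 0.5 means no suppression (the response counts are hypothetical):

```python
def suppression_ratio(cs_responses: float, pre_cs_responses: float) -> float:
    """Conventional suppression ratio: 0.0 = complete suppression,
    0.5 = no suppression (responding unchanged by the stimulus)."""
    return cs_responses / (cs_responses + pre_cs_responses)

# Early punishment session: pressing nearly stops when the light is on.
print(suppression_ratio(2, 40))   # low ratio = strong suppression
# Later session: responding rebounds as tolerance to the shock develops.
print(suppression_ratio(30, 40))  # ratio climbing back toward 0.5
```

The session-by-session rise of this ratio is what the notes describe as the rebound toward baseline.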
4. Negative Reinforcement (Escape / Avoidance)
Response removes an aversive stimulus → response frequency increases.
Two common paradigms:
Escape: floor electrified → rat jumps barrier to turn off shock.
Active avoidance: warning cue (light) precedes shock; rat learns to respond during cue to prevent shock entirely.
Everyday example: headache (aversive) → take aspirin → pain removed → probability of pill-taking rises.
Summary Matrix
| Outcome After Response | Pleasant Produced | Pleasant Removed | Aversive Produced | Aversive Removed |
|---|---|---|---|---|
| Behavioral Effect | ↑ (Positive Rft) | ↓ (Omission) | ↓ (Punishment) | ↑ (Negative Rft) |
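The matrix reduces to a lookup from (outcome valence, produced/removed) to procedure and behavioral effect; a minimal sketch whose labels follow the table:

```python
def classify(outcome: str, change: str) -> tuple[str, str]:
    """Map an R-O contingency to its procedure name and behavioral effect.

    outcome: "pleasant" or "aversive"; change: "produced" or "removed".
    """
    table = {
        ("pleasant", "produced"): ("positive reinforcement", "response increases"),
        ("pleasant", "removed"):  ("omission training",      "response decreases"),
        ("aversive", "produced"): ("punishment",             "response decreases"),
        ("aversive", "removed"):  ("negative reinforcement", "response increases"),
    }
    return table[(outcome, change)]

# Everyday cases: buckling silences a seat-belt alarm; a late fee removes money.
print(classify("aversive", "removed"))   # -> negative reinforcement
print(classify("pleasant", "removed"))   # -> omission training
```

Useful for the classification drills suggested in the study-strategies section.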
What Is a Reinforcer?
The Circularity Problem
Common textbook definition: "stimulus that increases behavior" → circular; cannot predict a priori.
Neither mechanistic nor cognitive theories supply clear predictive rule.
Ethological / Premack Solution
Premack Principle (1959): Within an individual’s unconstrained behavioral repertoire, high-probability activities can reinforce low-probability activities when made contingent.
Baseline free-choice observation (rat, fixed-length session):
Wheel running more probable than drinking.
Make wheel access contingent on drinking → drinking rises.
State-dependency:
Water-deprived rats reverse the probabilities (drinking now more probable than running) → running can be reinforced by drinking.
Human extension (Premack & colleagues):
Children preferring pinball > gum: gum chewing rises when required to access pinball.
Reverse for gum-preferring children.
Quantitative test (Premack):
Measured the prior probabilities of six activities (drinking sucrose solutions at different concentrations, wheel running under different conditions, etc.).
Found a monotonic relation: the higher an activity’s baseline probability, the greater its reinforcing effectiveness.
Plot showed a near-linear rise; high-probability events drove more lever-pressing.
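Premack's logic can be sketched in two small functions: normalize free-baseline time allocations into probabilities, then predict that a higher-probability activity can reinforce a lower-probability one (the minute values are hypothetical, not Premack's data):

```python
def baseline_probabilities(durations: dict[str, float]) -> dict[str, float]:
    """Convert free-baseline time allocations into response probabilities."""
    total = sum(durations.values())
    return {activity: t / total for activity, t in durations.items()}

def can_reinforce(probs: dict[str, float], contingent: str, instrumental: str) -> bool:
    """Premack principle: a higher-probability activity can reinforce
    a lower-probability one when access to it is made contingent."""
    return probs[contingent] > probs[instrumental]

# Hypothetical baseline minutes for a non-deprived rat: running dominates,
# so wheel access made contingent on drinking should raise drinking.
probs = baseline_probabilities({"running": 12.0, "drinking": 4.0})
print(can_reinforce(probs, contingent="running", instrumental="drinking"))  # True
```

Under water deprivation the durations would reverse, flipping which activity can serve as the reinforcer, matching the state-dependency point above.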
Types of Responses & Measurement Paradigms
Discrete-Trial Procedures
Experimenter controls start/end of each trial; single response measured.
Examples:
Straight alley runway: latency to goal-box.
Morris water maze: swimming path to hidden platform (negative reinforcement exemplar).
Data: escape latency, path length per trial.
Free-Operant Procedures
Subject decides when/how often to respond; many responses per session.
Classic Skinner box: lever press, key peck.
Requires shaping: reinforcing successive approximations until target behavior emitted.
Everyday shaping demo: training cats to use toilet (raise litter height, shrink pan, etc.).
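Shaping by successive approximations is itself a simple algorithm: reinforce whatever meets the current criterion, then tighten the criterion toward the target. A sketch under assumed values (the response magnitudes and step size are arbitrary illustrations):

```python
def shape(responses: list[float], start: float = 1.0, step: float = 0.5):
    """Successive approximations: reinforce any response meeting the current
    criterion, then raise the criterion to demand a closer approximation."""
    criterion = start
    log = []
    for r in responses:  # e.g. lever-press force, or litter-pan height reached
        if r >= criterion:
            log.append((r, "reinforce"))
            criterion += step  # tighten the requirement for the next success
        else:
            log.append((r, "no reward"))
    return log

for r, outcome in shape([1.2, 0.8, 1.6, 2.1, 2.4, 3.2]):
    print(r, outcome)
```

Note that sub-criterion responses go unrewarded without resetting progress, which is why gradual steps matter in practice.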
Response Topography: Stereotypy vs Variability
Basic observation: instrumental training tends toward stereotyped patterns unless variability itself is reinforced.
Page & Neuringer:
Pigeons required to emit fixed-length peck sequences across two keys; repeating a recent sequence was not reinforced.
Pigeons earned reinforcement → learned to vary their sequences.
Schwartz dynamic grid task:
Light on matrix moved down (left peck) or right (right peck) to goal.
Each pigeon developed idiosyncratic but highly consistent path; between-subject variability high, within-subject low.
Conclusion: default is stereotypy; variability requires explicit contingency.
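The Page & Neuringer contingency amounts to a lag-style novelty check: reinforce a sequence only if it differs from recently emitted ones. A sketch, assuming a lag-N memory window (the specific lag value is an assumption, not from the notes):

```python
from collections import deque

def make_variability_check(lag: int):
    """Lag-N variability contingency: a peck sequence earns reinforcement
    only if it differs from the last `lag` sequences emitted."""
    recent: deque = deque(maxlen=lag)

    def check(sequence: str) -> bool:
        novel = sequence not in recent
        recent.append(sequence)  # remember it either way
        return novel

    return check

check = make_variability_check(lag=3)
print(check("LLRR"))  # True  (memory is empty)
print(check("LLRR"))  # False (repeats the previous sequence)
print(check("LRLR"))  # True
```

This makes the conclusion concrete: with no such check, stereotyped repetition is never penalized, so it dominates by default.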
Practical, Educational & Clinical Implications
Time-out (omission) often preferable to punishment: longer-term suppression without side-effects.
Punishment risks:
Tolerance to aversive stimulus.
Emergence of covert or alternative undesired behaviors.
Premack analysis aids individualized behavior plans: identify high-probability activities for each learner/client.
Connections to Larger Theories & Neuroscience
Instrumental learning relies on broader circuitry (e.g., basal ganglia, prefrontal cortex, dopaminergic prediction-error signals).
Molecular synaptic changes (e.g., CREB activation) likely parallel across operant & Pavlovian paradigms.
Conceptual overlap with reinforcement-learning models in AI: positive/negative reinforcement correspond to reward shaping; punishment ≈ negative reward.
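The RL correspondence can be made concrete with a one-step value update: positive reward strengthens a response tendency (reinforcement), negative reward weakens it (punishment). A minimal sketch (learning rate and reward magnitudes are arbitrary choices):

```python
def q_update(q: float, reward: float, alpha: float = 0.1) -> float:
    """One-step value update for a single response (no successor state):
    q <- q + alpha * (reward - q). The value q tracks the response's
    learned worth, analogous to its trained response strength."""
    return q + alpha * (reward - q)

q = 0.0
for _ in range(20):
    q = q_update(q, reward=1.0)    # lever press -> food (reinforcement)
print(round(q, 2))                 # value climbs toward 1.0
for _ in range(20):
    q = q_update(q, reward=-1.0)   # same press now shocked (punishment)
print(round(q, 2))                 # value driven back down
```

As with behavioral punishment, the suppression here is a competing adjustment, not an erasure: the update rule is symmetric, and the value recovers if positive outcomes resume.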
Key Numbers & Concepts At-a-Glance
Escape latency in Thorndike cats ↓ over successive trials (qualitative trend).
Allan & Garcia designs:
Baseline training duration varied across groups.
Durable suppression superior for omission after extended baseline training.
Goodall suppression ratio scale: 0 (complete suppression) to 0.5 (no suppression).
Premack baseline times (normal rat): running occupied more of the session than drinking.
Page & Neuringer success: a high proportion of trials reinforced via sequence novelty.
Ethical Considerations
Use of aversive stimuli (punishment, shock) raises welfare concerns; modern protocols favor mild intensities and alternative procedures.
Shaping and positive reinforcement align with minimally invasive, welfare-positive training (used in zoos, laboratories, education).
Study Strategies
Draw flowcharts mapping each procedure’s contingency → behavioral effect.
Practice classifying real-world examples (seat-belt alarm, late-fee fines, employee bonuses) into the four categories.
For graphs, rehearse reading axes: e.g., suppression ratio, responses/session, baseline days.
Anticipate exam questions contrasting omission vs extinction, or predicting reinforcer via Premack logic.