Instrumental Conditioning Phenomena Part 1

Instrumental Conditioning: Core Definition & Context

  • Instrumental (a.k.a. operant) conditioning = learning a response–outcome (R–O) contingency.

    • Subject emits a response → environment delivers or removes an outcome.

    • Outcome can be appetitive (pleasant) or aversive (unpleasant).

  • Contrasted with classical (Pavlovian) conditioning:

    • Classical = learning that a CS predicts a US (stimulus–stimulus relation).

    • Instrumental = response produces consequence (response–outcome relation).

  • Shared element: synaptic plasticity likely underlies both; molecular mechanisms (e.g., long-term potentiation) may be similar even though circuitry differs (e.g., cerebellum for eyeblink, amygdala for fear, broader corticostriatal circuits for instrumental tasks).

Real-World Illustration: "Rat Basketball"

  • Rats shaped to pick up small ball, drop through hoop, then run to feeder.

  • Demonstrates:

    • Shaping (successive approximations).

    • Positive reinforcement (food pellet after successful shot).

    • Complex action chains built from simple responses.

  • Apparatus: hoop + two feeding wells; easy DIY example for cognition labs.

Historical Foundation: Thorndike’s Puzzle Boxes (late 1890s)

  • Cats placed in enclosed box; fish outside = motivation.

  • Required to perform one (simple) or several (compound) responses to escape:

    • Step on treadle, pull string, lift latch, etc.

  • Measured escape latency across trials.

    • Latency decreased with experience → learning curve.

  • Conceptual breakthrough: Law of Effect (responses followed by satisfying outcomes become more probable).

The Four Basic Instrumental Procedures

1. Positive Reinforcement
  • Response produces pleasant stimulus → response frequency increases.

  • Examples: lever press → food; basketball shot → pellet.

2. Omission Training (Negative Punishment / Time-Out)
  • Response removes a normally available pleasant stimulus → response frequency decreases.

  • Example: child hits sibling → loses playground privileges (time-out).

  • Comparative efficacy studies (Allan & Garcia 1969; 1973):

    • Phase 1: Rats learn lever-press → food.

    • Phase 2 conditions:

    • Extinction: presses no longer reinforced.

    • Omission: lever press prevents food; withholding the press for a set interval delivers food.

    • Short-term: extinction suppresses faster.

    • Long-term (re-acquisition tests): omission shows greater durability, especially after extended baseline training (up to 27 days).
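
The omission contingency described above is essentially a differential-reinforcement-of-other-behavior (DRO) timer: every press resets a clock, and food is delivered only when a full interval elapses press-free. A minimal sketch of that logic (the interval length and event encoding are illustrative, not taken from the studies):

```python
def run_omission(events, interval=10.0):
    """Simulate an omission (DRO) contingency.

    events: time-sorted list of (time, kind) tuples, kind == "press".
    Food is delivered each time `interval` seconds pass with no press;
    every press resets the timer. Returns the food-delivery times that
    occur up to the last recorded event.
    """
    deliveries = []
    timer_start = 0.0
    for time, kind in events:
        # credit every full press-free interval that elapsed before this event
        while timer_start + interval <= time:
            timer_start += interval
            deliveries.append(timer_start)
        if kind == "press":
            timer_start = time  # a press resets the DRO clock
    return deliveries
```

With a press at 3 s and the next at 25 s (interval 10 s), food arrives at 13 s and 23 s: pressing only delays reinforcement, which is why responding eventually drops.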

3. Punishment (Positive Punishment)
  • Response produces aversive stimulus → response frequency decreases (temporarily).

  • Goodall 1984 paradigm:

    • Phase 1: lever press → food.

    • Phase 2:

    • Light ON: press → food + foot-shock.

    • Tone ON: yoked shocks delivered independent of pressing.

    • Suppression ratio initially low (strong suppression) but rebounds across 12 sessions → tolerance / habituation to shock.

    • Side-effects: emergence of alternative strategies (e.g., lying on their backs so fur insulates against the shock grid, yoga-like postures) → behavior persists in modified form.
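
Suppression ratios of the kind Goodall reports are conventionally computed as B / (A + B), where B is responding during the signal and A is responding in a comparable baseline period, so 0 means complete suppression and 0.5 means no change. A quick sketch (the zero-response convention is my assumption):

```python
def suppression_ratio(signal_responses, baseline_responses):
    """B / (A + B): 0 = complete suppression, 0.5 = no suppression."""
    total = signal_responses + baseline_responses
    if total == 0:
        return 0.5  # no responding in either period: treat as no differential suppression
    return signal_responses / total
```

A rat that presses 5 times during the light but 45 times at baseline scores 0.1 (strong suppression); equal rates in both periods score 0.5.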

4. Negative Reinforcement (Escape / Avoidance)
  • Response removes an aversive stimulus → response frequency increases.

  • Two common paradigms:

    • Escape: floor electrified → rat jumps barrier to turn off shock.

    • Active avoidance: warning cue (light) precedes shock; rat learns to respond during cue to prevent shock entirely.

  • Everyday example: headache (aversive) → take aspirin → pain removed → probability of pill-taking rises.

Summary Matrix

  Outcome after response    Behavioral effect
  Pleasant produced         ↑ (Positive Rft)
  Pleasant removed          ↓ (Omission)
  Aversive produced         ↓ (Punishment)
  Aversive removed          ↑ (Negative Rft)
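
The matrix reduces to two binary questions: is the outcome pleasant or aversive, and does the response produce or remove it? A tiny lookup capturing that logic (the function and label names are mine, not standard terminology from any API):

```python
def classify_procedure(outcome, change):
    """Map (outcome valence, response effect) to the instrumental
    procedure and the predicted direction of behavior change."""
    table = {
        ("pleasant", "produced"): ("positive reinforcement", "increase"),
        ("pleasant", "removed"):  ("omission training",      "decrease"),
        ("aversive", "produced"): ("punishment",             "decrease"),
        ("aversive", "removed"):  ("negative reinforcement", "increase"),
    }
    return table[(outcome, change)]
```

For example, a seat-belt alarm that stops when you buckle up is ("aversive", "removed") → negative reinforcement of buckling.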

What Is a Reinforcer?

The Circularity Problem
  • Common textbook definition: "stimulus that increases behavior" → circular; cannot predict a priori.

  • Neither mechanistic nor cognitive theories supply clear predictive rule.

Ethological / Premack Solution
  • Premack Principle (1959): Within an individual’s unconstrained behavioral repertoire, high-probability activities can reinforce low-probability activities when made contingent.

  • Baseline free-choice observation (rat, 10-min session):

    • ≈50 s drinking vs ≈250 s wheel running.

    • Make wheel access contingent on drinking → drinking rises.

  • State-dependency:

    • Water-deprived rats reverse probabilities (now ≈250 s drinking, ≈50 s running) → running can be reinforced by drinking.

  • Human extension (Premack & colleagues):

    • Children preferring pinball over gum: gum chewing rises when pinball access is made contingent on chewing.

    • Reverse for gum-preferring children.

  • Quantitative test (Premack 1963):

    • Measured prior probability of six activities: drinking 16%, 32%, 64% sucrose; running in an 18 g wheel; running in an 80 g wheel, etc.

    • Found monotonic relation: responses per session ∝ P(activity).

    • Plot showed near-linear rise; high-probability events drove more lever-pressing.
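
Premack's rule can be operationalized directly from baseline observation: each activity's probability is its share of freely allocated time, and an activity can reinforce another only if its baseline probability is higher. A sketch under that reading (function names are illustrative):

```python
def baseline_probabilities(durations):
    """durations: dict of activity -> seconds engaged during a free baseline.
    Returns each activity's share of total observed time."""
    total = sum(durations.values())
    return {activity: t / total for activity, t in durations.items()}

def can_reinforce(reinforcer, target, durations):
    """Premack principle: a higher-probability activity can reinforce
    a lower-probability one when access is made contingent on it."""
    p = baseline_probabilities(durations)
    return p[reinforcer] > p[target]
```

For the sated rat ({"drinking": 50, "running": 250}), running can reinforce drinking but not vice versa; water deprivation flips the durations and therefore the prediction.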

Types of Responses & Measurement Paradigms

Discrete-Trial Procedures
  • Experimenter controls start/end of each trial; single response measured.

  • Examples:

    • Straight alley runway: latency to goal-box.

    • Morris water maze: swimming path to hidden platform (negative reinforcement exemplar).

  • Data: escape latency, path length per trial.

Free-Operant Procedures
  • Subject decides when/how often to respond; many responses per session.

  • Classic Skinner box: lever press, key peck.

  • Requires shaping: reinforcing successive approximations until target behavior emitted.

  • Everyday shaping demo: training cats to use toilet (raise litter height, shrink pan, etc.).

Response Topography: Stereotypy vs Variability
  • Basic observation: instrumental training tends toward stereotyped patterns unless variability itself is reinforced.

  • Page & Neuringer 1985:

    • Pigeons required to emit 8-peck sequences on two keys; repetition of the prior sequence not reinforced.

    • Achieved ≈67–69% reinforcement → learned to vary.

  • Schwartz 1980 dynamic grid task:

    • Light on a 10 × 10 matrix moved down (left peck) or right (right peck) to a goal.

    • Each pigeon developed idiosyncratic but highly consistent path; between-subject variability high, within-subject low.

  • Conclusion: default is stereotypy; variability requires explicit contingency.
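
The Page & Neuringer contingency is a "lag" criterion: a sequence earns reinforcement only if it differs from each of the last N sequences emitted. A minimal version (the window size and string encoding of peck sequences are illustrative):

```python
from collections import deque

def make_lag_judge(n):
    """Return a judge for a Lag-n variability schedule: a peck sequence
    is reinforced only if it differs from each of the previous n sequences."""
    recent = deque(maxlen=n)  # oldest sequences age out automatically
    def judge(sequence):
        novel = sequence not in recent
        recent.append(sequence)
        return novel
    return judge
```

With n = 2, repeating "LLRR" immediately fails, but after two different sequences it counts as novel again, so a bird can succeed by cycling through a modest repertoire.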

Practical, Educational & Clinical Implications

  • Time-out (omission) often preferable to punishment: longer-term suppression without side-effects.

  • Punishment risks:

    • Tolerance to aversive stimulus.

    • Emergence of covert or alternative undesired behaviors.

  • Premack analysis aids individualized behavior plans: identify high-probability activities for each learner/client.

Connections to Larger Theories & Neuroscience

  • Instrumental learning relies on broader circuitry (e.g., basal ganglia, prefrontal cortex, dopaminergic prediction-error signals).

  • Molecular synaptic changes (e.g., shifts in AMPA/NMDA ratios, CREB activation) likely parallel across operant & Pavlovian paradigms.

  • Conceptual overlap with reinforcement-learning models in AI: positive/negative reinforcement correspond to reward shaping; punishment ≈ negative reward.
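
The correspondence with reinforcement learning can be made concrete with a one-state (bandit-style) Q-learning update: reinforcement is a positive reward that raises an action's value, punishment a negative reward that lowers it. A sketch, with learning rate and action names chosen for illustration:

```python
def q_update(q, action, reward, alpha=0.1):
    """One-step Q-learning update for a single-state task:
    Q(a) <- Q(a) + alpha * (reward - Q(a))."""
    q = dict(q)  # return an updated copy rather than mutating in place
    q[action] = q[action] + alpha * (reward - q[action])
    return q

q = {"press": 0.0, "groom": 0.0}
q = q_update(q, "press", reward=1.0)   # positive reinforcement raises Q("press")
q = q_update(q, "groom", reward=-1.0)  # punishment lowers Q("groom")
```

After one update each, pressing is valued above grooming, mirroring the behavioral prediction of the summary matrix.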

Key Numbers & Concepts At-a-Glance

  • Escape latency in Thorndike cats ↓ over successive trials (qualitative trend).

  • Allan & Garcia designs:

    • Baseline training days: 1, 3, 9, 27.

    • Durable suppression superior for omission when baseline training ≥ 9 days.

  • Goodall suppression ratio scale: 0 (complete suppression) to 0.5 (no suppression).

  • Premack baseline times (normal rat): ≈50 s drinking vs ≈250 s running in a 300 s session.

  • Page & Neuringer success: ≈2/3 of 50 trials reinforced via sequence novelty.

Ethical Considerations

  • Use of aversive stimuli (punishment, shock) raises welfare concerns; modern protocols favor mild intensities and alternative procedures.

  • Shaping and positive reinforcement align with minimally invasive, welfare-positive training (used in zoos, laboratories, education).

Study Strategies

  • Draw flowcharts mapping each procedure’s contingency → behavioral effect.

  • Practice classifying real-world examples (seat-belt alarm, late-fee fines, employee bonuses) into the four categories.

  • For graphs, rehearse reading axes: e.g., suppression ratio, responses/session, baseline days.

  • Anticipate exam questions contrasting omission vs extinction, or predicting reinforcer via Premack logic.