Instrumental Conditioning Theory Part 2
Stimulus–Response (S–R) vs. Response–Outcome (R–O) Associations
Instrumental conditioning requires at least two separable links:
S–R: The stimulus that guides the motor act (e.g., sight of the correct lever → press).
R–O: Knowledge that the act produces the outcome (e.g., press → food).
Classical–instrumental parallel:
Classical: CS–US; Instrumental: S–R (habit) & S–O/R–O (goal-directed).
Basal Ganglia and S–R Learning
Visual input (occipital cortex) → basal ganglia → motor cortex.
Experiment (Featherstone & McDonald, 2004):
Basal-ganglia lesions:
Impaired S–R learning (couldn’t learn which lever or which swim alley led to escape).
Intact R–O learning (still pressed a lever to gain food once they knew which one).
Implication: the basal ganglia (BG) function as a sensory–motor integrator specifically for S–R mapping.
Multisensory Guidance of the Response
Lever insertion yields visual, tactile, and auditory cues; all can converge on BG & motor cortex.
Correct response selection demands that sensory information reach motor systems intact.
Outcome Processing: Homeostasis & Gustation
Delivery of food activates:
Hypothalamus (reduces hunger/thirst drives).
Gustatory/olfactory nuclei (taste & smell of banana- or sucrose-flavored pellets).
But: Many reinforcers are not linked to physiological deprivation (socializing, humor, hiking, reading, video games) → need for a broader “reinforcement system.”
Existence of a Separate Reinforcement System
Evidence from Premack studies: animals will run on a wheel to gain access to water, or drink to gain access to the wheel, depending on which activity is restricted → reinforcer value is relative & individual.
Therefore, reinforcement circuitry must be distinct from mere homeostatic circuitry.
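A minimal sketch of the Premack logic above, assuming reinforcer value can be approximated by baseline activity probability. The function name and the probabilities are hypothetical, chosen only to illustrate the reversal.

```python
def premack_can_reinforce(baseline_prob, contingent, instrumental):
    """Return True if access to `contingent` should reinforce `instrumental`.

    Premack principle: a higher-probability activity reinforces a
    lower-probability one. Probabilities here are illustrative, not data.
    """
    return baseline_prob[contingent] > baseline_prob[instrumental]


# Hypothetical baseline allocations for a water-restricted rat
water_restricted = {"drinking": 0.7, "running": 0.3}
# Hypothetical baseline allocations for a wheel-restricted rat
wheel_restricted = {"drinking": 0.2, "running": 0.8}

# Drinking reinforces running only when drinking is the more probable activity
print(premack_can_reinforce(water_restricted, "drinking", "running"))
print(premack_can_reinforce(wheel_restricted, "drinking", "running"))
# The reverse contingency works for the wheel-restricted rat
print(premack_can_reinforce(wheel_restricted, "running", "drinking"))
```

The same pair of activities swaps roles when the restriction changes, which is the sense in which reinforcer value is relative rather than tied to one homeostatic need.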
The Mesolimbic Dopamine System (VTA → NAc → Dorsal Striatum)
Olds (1955) serendipitously discovered that rats prefer locations paired with mild intracranial stimulation.
Core circuit:
Ventral tegmental area (VTA) dopaminergic neurons.
Axons → nucleus accumbens (NAc) (ventral striatum).
Further projections → dorsal striatum & cortex.
Electrical stimulation of this pathway produces extraordinary reinforcement strength:
Shaping lever press: ≈15 min vs. days with food.
Up to ≈ presses in 30 min.
Rats will forego food and lose weight to keep pressing.
Behavioral micro-analysis: rats shift from paw pressing to rapid mouth “flicker” to maximize pulse rate (response topography adapts).
Human Convergence: Humor & NAc Activation (Mobbs et al., 2003)
fMRI contrast Funny > Non-Funny:
Large BOLD peak in NAc at ~5 s post-presentation.
Confirms cross-species role of NAc in naturalistic positive reinforcement.
Hebbian Mechanism for Reinforcement Learning
Co-activation of:
Sensory neurons (stimulus).
Motor neurons (response).
Dopaminergic “reinforcement” neurons (outcome).
Hebb’s shorthand: “Neurons that fire together wire together.”
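The three-way co-activation above is often written as a "three-factor" rule: the weight change is the product of presynaptic activity, postsynaptic activity, and a dopamine signal. The sketch below is one simple form of that idea; the function name, learning rate, and activity values are assumptions, not taken from the lecture.

```python
def three_factor_update(weight, pre, post, dopamine, lr=0.1):
    """Return the updated S-R weight: change = lr * pre * post * dopamine."""
    return weight + lr * pre * post * dopamine


# Stimulus and response co-active, outcome delivered (dopamine present)
w = 0.0
for _ in range(5):
    w = three_factor_update(w, pre=1.0, post=1.0, dopamine=1.0)
reinforced = w

# Same stimulus-response pairing with no dopamine signal: weight is unchanged
unreinforced = three_factor_update(0.0, pre=1.0, post=1.0, dopamine=0.0)

print(reinforced, unreinforced)
```

Under this assumption, stimulus and response firing together is not enough; the connection strengthens only when the dopaminergic outcome signal is also present.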
Dopamine: Hedonia vs. Incentive Salience Debate
Wise (1974) — Dopamine antagonist pimozide mimics extinction:
Saline + Food → stable ≈250 presses.
Pimozide + Food ≈ No-Food → rapid decline.
Early interpretation: dopamine = pleasure (hedonia).
Challenges:
Parkinson’s patients (low DA) still enjoy rewards.
6-OHDA rats still lever-press normally for food.
Salamone et al. (2002) — Incentive Salience test:
Drug: SKF-83566 (DA antagonist).
Setup: Press lever for sugary pellets; chow freely available on floor.
Result: DA block ↓ lever pressing but ↑ chow intake.
Conclusion: Dopamine mediates “wanting” (stimulus → outcome value), not “liking.”
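A toy model of the lever-versus-chow choice, offered only as an illustration of the "wanting" interpretation. It assumes dopamine scales how much of the pellet's value motivates the effortful press, while free chow costs nothing. All names and numbers are hypothetical and are not parameters from Salamone et al.

```python
def choose_option(da_level, pellet_value=10.0, chow_value=4.0, press_cost=3.0):
    """Pick the option with the higher net value.

    da_level in [0, 1] scales how strongly the pellet's value motivates the
    effortful lever press; chow on the floor requires no work.
    All values are hypothetical.
    """
    lever_net = da_level * pellet_value - press_cost
    return "lever" if lever_net > chow_value else "chow"


print(choose_option(da_level=1.0))  # intact dopamine
print(choose_option(da_level=0.3))  # dopamine blocked
```

In this sketch the animal still eats under dopamine blockade; it simply stops paying the effort cost for the preferred food.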
Dopamine Prediction-Error Coding (Schultz, 1997)
VTA single-unit firing in monkeys:
First juice drop (unexpected): sharp DA burst.
After learning CS → press → juice:
DA burst shifts to CS onset; no burst at juice.
Juice omitted after CS: DA dip (negative prediction error).
Supports role of DA in updating S–O predictions rather than direct pleasure.
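The pattern above can be sketched with a simple error-driven update (Rescorla-Wagner style), treating the CS onset as unpredicted so that the error at the CS equals the learned value V. This is a simplification, not a full temporal-difference model; the learning rate and trial count are arbitrary assumptions.

```python
def run_trials(n_trials, reward=1.0, alpha=0.2):
    """Learn the value V predicted by the CS; return V and per-trial errors."""
    v = 0.0
    juice_errors = []
    for _ in range(n_trials):
        delta = reward - v   # prediction error when the juice arrives
        juice_errors.append(delta)
        v += alpha * delta   # move the prediction toward the outcome
    return v, juice_errors


v, juice_errors = run_trials(50)
cs_error = v - 0.0         # CS onset is unpredicted, so its error equals V
omission_error = 0.0 - v   # juice withheld after the CS

print(round(juice_errors[0], 3))   # first juice drop: large positive error
print(round(juice_errors[-1], 3))  # after learning: error at juice near zero
print(round(cs_error, 3), round(omission_error, 3))
```

The error at juice delivery shrinks toward zero as V grows, the positive error appears at the CS instead, and omitting the juice yields a negative error, matching the burst, shift, and dip described above.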
Application: Drug Addiction & Need to Isolate “Reinforcement” from “Performance”
Many drugs (cocaine, amphetamine) raise synaptic dopamine, but also cause motor stereotypy → confound.
Solution: Electrical Brain Stimulation Reward (BSR) with adjustable parameters.
Electrical Stimulation Parameters & Neural “Dose”
Amplitude (µA): height of the sine wave → radius of activated axons → number of DA spikes.
Duration / Frequency:
One pulse ≈ ms.
Train length (e.g., ) = stimulus “intensity.”
Rate–Frequency Curve & Matching Law
Procedure (Edmonds & Gallistel): vary the stimulation duration earned per press; map response rate against duration.
Typical baseline:
<50 ms: minimal responding.
>50 ms: steep rise → plateau at max presses.
Matches generalized matching law: response allocation proportional to obtained reinforcement.
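The baseline curve can be sketched as a sigmoid of stimulation duration. This is a hedged illustration: the logistic form, the 50 ms midpoint (taken from the break point above), the slope, and the maximum rate of 100 are all assumptions, since the notes give no plateau value.

```python
import math


def response_rate(duration_ms, midpoint_ms=50.0, max_rate=100.0, slope_ms=3.0):
    """Logistic response-rate curve (hypothetical parameters).

    midpoint_ms: duration giving half-maximal responding
    max_rate:    plateau response rate, in arbitrary units
    slope_ms:    smaller values give a steeper rise
    """
    return max_rate / (1.0 + math.exp(-(duration_ms - midpoint_ms) / slope_ms))


for d in (20, 40, 50, 60, 80):
    print(d, round(response_rate(d), 1))
```

Two parameters carry the interpretation: the horizontal position of the rise (how much stimulation is needed) and the height of the plateau (how fast the animal can respond).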
Pharmacological Shifts of the Curve
Low-dose cocaine (indirect DA agonist):
Threshold shifts left to ≈30 ms.
Max rate unchanged ⇒ pure reinforcement potentiation.
Intermediate dose:
Further left shift ≈20 ms.
Max rate ↑ ⇒ reinforcement + motor activation.
High dose:
Collapse (stereotypy; animal rocks, no lever presses).
Haloperidol (DA antagonist):
Threshold ≈baseline (≈45 ms).
Max rate ↓ ⇒ impaired performance, little change in perceived reinforcement.
Combining agonist+antagonist can dissect hedonic from motor effects → avenue for therapy design (reduce “high” without crippling movement).
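The logic of reading these shifts can be sketched as a small classifier: a threshold change is read as a change in reinforcement, a maximum-rate change as a change in motor performance. Threshold values come from the notes above; the maximum rates (100 baseline, 120, 60) and the tolerance are hypothetical placeholders.

```python
def classify_shift(base_threshold, base_max, drug_threshold, drug_max, tol=0.15):
    """Label a drug effect from changes in threshold and maximum rate.

    tol is the fractional change treated as "no change" (hypothetical value).
    """
    thr_change = (drug_threshold - base_threshold) / base_threshold
    max_change = (drug_max - base_max) / base_max
    labels = []
    if thr_change < -tol:
        labels.append("reinforcement potentiated")
    elif thr_change > tol:
        labels.append("reinforcement reduced")
    if max_change > tol:
        labels.append("motor activation")
    elif max_change < -tol:
        labels.append("motor impairment")
    return labels or ["no clear effect"]


# Thresholds (ms) from the notes; maximum rates are illustrative only
low_cocaine = classify_shift(50, 100, 30, 100)
mid_cocaine = classify_shift(50, 100, 20, 120)
haloperidol = classify_shift(50, 100, 45, 60)

print(low_cocaine)
print(mid_cocaine)
print(haloperidol)
```

The high-dose collapse is left out deliberately: when responding stops altogether, no curve can be fitted, so neither parameter can be read.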
Broader Implications & Ethical/Clinical Notes
Understanding incentive salience informs:
Treatment of addiction via partial DA modulation.
Why environmental cues (light, location) can trigger relapse.
Ethical caution: BSR demonstrates how strongly the brain can be “hijacked” — animals self-starve; human parallels in compulsive gambling, gaming, substance abuse.
Translation: cue-exposure therapies, and pharmacotherapies targeting VTA→NAc synapses or specific DA receptor subtypes, to reduce craving.
Connections to Previous Lectures
Echoes of classical conditioning prediction-error models (Rescorla-Wagner, temporal-difference) now neurally instantiated in DA bursts/dips.
Parallels to variability vs. stereotypy: Rats alter response topography (paw→mouth) under high reinforcement rates, mirroring earlier pigeon key-peck shape changes.
Premack Principle revisited: high-probability behaviors (wheel running) drive low-probability ones; mesolimbic DA is a candidate for the common neural currency.
Key Numerical / Experimental Benchmarks
Basal-ganglia lesion → S–R deficit, intact R–O (Featherstone & McDonald 2004).
BSR shaping: ≈15 min to first lever press; up to presses/30 min.
Wise 1974: Saline + Reinf ≈250 presses/session vs. Pimozide/No-Reinf ≈0 by day 4.
Salamone 2002: Progressive SKF doses produced near-zero lever presses, >10 g chow eaten.
BG essential for mapping which stimulus to act upon.
Reinforcement ≠ homeostasis; requires dopaminergic mesolimbic network.
Dopamine signals prediction errors & incentive salience, not pure pleasure.
Electrical BSR provides controllable “dopamine dose” → powerful tool to measure reinforcement without motor confounds.
Rate–frequency curve shift paradigm isolates reinforcer valuation vs. motor capacity; key for evaluating addictive drugs & antidotes.
Therapeutic strategy: target DA receptors to dampen cue-triggered “wanting” while sparing normal movement & enjoyment.