Instrumental Conditioning Theory Part 2

Stimulus–Response (S–R) vs. Response–Outcome (R–O) Associations

  • Instrumental conditioning requires at least two separable links:

    • S–R: The stimulus that guides the motor act (e.g., sight of the correct lever → press).

    • R–O: Knowledge that the act produces the outcome (e.g., press → food).

  • Classical–instrumental parallel:

    • Classical: CS–US; Instrumental: S–R (habit) & S–O/R–O (goal-directed).

Basal Ganglia and S–R Learning

  • Visual input (occipital cortex) → basal ganglia → motor cortex.

  • Experiment (Featherstone & McDonald, 2004):

    • Basal-ganglia lesions:

      • Impaired S–R learning (could not learn which lever to press or which swim alley led to escape).

      • Intact R–O learning (still pressed a lever to gain food once they knew which one).

  • Implication: the BG function as a sensory–motor integrator specifically for S–R mapping.
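The lesion dissociation can be pictured by storing the two links as separate lookup tables. This is a minimal sketch, assuming each link is a simple mapping; the stimulus, response, and outcome labels and the `lesion_basal_ganglia` helper are hypothetical, and the lesion is modeled crudely as deleting the S–R table while leaving the R–O table intact.

```python
# Toy model: S-R and R-O links stored as separate lookup tables.
# All labels and helpers are hypothetical illustrations.

def make_agent():
    return {
        # S-R: which response each stimulus triggers (habit-like mapping)
        "s_r": {"left_light": "press_left", "right_light": "press_right"},
        # R-O: what each response is expected to produce
        "r_o": {"press_left": "food", "press_right": "nothing"},
    }

def select_response(agent, stimulus):
    """Stimulus-guided response selection; None if no S-R link exists."""
    return agent["s_r"].get(stimulus)

def expected_outcome(agent, response):
    """Outcome knowledge for a given response; None if unknown."""
    return agent["r_o"].get(response)

def lesion_basal_ganglia(agent):
    """Crude lesion: remove S-R links, keep a copy of the R-O links."""
    return {"s_r": {}, "r_o": dict(agent["r_o"])}

intact = make_agent()
lesioned = lesion_basal_ganglia(intact)
print(select_response(intact, "left_light"))     # press_left
print(select_response(lesioned, "left_light"))   # None
print(expected_outcome(lesioned, "press_left"))  # food
```

The lesioned agent can no longer pick a lever from the light cue, yet it still "knows" that pressing the left lever yields food, which is the pattern the lesion study reports.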

Multisensory Guidance of the Response

  • Lever insertion yields visual, tactile, and auditory cues; all can converge on BG & motor cortex.

  • Correct response selection demands that sensory information reach motor systems intact.

Outcome Processing: Homeostasis & Gustation

  • Delivery of food activates:

    • Hypothalamus (signals reduction of hunger/thirst drives).

    • Gustatory/olfactory nuclei (taste & smell of banana- or sucrose-flavored pellets).

  • But: Many reinforcers are not linked to physiological deprivation (socializing, humor, hiking, reading, video games) → need for a broader “reinforcement system.”

Existence of a Separate Reinforcement System

  • Evidence from Premack’s studies: rats will run in a wheel to gain access to water, or drink to gain access to the wheel, depending on which activity is currently more probable → reinforcer value is relative & individual.

  • Therefore, reinforcement circuitry must be distinct from mere homeostatic circuitry.
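The relativity of reinforcer value can be sketched with Premack's rule that a more probable activity reinforces a less probable one. This is a minimal sketch, assuming each activity's baseline probability can be summarized as a single number; the probability values and the `can_reinforce` helper are hypothetical.

```python
# Premack principle as a comparison of baseline response probabilities.
# The probabilities below are hypothetical, chosen only for illustration.

def can_reinforce(baseline, reinforcer, target):
    """True if access to `reinforcer` should strengthen `target`,
    i.e., the reinforcing activity is the more probable one."""
    return baseline[reinforcer] > baseline[target]

# Water-deprived rat with free wheel access: drinking dominates.
thirsty = {"drink": 0.6, "run": 0.2}
# Wheel-deprived rat with free water: running dominates.
wheel_deprived = {"drink": 0.1, "run": 0.5}

print(can_reinforce(thirsty, "drink", "run"))         # True
print(can_reinforce(wheel_deprived, "drink", "run"))  # False
print(can_reinforce(wheel_deprived, "run", "drink"))  # True
```

The same two activities swap roles as deprivation changes their baseline probabilities, so "reinforcer" is a relation between behaviors rather than a fixed property of food or water.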

The Mesolimbic Dopamine System (VTA → NAc → Dorsal Striatum)

  • Olds (1955) serendipitously discovered that rats prefer locations paired with mild intracranial stimulation.

  • Core circuit:

    • Ventral tegmental area (VTA) dopaminergic neurons.

    • Axons → nucleus accumbens (NAc) (ventral striatum).

    • Further projections → dorsal striatum & cortex.

  • Electrical stimulation of this pathway produces extraordinary reinforcement strength:

    • Shaping lever press: ≈15 min vs. days with food.

    • Up to ≈$10^3$ presses in 30 min.

    • Rats will forgo food and lose weight to keep pressing.

  • Behavioral micro-analysis: rats shift from paw pressing to rapid mouth “flicker” to maximize pulse rate (response topography adapts).

Human Convergence: Humor & NAc Activation (Mobbs et al., 2003)

  • fMRI contrast Funny > Non-Funny:

    • Large BOLD peak in NAc at ~5 s post-presentation.

    • Supports a cross-species role for the NAc in naturalistic positive reinforcement.

Hebbian Mechanism for Reinforcement Learning

  • Co-activation of:

    • Sensory neurons (stimulus).

    • Motor neurons (response).

    • Dopaminergic “reinforcement” neurons (outcome).

  • Hebb’s rule in shorthand: “Neurons that fire together wire together.”
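The three-way co-activation can be written as a dopamine-gated ("three-factor") Hebbian rule: a sensory-to-motor weight grows only when stimulus, response, and reinforcement signals coincide. This is a minimal sketch, assuming a simple multiplicative gate; the learning rate, activity values, and `three_factor_update` helper are hypothetical.

```python
# Three-factor (dopamine-gated) Hebbian update on sensory->motor weights.
# Learning rate and activity values are hypothetical.

def three_factor_update(w, pre, post, dopamine, lr=0.1):
    """Return new weights: w[i][j] + lr * dopamine * post[i] * pre[j].

    w        : list of rows, one per motor neuron, one column per sensory neuron
    pre      : sensory (stimulus) activity
    post     : motor (response) activity
    dopamine : scalar reinforcement signal (outcome); 0 means no learning
    """
    return [
        [w[i][j] + lr * dopamine * post[i] * pre[j] for j in range(len(pre))]
        for i in range(len(post))
    ]

w = [[0.0, 0.0], [0.0, 0.0]]
stimulus = [1.0, 0.0]   # sensory neuron 0 active
response = [1.0, 0.0]   # motor neuron 0 active

# Co-activation without dopamine: weights stay at zero.
w_no_da = three_factor_update(w, stimulus, response, dopamine=0.0)

# Five co-activations with dopamine: only the active pair's weight grows.
w_da = w
for _ in range(5):
    w_da = three_factor_update(w_da, stimulus, response, dopamine=1.0)
print(w_no_da)
print(w_da)
```

Without the third (dopamine) factor, sensory–motor co-activation alone leaves the weights unchanged in this sketch.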

Dopamine: Hedonia vs. Incentive Salience Debate

  • Wise (1974): blocking dopamine with the antagonist pimozide mimics extinction.

    • Saline + Food → stable ≈250 presses.

    • Pimozide + Food → responding declines rapidly, resembling the No-Food (extinction) condition.

    • Early interpretation: dopamine = pleasure (hedonia).

  • Challenges:

    • Parkinson’s patients (low DA) still enjoy rewards.

    • Rats with 6-OHDA dopamine lesions still lever-press normally for food.

  • Salamone et al. (2002): test of incentive salience.

    • Drug: SKF-83566 (dopamine D1 receptor antagonist).

    • Setup: Press lever for sugary pellets; chow freely available on floor.

    • Result: DA block ↓ lever pressing but ↑ chow intake.

    • Conclusion: Dopamine mediates “wanting” (stimulus → outcome value), not “liking.”
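The Salamone result can be illustrated with a toy effort-based choice model in which dopamine scales the incentive value ("wanting") of both food options while the effort cost of lever pressing stays fixed. This is a hypothetical sketch, not Salamone et al.'s own model: the softmax choice rule, the parameter values, and `lever_choice_prob` are all assumptions.

```python
import math

def lever_choice_prob(da_scale, sugar_value=2.0, chow_value=1.0,
                      lever_effort=0.8, temperature=0.25):
    """Probability of choosing the lever over free chow (two-option softmax).

    da_scale     : multiplier on incentive value (1.0 = normal, <1 = DA blockade)
    lever_effort : fixed cost of pressing; chow on the floor costs nothing
    All parameter values are hypothetical.
    """
    lever_net = da_scale * sugar_value - lever_effort
    chow_net = da_scale * chow_value
    # Two-option softmax written as a logistic of the value difference.
    return 1.0 / (1.0 + math.exp(-(lever_net - chow_net) / temperature))

for scale in (1.0, 0.7, 0.4):
    p = lever_choice_prob(scale)
    print(f"DA scale {scale:.1f}: lever {p:.2f}, chow {1 - p:.2f}")
```

Lowering the scale shrinks the sugar pellets' advantage while the fixed cost of pressing stays the same, so choice drifts toward the free chow even though chow's value is scaled down too.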

Dopamine Prediction-Error Coding (Schultz, 1997)

  • VTA single-unit firing in monkeys:

    1. First juice drop (unexpected): sharp DA burst.

    2. After learning CS → press → juice: the DA burst shifts to CS onset; no burst at juice.

    3. Juice omitted after CS: DA dip at the expected juice time (negative prediction error).

  • Supports a role for DA in updating S–O predictions rather than in signaling pleasure directly.
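The shift of the burst from juice to CS, and the dip on omission, are the signature of a temporal-difference (TD) prediction error, δ = r + γ·V(next) − V(current). This is a minimal TD(0) sketch, assuming each time step between CS and juice is a separate state with its own value (a "tapped delay line") and that the pre-CS period has a fixed value of 0 because CS onset is unpredictable; the number of steps, learning rate, and `run_trial` helper are hypothetical.

```python
def run_trial(V, reward, alpha=0.2, gamma=1.0):
    """Run one CS -> delay -> juice trial with TD(0) updates.

    V      : list of state values, V[0] = CS onset, V[-1] = last step before juice
    reward : 1.0 if juice is delivered, 0.0 if omitted
    Returns (error at CS onset, error at juice time); modifies V in place.
    """
    # Transition from pre-CS (value fixed at 0) into the CS state.
    delta_cs = gamma * V[0] - 0.0
    # Steps between CS and juice: no reward, only value differences.
    for k in range(len(V) - 1):
        delta = gamma * V[k + 1] - V[k]
        V[k] += alpha * delta
    # Final step: juice delivered (or omitted); the trial then ends (value 0).
    delta_us = reward - V[-1]
    V[-1] += alpha * delta_us
    return delta_cs, delta_us

V = [0.0] * 5
naive = run_trial(V, reward=1.0)     # (0.0, 1.0): burst at juice only
for _ in range(500):
    run_trial(V, reward=1.0)
trained = run_trial(V, reward=1.0)   # burst moved to CS, ~0 at juice
omitted = run_trial(V, reward=0.0)   # burst at CS, dip (~ -1) at juice time
print(naive, trained, omitted)
```

The three returned pairs correspond to the three recording conditions listed above: naive, trained, and omission.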

Application: Drug Addiction & Need to Isolate “Reinforcement” from “Performance”

  • Many drugs of abuse (cocaine, amphetamine) increase dopamine signaling but also cause motor stereotypy → confound between reinforcement and performance.

  • Solution: Electrical Brain Stimulation Reward (BSR) with adjustable parameters.

Electrical Stimulation Parameters & Neural “Dose”

  • Amplitude (µA): height of the sine wave → radius of activated axons → number of DA spikes.

  • Duration / Frequency:

    • One pulse ≈ 10 ms.

    • Train length (e.g., $n \times 10\,\text{ms}$ for $n$ pulses) = stimulus “intensity.”

Rate–Frequency Curve & Matching Law

  • Procedure (Edmonds & Gallistel): vary the stimulation train duration and record the response rate at each duration.

  • Typical baseline:

    • <50 ms: minimal responding.

    • >50 ms: steep rise → plateau at max presses.

  • Consistent with the matching law: response allocation is proportional to obtained reinforcement (the generalized form adds bias and sensitivity parameters).
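The rate–duration curve and the matching relation can be sketched numerically. This is a minimal sketch, assuming a logistic curve shape with "threshold" defined as the train duration that yields half-maximal responding; the 50 ms threshold and ~10 ms pulse echo the values above, while the 60 presses/min plateau, the slope, and the function names are hypothetical. The matching function uses the generalized matching law, B1/B2 = b·(R1/R2)^a, where a = b = 1 gives strict proportional matching.

```python
import math

def train_duration(n_pulses, pulse_ms=10.0):
    """Stimulation train length in ms (n pulses of ~10 ms each)."""
    return n_pulses * pulse_ms

def response_rate(duration_ms, threshold_ms=50.0, max_rate=60.0, slope=0.2):
    """Logistic rate-duration curve: presses/min as a function of train length.

    threshold_ms : duration giving half-maximal responding
    max_rate     : plateau response rate (presses/min)
    slope        : steepness of the rise around threshold (1/ms)
    """
    return max_rate / (1.0 + math.exp(-slope * (duration_ms - threshold_ms)))

def matching_share(r1, r2, a=1.0, b=1.0):
    """Share of responses on option 1 under the generalized matching law
    B1/B2 = b * (R1/R2)**a; a = b = 1 is strict matching."""
    ratio = b * (r1 / r2) ** a
    return ratio / (1.0 + ratio)

for n in (2, 4, 5, 6, 8):
    d = train_duration(n)
    print(f"{d:5.0f} ms -> {response_rate(d):5.1f} presses/min")

print(matching_share(30, 10))  # 0.75 with strict matching
```

Short trains (20 to 40 ms) give near-floor responding, rates rise steeply around 50 ms, and longer trains approach the plateau.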

Pharmacological Shifts of the Curve

  • Low-dose cocaine (indirect DA agonist that blocks DA reuptake):

    • Threshold shifts left to ≈30 ms.

    • Max rate unchanged ⇒ pure reinforcement potentiation.

  • Intermediate dose:

    • Further left shift ≈20 ms.

    • Max rate ↑ ⇒ reinforcement + motor activation.

  • High dose:

    • Responding collapses: stereotypy (the animal rocks in place) and no lever presses.

  • Haloperidol (DA antagonist):

    • Threshold ≈baseline (≈45 ms).

    • Max rate ↓ ⇒ impaired performance, little change in perceived reinforcement.

  • Combining agonist + antagonist can dissociate reinforcing from motor effects → avenue for therapy design (reduce the “high” without crippling movement).
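Reading curve shifts amounts to a small decision rule over two fitted parameters: the threshold (horizontal position) and the maximum rate (plateau). This is a hypothetical sketch: the thresholds are the approximate values given above, whereas the 60 presses/min baseline plateau, the drug plateaus, the tolerance, and `classify_shift` are assumptions for illustration.

```python
def classify_shift(baseline, drug, tol=0.15, collapse_frac=0.1):
    """Interpret a drug-induced change in a fitted rate-duration curve.

    baseline, drug : (threshold_ms, max_rate) pairs
    tol            : relative change treated as "no change"
    collapse_frac  : max rate below this fraction of baseline = collapse
    """
    base_thr, base_max = baseline
    drug_thr, drug_max = drug
    if drug_max <= collapse_frac * base_max:
        return "collapse: no responding (e.g., stereotypy)"
    thr_change = (drug_thr - base_thr) / base_thr
    max_change = (drug_max - base_max) / base_max
    left_shift = thr_change < -tol
    thr_same = abs(thr_change) <= tol
    max_same = abs(max_change) <= tol
    if left_shift and max_same:
        return "reinforcement potentiation"
    if left_shift and max_change > tol:
        return "reinforcement + motor activation"
    if thr_same and max_change < -tol:
        return "performance impairment"
    return "mixed / unclassified"

baseline = (50.0, 60.0)
conditions = {
    "low-dose cocaine": (30.0, 60.0),
    "intermediate cocaine": (20.0, 75.0),
    "high-dose cocaine": (float("nan"), 0.0),
    "haloperidol": (45.0, 40.0),
}
for name, curve in conditions.items():
    print(f"{name}: {classify_shift(baseline, curve)}")
```

A leftward shift at the same plateau reads as changed reinforcement; a lower plateau at about the same threshold reads as changed performance.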

Broader Implications & Ethical/Clinical Notes

  • Understanding incentive salience informs:

    • Treatment of addiction via partial DA modulation.

    • Why environmental cues (light, location) can trigger relapse.

  • Ethical caution: BSR demonstrates how strongly the brain can be “hijacked” (animals self-starve); human parallels include compulsive gambling, gaming, and substance abuse.

  • Translation: cue-exposure therapies, pharmacotherapies targeting VTA→NAc synapses, or DA receptor subtypes to reduce craving.

Connections to Previous Lectures

  • Echoes of classical conditioning prediction-error models (Rescorla-Wagner, temporal-difference) now neurally instantiated in DA bursts/dips.

  • Parallels to variability vs. stereotypy: rats alter response topography (paw → mouth) under high reinforcement rates, mirroring earlier pigeon key-peck shape changes.

  • Premack Principle revisited: high-probability behaviors (e.g., wheel running) can reinforce low-probability ones; mesolimbic DA may provide the common neural currency.

Key Numerical / Experimental Benchmarks

  • Basal-ganglia lesion → S–R deficit, intact R–O (Featherstone & McDonald 2004).

  • BSR shaping: ≈15 min to shape lever pressing; up to ≈$10^3$ presses/30 min.

  • Wise 1974: Saline + Reinf ≈250 presses/session vs. Pimozide/No-Reinf ≈0 by day 4.

  • Salamone 2002: increasing SKF-83566 doses produced near-zero lever presses and >10 g of chow eaten.

Take-Home Points

  • BG essential for mapping which stimulus to act upon.

  • Reinforcement ≠ homeostasis; requires dopaminergic mesolimbic network.

  • Dopamine signals prediction errors & incentive salience, not pure pleasure.

  • Electrical BSR provides controllable “dopamine dose” → powerful tool to measure reinforcement without motor confounds.

  • Rate–frequency curve shift paradigm isolates reinforcer valuation vs. motor capacity; key for evaluating addictive drugs & antidotes.

  • Therapeutic strategy: target DA receptors to dampen cue-triggered “wanting” while sparing normal movement & enjoyment.