Instrumental Conditioning Theory Part 2

Instrumental conditioning requires at least two separable links:
- S–R: The stimulus that guides the motor act (e.g., sight of the correct lever → press).
- R–O: Knowledge that the act produces the outcome (e.g., press → food).
Classical–instrumental parallel:
- Classical: CS–US; Instrumental: S–R (habit) & S–O/R–O (goal-directed).

Visual input (occipital cortex) → basal ganglia → motor cortex.
Experiment (Featherstone & McDonald, 2004):
- Basal-ganglia lesions:
  - Impaired S–R learning (lesioned animals could not learn which lever or which swim alley led to escape).
  - Intact R–O learning (they still pressed a lever to gain food once they knew which one).
Implication: the basal ganglia (BG) function as a sensory–motor integrator specifically for S–R mapping.

Lever insertion yields visual, tactile, and auditory cues; all can converge on BG & motor cortex.
Correct response selection demands that sensory information reach motor systems intact.

Delivery of food activates:
- Hypothalamus (reduces hunger/thirst drives).
- Gustatory/olfactory nuclei (taste & smell of banana- or sucrose-flavored pellets).
But: Many reinforcers are not linked to physiological deprivation (socializing, humor, hiking, reading, video games) → need for a broader “reinforcement system.”

Evidence from Premack studies: animals will run on a wheel to gain water or vice-versa → reinforcer value is relative & individual.
Therefore, reinforcement circuitry must be distinct from mere homeostatic circuitry.

Olds (1955) serendipitously discovered that rats prefer locations paired with mild intracranial stimulation.
Core circuit:
- Ventral tegmental area (VTA) dopaminergic neurons.
- Axons → nucleus accumbens (NAc) (ventral striatum).
- Further projections → dorsal striatum & cortex.
Electrical stimulation of this pathway produces extraordinary reinforcement strength:
- Shaping lever press: ≈15 min vs. days with food.
- Up to ≈ $10^3$ presses in 30 min.
- Rats will forgo food and lose weight to keep pressing.
Behavioral micro-analysis: rats shift from paw pressing to rapid mouth “flicker” to maximize pulse rate (response topography adapts).

Human fMRI contrast, Funny > Non-Funny:
- Large BOLD peak in NAc at ~5 s post-presentation.
- Confirms cross-species role of NAc in naturalistic positive reinforcement.

Co-activation of:
- Sensory neurons (stimulus).
- Motor neurons (response).
- Dopaminergic “reinforcement” neurons (outcome).
Hebb’s shorthand: “Neurons that fire together wire together.”
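
The co-activation idea above can be sketched as a "three-factor" weight update, in which an S–R connection strengthens only when sensory, motor, and dopaminergic activity coincide. This is a minimal illustration, not a model taken from these notes: the function name `three_factor_update`, the multiplicative form, and the learning rate are all assumptions.

```python
def three_factor_update(weight, pre, post, dopamine, learning_rate=0.1):
    """Return the updated S-R weight (hypothetical illustrative rule).

    The change is the product of presynaptic (sensory) activity,
    postsynaptic (motor) activity, and a dopamine signal, so the
    weight grows only when all three are active together.
    """
    return weight + learning_rate * pre * post * dopamine


w = 0.0
# Stimulus present, response made, outcome delivered: weight grows.
w_reinforced = three_factor_update(w, pre=1.0, post=1.0, dopamine=1.0)
# Same stimulus and response, but no dopamine signal: weight unchanged.
w_unreinforced = three_factor_update(w, pre=1.0, post=1.0, dopamine=0.0)
print(w_reinforced, w_unreinforced)
```

Using a product rather than a sum is the design choice that matters here: any one factor at zero blocks learning, which is the sense of "fire together" in the shorthand.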

Wise (1974) — Dopamine antagonist pimozide mimics extinction:
- Saline + Food → stable ≈250 presses.
- Pimozide + Food ≈ No-Food → rapid decline.
- Early interpretation: dopamine = pleasure (hedonia).
Challenges:
- Parkinson’s patients (low DA) still enjoy rewards.
- 6-OHDA rats still lever-press normally for food.
Salamone et al. (2002) — Incentive Salience test:
- Drug: SKF-83566 (DA antagonist).
- Setup: Press lever for sugary pellets; chow freely available on floor.
- Result: DA block ↓ lever pressing but ↑ chow intake.
- Conclusion: Dopamine mediates “wanting” (stimulus → outcome value), not “liking.”

VTA single-unit firing in monkeys:
1. First juice drop (unexpected): sharp DA burst.
2. After learning CS → press → juice: DA burst shifts to CS onset; no burst at juice.
3. Juice omitted after CS: DA dip (negative prediction error).
Supports role of DA in updating S–O predictions rather than direct pleasure.
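
The three recording results can be mimicked with a Rescorla-Wagner style update, where the error is the obtained reward minus the predicted reward. This is a hedged sketch: the learning rate, trial count, and function names are assumptions, and the model only tracks the error at the time of the outcome. It does not reproduce the shift of the burst to CS onset, which needs a temporal-difference model.

```python
def prediction_error(reward, predicted):
    """Error signal: obtained reward minus predicted reward."""
    return reward - predicted


def train(n_trials, alpha=0.2, reward=1.0):
    """Pair the CS with reward n_trials times; return the learned
    prediction and the error observed on each trial.
    alpha is an assumed learning rate."""
    v = 0.0
    errors = []
    for _ in range(n_trials):
        delta = prediction_error(reward, v)
        errors.append(delta)
        v += alpha * delta  # move the prediction toward the outcome
    return v, errors


v, errors = train(50)
first_error = errors[0]                    # unexpected juice: large positive error
late_error = errors[-1]                    # well-predicted juice: error near zero
omission_error = prediction_error(0.0, v)  # juice omitted: negative error
print(first_error, late_error, omission_error)
```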

Many drugs (cocaine, amphetamine) act on dopamine, but also cause motor stereotypy → confound.
Solution: Electrical Brain Stimulation Reward (BSR) with adjustable parameters.

Amplitude (µA): height of sine-wave → radius of tissue in which axons are activated → # of DA spikes.
Duration / Frequency:
- One pulse ≈ $10$ ms.
- Train length (e.g., $n \times 10\,\text{ms}$ ) = stimulus “intensity.”

Procedure (Edmonds & Gallistel): vary duration each press; map response rate.
Typical baseline:
- <50 ms: minimal responding.
- >50 ms: steep rise → plateau at max presses.
Matches generalized matching law: response allocation proportional to obtained reinforcement.
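
The baseline curve described above (minimal responding at short durations, a steep rise, then a plateau) can be approximated with a logistic function. This is only a sketch under stated assumptions: the logistic form, the midpoint of 60 ms, and the slope of 3 ms are illustrative choices that put the rise just above 50 ms, and the rate is expressed as a fraction of the maximum rather than in presses.

```python
import math


def response_rate(duration_ms, midpoint_ms=60.0, slope_ms=3.0, max_rate=1.0):
    """Hypothetical rate-duration curve.

    midpoint_ms and slope_ms are assumed values, chosen so that
    responding is minimal below about 50 ms and saturates above it.
    """
    return max_rate / (1.0 + math.exp(-(duration_ms - midpoint_ms) / slope_ms))


for d in (20, 40, 50, 60, 80, 100):
    print(f"{d:>3} ms -> {response_rate(d):.3f} of max rate")
```

Separating a horizontal parameter (`midpoint_ms`) from a vertical one (`max_rate`) is what lets a drug effect on the threshold be distinguished from an effect on the ceiling.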

Low-dose cocaine (indirect DA agonist):
- Threshold shifts left to ≈30 ms.
- Max rate unchanged ⇒ pure reinforcement potentiation.
Intermediate dose:
- Further left shift ≈20 ms.
- Max rate ↑ ⇒ reinforcement + motor activation.
High dose:
- Collapse (stereotypy; animal rocks, no lever presses).
Haloperidol (DA antagonist):
- Threshold ≈baseline (≈45 ms).
- Max rate ↓ ⇒ impaired performance, little change in perceived reinforcement.
Combining agonist+antagonist can dissect hedonic from motor effects → avenue for therapy design (reduce “high” without crippling movement).
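
The logic of reading a curve shift can be written down as a small classifier: a horizontal shift in threshold is attributed to reinforcement, and a vertical change in maximum rate to motor performance. This is a sketch, not the published analysis. The thresholds (50, 30, 20, 45 ms) come from the notes above, while the normalized maximum rates (1.0, 1.2, 0.6) and both tolerances are invented for illustration. The high-dose collapse, where no presses occur, is not covered.

```python
def interpret_shift(base_threshold_ms, base_max, drug_threshold_ms, drug_max,
                    threshold_tol_ms=7.5, max_tol=0.1):
    """Label a drug effect from its threshold and max-rate changes.

    threshold_tol_ms and max_tol are assumed tolerances below which
    a change is treated as no change.
    """
    threshold_shift = drug_threshold_ms - base_threshold_ms
    max_change = (drug_max - base_max) / base_max
    effects = []
    if threshold_shift < -threshold_tol_ms:
        effects.append("reinforcement potentiated")  # leftward shift
    elif threshold_shift > threshold_tol_ms:
        effects.append("reinforcement attenuated")   # rightward shift
    if max_change > max_tol:
        effects.append("motor activation")
    elif max_change < -max_tol:
        effects.append("motor impairment")
    return effects or ["no clear change"]


low_cocaine = interpret_shift(50, 1.0, 30, 1.0)
mid_cocaine = interpret_shift(50, 1.0, 20, 1.2)
haloperidol = interpret_shift(50, 1.0, 45, 0.6)
print(low_cocaine, mid_cocaine, haloperidol)
```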

Understanding incentive salience informs:
- Treatment of addiction via partial DA modulation.
- Why environmental cues (light, location) can trigger relapse.
Ethical caution: BSR demonstrates how strongly the brain can be “hijacked” — animals self-starve; human parallels in compulsive gambling, gaming, substance abuse.
Translation: cue-exposure therapies, pharmacotherapies targeting VTA→NAc synapses, or DA receptor subtypes to reduce craving.

Echoes of classical conditioning prediction-error models (Rescorla-Wagner, temporal-difference) now neurally instantiated in DA bursts/dips.
Parallels to variability vs. stereotypy: Rats alter response topography (paw→mouth) under high reinforcement rates, mirroring earlier pigeon key-peck shape changes.
Premack Principle revisited: high-probability behaviors (wheel running) drive low-probability ones; mesolimbic DA explains neural currency.

Basal-ganglia lesion → S–R deficit, intact R–O (Featherstone & McDonald 2004).
BSR shaping: ≈15 min to first lever press; up to $10^3$ presses/30 min.
Wise 1974: Saline + Reinf ≈250 presses/session vs. Pimozide/No-Reinf ≈0 by day 4.
Salamone 2002: Progressive SKF doses produced near-zero lever presses, >10 g chow eaten.

BG essential for mapping which stimulus to act upon.
Reinforcement ≠ homeostasis; requires dopaminergic mesolimbic network.
Dopamine signals prediction errors & incentive salience, not pure pleasure.
Electrical BSR provides controllable “dopamine dose” → powerful tool to measure reinforcement without motor confounds.
Rate–frequency curve shift paradigm isolates reinforcer valuation vs. motor capacity; key for evaluating addictive drugs & antidotes.
Therapeutic strategy: target DA receptors to dampen cue-triggered “wanting” while sparing normal movement & enjoyment.