The Role of Dopamine in Reward

The Causal Question

Dopamine in Reward

The main inquiry of the reading is to answer the question:

  • “What does DA do in reward?”

There are 3 competing areas of explanation for DA’s role in reward:

  • “Liking”

  • Learning

  • “Wanting”

Approaches to the Causal Question

So which explanation is the correct answer? Well, the answer has been approached in several experimental ways. It comes down to how we assign causality to a brain event:

  1. What specific reward function is lost when dopamine neurotransmission is suppressed?

  • E.g., antagonists, neurotoxins, lesions

  • Asks about a necessary cause for reward

  2. What reward function is enhanced when dopamine neurotransmission is increased?

  • E.g., agonists, brain stimulation, hyper-DA genetic mutation

  • Asks about a sufficient cause for reward

  3. What reward functions are coded by dopamine neural activation during reward events?

  • Asks about neural coding of function via correlation


Possible Answers to the Causal Question

Unsupported Hypotheses

Activation-Sensorimotor Hypothesis

The activation-sensorimotor hypothesis posits that DA mediates general functions of:

  • Action generation

  • Effort

  • Movement

  • General arousal or behavioural activation

This is supported by substantial evidence; however, the hypothesis is very general in scope, making it difficult to explain specific aspects of reward.

Hedonia Hypothesis

Evidence in Favour of the Hedonia Hypothesis

The hedonia hypothesis suggests that DA in NAc is a “pleasure neurotransmitter”.

  • It mediates the positive reinforcing effects of reward stimuli

    • In a hedonic reward sense of the term “reinforcement”

  • The suppression of DA causes anhedonia

Evidence Against the Hedonia Hypothesis

However, DA ≠ hedonic reactions (in either rats or humans).

  • DA reduction does not decrease “liking”

    • 6-OHDA lesions have no effect

      • Though they destroy up to 99% of DA in NAc and neostriatum

    • Neuroleptic drugs (like pimozide) do not shift reactions towards “disliking”

    • DA neurons in monkeys stop firing to rewards after prediction is learned

      • Whatever hedonic impact the reward has must be mediated without a DA signal

Reward Learning Hypothesis

DA signal modulates synaptic plasticity in target neurons

  • Or adjusts synaptic efficacy in appropriate neuronal circuits of input layers of learning networks

    • Particularly the neostriatum and NAc

Psychologically it suggests that DA acts to “stamp in” links between S-S or S-R events

  • Acts as a teaching signal for new learning or a computational prediction generator

Schultz’s Electrophys. Studies

Monkeys performed an instrumental conditioning task while VTA activity was measured (the VTA is the originating source of dopamine production, especially for the mesocorticolimbic DA pathway); they had to move a joystick toward the “correct stimulus”.

They started by guessing, not knowing the rules of the game per se, but when they got it right they received juice. After juice was received, DA neurons activated. As trials continued, the monkeys learned the rules of the game and responded accurately. So initially, DA was activated after the juice; then, DA was activated during the salient reward-predicting cues.

In extinction trials (in which reward wasn’t given), VTA activity fell below baseline (consistent with prediction error models — the CS is no longer predictive of reward).

Outstanding Questions

The Three Dopamine Learning Hypotheses

A set of different but closely related hypotheses

  • All posit DA mediates learning but in different ways

  1. DA signals “stamp in” S-R or S-S associations whenever a reward follows

    • Simplest

  2. DA activation causes new habit learning and enhances habit performance

  3. DA systems mediate computational teaching signals via US prediction errors

    • Most sophisticated

Associative Stamping-In?

Thorndike’s “Law of Effect”

  • If a response in the presence of a stimulus is followed by a satisfying event, association between the stimulus (S) and the response (R) is strengthened

  • If a response is followed by an undesirable event, the S-R association is weakened

  • Reinforcer (O) serves to ‘stamp-in’ the S-R association

Notes:

  • Motivation for instrumental behaviour:

    • Activation of the S-R association upon exposure to contextual stimuli (S), in the presence of which the response was previously reinforced

  • The resulting event is not part of the association

  • The satisfying or annoying consequence serves to strengthen or weaken the S-R association

  • No learning about ‘O’ or ‘S-O’ or ‘R-O’

Supporting Evidence

Dopamine appears to strengthen learning procedures in different ways.

S-R Learning

Notes above.

Habit Learning

Implies a more proactive dopamine involvement in learning. Habits are formed (a probabilistic display of certain behaviours over others).

Prediction Error Learning Models

The brain tries to predict what will happen next — DA mediates the prediction value carried by a CS previously associated with reward (or fires whenever the reward is surprising).

Evaluating Direct Roles of DA

Is DA A Necessary Cause?


Incentive Salience Hypothesis

Learning translates into behavioural output — dopamine causes this motivational response. If the monkeys in Schultz’s study didn’t like the juice, they would still be able to predict when the juice would be available, but you wouldn’t see the dopamine spike (i.e., the wanting/motivation to get the juice).

DA can make a CS more sought out. If you’re hungry and want food, you’ll pay more attention to your environment for cues that might predict food (i.e., more likely to notice food-related cues).

Initial learning, then reboosting (happens with each encounter of the CS, making us want the reward a little more with each exposure), then wanting — liking and wanting will often align temporally across the stages of incentive salience.

Stages of the Incentive Salience Hypothesis

Evidence in Favour of the Incentive Salience Hypothesis


The Causal Question
  • Core Inquiry: What does dopamine (DA) do in the context of reward?

  • Contains 3 competing explanations:

    1. “Liking”

    2. Learning

    3. “Wanting”

  • Approaches to answering this question include several experimental methods.

Experimental Approaches
  1. Evaluating specific reward functions lost when DA neurotransmission is suppressed:

    • Techniques: antagonists, neurotoxins, lesions.

    • Focus: Necessary causes for reward.

  2. Evaluating reward functions enhanced by increased DA signaling:

    • Techniques: agonists, brain stimulation, genetically induced hyper-DA mutations.

    • Focus: Sufficient causes for reward.

  3. Investigating reward functions coded by DA neural activations during reward events:

    • Focus: Neural coding of function via correlation.

  • Emphasizes that DA function is multifaceted, and combining these approaches is beneficial.

Possible Answers to DA's Role in Reward
  1. Activation-sensorimotor hypotheses (effort, arousal, and response vigor).

  2. Hedonia hypothesis (pleasure linked to rewards).

  3. Reward learning hypotheses (associative stamping-in, teaching signals, prediction errors).

  4. Incentive salience hypothesis (the “wanting” aspect of rewards).

Activation-Sensorimotor Hypothesis
  • DA mediates several general functions:

    • Action generation

    • Effort

    • Movement

    • General arousal or behavioral activation.

  • This hypothesis is well-supported by substantial evidence but is too broad to explain specific reward mechanisms.

Hedonia Hypothesis
  • DA in the nucleus accumbens (NAc) is suggested to function as a “pleasure neurotransmitter” that mediates the positive reinforcing effects of reward stimuli.

  • Interprets “reinforcement” in a hedonic sense:

    • Suppression of DA leads to anhedonia (the absence of pleasure).

Evidence Against the Hedonia Hypothesis
  • DA Reduction Evidence:

    • DA reduction does not decrease “liking” in rats;

    • Example: 6-OHDA lesions may destroy up to 99% of DA yet have no effect.

    • Neuroleptic drugs (like pimozide) do not shift reactions towards “disliking.”

    • DA neurons stop firing to rewards after prediction has been learned in monkeys.

  • Conclusion: The hedonic impact of reward seems to be mediated without DA signaling.

DA Impact on Hedonic Reactions in Rats
  • DA activation does not enhance “liking.” Evaluations show:

    • Hyper-DA mutation (like DAT-KO mice) does not increase “liking.”

    • Amphetamine microinjection into NAc does not increase hedonic potency.

    • Sensitization and electrical brain stimulation did not enhance hedonic impact of reward.

DA Impact on Hedonic Reactions in Humans
  • Patients with Parkinson's Disease (PD):

    • They demonstrate normal ratings of liking.

  • However, individuals with DA dysregulation syndrome (DDS) show increased “wanting” ([DDS] characterized by compulsive activities and increased L-DOPA intake).

  • The advantage of studying individuals with DDS is that it avoids confounds typically seen in drug addicts;

    • L-DOPA does not induce euphoric effects or dysphoric withdrawal.

Summary of Evidence Against Hedonia Hypothesis
  • DA does not produce normal “liking” reactions in rats or humans.

  • Increases in DA activation have not been shown to amplify hedonic impact when “wanting” is separated from “liking.”

  • DA’s main contributions must therefore lie in nonhedonic aspects of reward, motivating the nonhedonic hypotheses (reward learning, incentive salience).

Reward Learning Hypothesis
  • Proposes that DA signals modulate synaptic plasticity in target neurons, adjusting synaptic efficacy in relevant learning networks (especially in the neostriatum and NAc).

  • Psychological implications:

    • DA acts to “stamp in” associations between stimuli (S-S) or between stimuli and responses (S-R).

    • Functions as a teaching signal for new learning or a computational prediction generator.

Schultz’s Electrophysiological Studies
  • DA activation occurs during reward anticipation through conditioned stimuli (CS) indicating that a reward will follow.

  • The activation of DA neurons correlates with prediction error models.

    • Activation is contingent on the US reward being surprising.

    • Fully predicted US rewards do not activate DA neurons as strongly.
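The shift of the DA burst from the juice to the cue, and the dip when a predicted reward is omitted, falls out of a standard temporal-difference learning rule. A minimal sketch — the trial structure and all parameter values are illustrative assumptions, not taken from Schultz’s actual task:

```python
# Minimal temporal-difference (TD) sketch of Schultz-style DA signals.
# A trial has K steps after cue onset; juice arrives R_STEP steps in.
# The TD error delta = r + V(next) - V(current) plays the role of the
# phasic DA response. All parameter values are illustrative assumptions.

ALPHA = 0.2                # learning rate
K = 6                      # intra-trial steps after the cue
R_STEP = 5                 # juice delivered 5 steps after the cue
V = [0.0] * (K + 1)        # V[k]: predicted future reward k steps post-cue

def run_trial(rewarded=True):
    """Run one trial, update V, return (cue-onset error, juice-time error)."""
    # Cue onset: transition from an unpredictable background state
    # (value fixed at 0, since cue timing cannot be anticipated) into
    # the first cue state, producing an error of V[0] - 0.
    cue_burst = V[0]
    juice_err = 0.0
    for k in range(K):
        r = 1.0 if (rewarded and k == R_STEP) else 0.0
        delta = r + V[k + 1] - V[k]
        V[k] += ALPHA * delta
        if k == R_STEP:
            juice_err = delta
    return cue_burst, juice_err

first = run_trial()                    # naive animal: burst at the juice
for _ in range(300):                   # overtraining with reward
    run_trial()
trained = run_trial()                  # burst now at the cue, not the juice
omitted = run_trial(rewarded=False)    # extinction probe: dip at juice time

print(first, trained, omitted)
```

With untrained values the only surprise is the juice itself; after training, the cue state carries the full prediction, so the transition into it produces the burst while the fully predicted juice produces almost none, and an omitted juice drives the error below zero.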

Outstanding Questions
  • There is a general consensus that DA system activation often correlates with prediction error codes. However, the causative question remains:

    • Does DA activation drive the rest of the brain towards learning?

    • Does other system learning lead to DA activation?

    • Is DA crucial in encoding US prediction errors for learning new stimuli?

    • Is DA an output from learning mechanisms operating elsewhere in the brain?

Dopamine Learning Hypotheses
  • A framework of several interconnected hypotheses:

  1. DA signals “stamp in” S-R or S-S associations post-reward.

  2. DA activation promotes new habit formation and reinforces habit performance.

  3. DA systems mediate computational teaching signals via US prediction errors.

Associative Stamping-In Hypothesis
  • A direct route for DA to influence reward; acts like a reinforcement signal that “stamps in” learned associations related to preceding reward stimuli or responses when a US reinforcer is presented (based on Thorndike's Law of Effect).

  • Thorndike’s Law of Effect states:

    • If an instrumental response in the presence of a stimulus is followed by a satisfying event, the association between the stimulus (S) and response (R) strengthens; if followed by an undesirable event, the association weakens.

  • Reinforcers serve to “stamp in” the S-R association.
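As a toy illustration of the stamping-in idea — purely a sketch; the scenario, response names, and rate value are invented for illustration — only the S-R weight changes, and the outcome itself is never stored:

```python
# Toy "stamping-in" sketch of Thorndike's Law of Effect (illustrative):
# an S-R weight is strengthened when the response is followed by a
# satisfying event and weakened when followed by an annoying one.
# The outcome O is never part of the association -- only w(S, R) changes.
import random

random.seed(0)
ETA = 0.3                       # stamping-in rate (hypothetical value)
w = {("puzzle_box", "press_lever"): 0.0,
     ("puzzle_box", "scratch"): 0.0}

def choose(stimulus):
    """Pick the response whose S-R weight is currently strongest
    (ties broken at random) -- motivation comes from S, not from O."""
    options = [(r, v) for (s, r), v in w.items() if s == stimulus]
    top = max(v for _, v in options)
    return random.choice([r for r, v in options if v == top])

def outcome(response):
    """Satisfier for lever pressing, annoyer for scratching."""
    return 1.0 if response == "press_lever" else -1.0

for _ in range(30):
    r = choose("puzzle_box")
    w[("puzzle_box", r)] += ETA * outcome(r)   # stamp in / stamp out

print(w[("puzzle_box", "press_lever")] > w[("puzzle_box", "scratch")])  # True
```

Exposure to the contextual stimulus activates whichever S-R link is strongest, which is exactly the "no learning about O" property listed above.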

Supporting Evidence for Associative Stamping-In
  • Includes:

    • Extinction-mimicry data that led to the anhedonia hypothesis.

    • DA's modulation of mechanisms like long-term potentiation (LTP) and long-term depression (LTD).

    • DA manipulations shortly succeeding a learning trial can impact memory consolidation.

    • DA manipulations right before learning can affect new associations' acquisition.

Habit Learning
  • More specific than stamping-in: DA contributes to the learning of new S-R habits or modulates strength of learned S-R habits.

  • Definition of stronger habits: persistence in goal-directed responses after the goal becomes devalued (e.g., continuing to eat despite feeling full).

Supporting Evidence for Habit Learning
  • There is consensus that DA manipulations can influence performance strength across:

    • Learned S-R habits.

    • Non-learned action patterns (APs) - both instinctive and new, stereotyped APs.

  • However, habit strengthening contributions do not entirely clarify DA's role in reward.

Prediction Error Learning Models
  • DA is implicated in coding the prediction value associated with conditioned stimuli (CS) linked to rewards and the prediction errors from unconditioned stimuli (US).

  • Utilizes computational models from associative learning to assign roles to DA's phasic activations.

  • Prediction error and teaching signal constructs are distinguishing aspects of these models.

Prediction Error Definition
  • An update in information about a reward received at the moment of reward acquisition.

    • Positive Prediction Error: True reward impact is greater than expected.

    • Negative Prediction Error: True reward impact is less than expected.

  • Strong correlation between prediction errors and DA activation has been documented in various situations, including associative blocking and conditioned inhibition.

Rescorla-Wagner Model
  • This model delineates the trial-by-trial progression of simple associative learning.

  • Applied to DA, the model suggests that DA boosts enhance predictions of impending reward (V) associated with a CS.

  • An increase in the DA signal elevates the prediction error (λ − V) derived from the hedonic impact of the US (λ) at the reinforcement moment.
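The trial-by-trial dynamics described here can be sketched in a few lines (the α, β, and λ values are illustrative assumptions):

```python
# Trial-by-trial Rescorla-Wagner update: dV = alpha * beta * (lambda - V),
# where lambda is the hedonic impact of the US and V is the prediction
# carried by the CS. Parameter values are illustrative, not from the text.

ALPHA, BETA = 0.3, 1.0      # CS salience and US learning rate
LAM = 1.0                   # asymptotic US value (lambda)
V = 0.0                     # prediction associated with the CS

history = []
for trial in range(20):
    error = LAM - V         # prediction error at the moment of the US
    V += ALPHA * BETA * error
    history.append(V)

print(round(history[0], 2))   # first trial: large error, big jump in V
print(round(history[-1], 2))  # late trials: V near lambda, error near 0
```

The prediction error shrinks as V approaches λ, which is why a fully learned CS-US pairing produces little further learning — the same asymptote behaviour the DA teaching-signal hypothesis maps onto phasic DA.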

DA and Prediction Errors
  • Hypothesis: DA acts as a teaching signal that gradually instructs learning systems to make correct predictions, incrementally and on a trial-by-trial basis.

  • Positions DA as a mediator of specific learning equation parameters.

    • DA activity at the moment of CS may modulate the learned prediction strength of future rewards.

    • DA activity following US delivery mediates the prediction error teaching signal, which reflects the contradiction between anticipated and actual rewards.

Application to Addiction
  • The hypothesis elucidates addiction causation through mechanisms of overlearning:

    • Addictive substances lead to significant DA release and engender large prediction errors, causing an overlearning phenomenon that eventually leads to overly optimistic expectations regarding future drug-related rewards.

    • Learned predictions of value (V) cannot adjust to accommodate the unusually high errors produced by DA-triggering substances, creating an expectation bias that persists, compelling addict behavior.
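The noncompensable-error idea can be sketched by comparing a natural reward, whose prediction error is learned away, with a drug reward whose pharmacological DA surge puts a floor under the error (the floor rule follows Redish-style computational accounts, an assumption here; all values are illustrative):

```python
# Sketch of the overlearning account of addiction: the drug adds a
# pharmacological DA boost D that the learned prediction V can never
# cancel, so the prediction error stays positive and V inflates.
# (The max(.., D) floor follows Redish-style models; an assumption.)

ALPHA = 0.3
LAM = 1.0     # actual reward value of the experience
D = 0.5       # non-compensable DA surge caused by the drug

def natural_error(v):
    return LAM - v                  # can be fully learned away

def drug_error(v):
    return max(LAM - v + D, D)      # never falls below D

V_nat, V_drug = 0.0, 0.0
for trial in range(50):
    V_nat += ALPHA * natural_error(V_nat)
    V_drug += ALPHA * drug_error(V_drug)

print(round(V_nat, 2))   # converges to LAM: prediction matches reward
print(round(V_drug, 2))  # keeps growing: overly optimistic drug value
```

The natural V converges to λ and learning stops, while the drug V grows without bound — the "expectation bias that persists" described above.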

Evaluating Learning Models
  • DA has various indirect contributions to learning and performance, involving:

    • Attention, motivation, cognition, rehearsal, consolidation.

  • Example: psychostimulants like amphetamines (AMPH) used as performance enhancers.

  • However, this does not imply that DA serves as a vital teaching signal or mechanism for forming new reward associations.

Evaluating Direct Roles of DA
  • Divide the inquiry into separate components for experimental analysis:

    1. Necessary Causation: Is DA necessary for standard reward learning?

    2. Sufficient Causation: Can an increase in DA lead to excessive learning?

    3. Prediction of Future Reward: Can DA cause a previously learned CS to generate exaggerated predictions?

Is DA a Necessary Cause?
  • If DA is essential for mediating learned associations, eliminating it should impair reward learning.

  • Recent findings with DA-deficient (DD) mice demonstrate:

    • These mice lack tyrosine hydroxylase (TH), the enzyme responsible for DA synthesis, but manage to eat and drink under L-DOPA administration before lapsing back into inactivity.

Is DA a Necessary Cause? - Summary
  • Conclusion: DA appears unnecessary for standard reward learning, indicating it is not a necessary cause; if DA has any role as a teaching facilitator or stamping-in mechanism, it likely serves in a redundant capacity.

Is DA a Sufficient Cause?
  • Is an increase in DA sufficient to enhance teaching signals for better or faster learning about rewards?

  • Evidence from DAT-knockdown (mice with reduced DA transporter levels) indicates:

    • Elevated DA levels (170% above controls) increase “wanting” but not “liking.”

    • They do not expedite learning of S-S reward predictions or instrumental associations.

    • They lack strong or persistent S-R habits.

Why Does DA Neuronal Firing Appear as Prediction Error?
  • Possible that DA neurons reflect learning signal consequences but do not induce learning.

  • DA signals processed by forebrain structures prior to DA neuronal response could explain this discrepancy.

Incentive Salience Hypothesis
  • Central premise: Reward is a composite entity with multiple components (wanting, liking, and learning).

  • DA is solely responsible for the “wanting” component:

    • It adds incentive salience to reward-related stimuli, activating motivation to obtain the reward associated with those stimuli.

  • Significant in conditioning associations caused by Pavlovian learning, which bridge CSs to various rewards.

What is Incentive Salience Not?
  • Incentive salience is not hedonic “liking”; it is not merely a component of learning:

    • “Wanting” needs to be separately assigned to make a reward into a “wanted” stimulus.

    • Purely predicting a reward does not stimulate motivation to obtain it.

What is Incentive Salience?
  • A conditioned motivational response, typically triggered and assigned to reward stimuli.

  • Is more than just sensory representations or learned associations; it transforms neutral representations into motivationally potent stimuli.

Characteristics of Incentive Salience
  • Generated anew by mesolimbic systems whenever reward stimulus is encountered, hence motivation can fluctuate with current neurobiological conditions and learned associations.

  • This assigns greater attraction to rewards, turning neutral stimuli into motivational magnets.

Stages Involved in Attributing Incentive Salience
  • Stage 1: CS “wanting” assignment occurs based on the associated “liked” US; initially, CS is merely perceptual.

  • Stage 2: CS reboosting occurs where interactions between learning and physiological conditions strengthen IS assignment at later exposures.

  • Stage 3: Continuous generation of “wanting” to CS relying on learned context, but influenced by relevant physiological states.

Physiological State Inputs Interaction
  • Can augment the incentive value for natural rewards at all stages of IS attribution.

  • Learned incentives (CS) interact with the relevant physiological states to determine the motivation driven toward the reward.
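One way to sketch this interaction, in the spirit of Zhang-and-Berridge-style models (the multiplicative rule and all numbers are assumptions, not from the text), is to compute cue-triggered “wanting” on the fly as the learned prediction times a physiological gain:

```python
# Sketch: incentive salience generated anew at each CS encounter as the
# learned prediction V scaled by a state-dependent gain kappa. The
# multiplicative rule and the numbers are illustrative assumptions.

def wanting(v_cs, kappa):
    """Cue-triggered 'wanting' for the reward predicted by a CS."""
    return v_cs * kappa

V_SALT_CUE = 0.4   # learned prediction for an intensely salty taste

# Same CS, same learning history -- different physiological state:
print(round(wanting(V_SALT_CUE, 0.1), 2))  # sated: cue barely wanted
print(round(wanting(V_SALT_CUE, 4.0), 2))  # depleted: strong "wanting"
```

Because “wanting” is recomputed at each encounter rather than cached, a shift in physiological state changes the cue’s motivational pull immediately, without new learning — the key departure from pure learning accounts.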

Testing Incentive Salience vs. Learning Hypotheses
  • Two supporting evidence streams:

    1. Electrophysiological impacts of DA boosts on signals from limbic circuits.

    2. Behavioral effects of DA boosts enhancing cue-triggered “wanting” for rewards in animal models.

DA Coding in the Ventral Pallidum (VP)
  • Examining coding in the VP helps clarify the purpose of DA transmission, since the VP serves as a final common link in mesocorticolimbic reward circuits.

Empirical Impact of DA on Reward
  • DA dynamics modulate how reward-related behaviors and processes evolve but do not confirm if DA neuronal activity directly enhances prediction error signals.

Conclusion
  • The role of DA in reward entails:

    • It influences action activation, sensorimotor initiatives, effort dynamics, and the strength of action patterns.

    • It does not drive “liking” nor directly cause new learning.

    • It does ascribe incentive salience to reward stimuli, influencing their motivational properties significantly.


Next Lecture: Addiction.