The Role of Dopamine in Reward
The Causal Question
Dopamine in Reward
The main inquiry of the reading is to answer the question:
“What does DA do in reward?”
There are 3 competing areas of explanation for DA’s role in reward:
“Liking”
Learning
“Wanting”
Approaches to the Causal Question
So which explanation is correct? The question has been approached in several experimental ways, each reflecting a different way of assigning causality to a brain event:
Suppress DA neurotransmission and identify which reward functions are lost (necessary causes)
Enhance DA signaling and identify which reward functions are amplified (sufficient causes)
Record which reward functions DA neural activity codes during reward events (correlation)
Possible Answers to the Causal Question
Unsupported Hypotheses
Activation-Sensorimotor Hypothesis
The activation-sensorimotor hypothesis posits that DA mediates general functions of:
Action generation
Effort
Movement
General arousal or behavioural activation
This is supported by substantial evidence; however, it is very general in scope, making it difficult to explain specific aspects of reward.
Hedonia Hypothesis
Evidence in Favour of the Hedonia Hypothesis
The hedonia hypothesis suggests that DA in NAc is a “pleasure neurotransmitter”.
It mediates the positive reinforcing effects of reward stimuli
In a hedonic reward sense of the term “reinforcement”
The suppression of DA causes anhedonia
Evidence Against the Hedonia Hypothesis
However, DA ≠ hedonic reactions (in either rats or humans).
DA reduction does not decrease “liking”
6-OHDA lesions have no effect
Though they destroy up to 99% of DA in NAc and neostriatum
Neuroleptic drugs (like pimozide) do not shift reactions towards “disliking”
DA neurons in monkeys stop firing to rewards after prediction is learned
Whatever the hedonic impact of reward, it must be mediated without a DA signal
…
Reward Learning Hypothesis
DA signal modulates synaptic plasticity in target neurons
Or adjusts synaptic efficacy in appropriate neuronal circuits of input layers of learning networks
Particularly the neostriatum and NAc
Psychologically it suggests that DA acts to “stamp in” links between S-S or S-R events
Acts as a teaching signal for new learning or a computational prediction generator
Schultz’s Electrophys. Studies
Monkeys performed an instrumental conditioning task while activity in the VTA (the originating source of dopamine, especially for the mesocorticolimbic DA pathway) was recorded; they had to move a joystick in response to the “correct stimulus”.
They started by guessing, not knowing the rules of the game per se, but when they got it right they received juice. After the juice was received, DA neurons activated. As trials continued, the monkeys learned the rules of the game and responded accurately. So initially, DA was activated after the juice; then, DA was activated during the salient reward-predicting cues.
In extinction trials (in which reward wasn’t given), VTA activity fell below baseline (consistent with prediction error models — the CS is no longer predictive of reward).
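The pattern Schultz observed can be mimicked with a minimal temporal-difference-style sketch (this particular model, its learning rate, and trial counts are illustrative assumptions, not Schultz’s own analysis):

```python
# Minimal TD-style sketch of the Schultz result: the prediction-error
# signal moves from the juice (US) to the predictive cue (CS) with
# training, and dips below baseline when the reward is omitted.
# The learning rate and trial counts are illustrative assumptions.

def run_trials(n_trials, reward, V, alpha=0.3):
    """Each trial has a CS followed by a US. V[0] is the learned
    reward prediction carried by the CS; returns (CS, US) errors."""
    errors = []
    for _ in range(n_trials):
        delta_cs = V[0]            # CS response: its learned prediction (the CS itself is unpredicted)
        delta_us = reward - V[0]   # US response: actual reward minus the CS prediction
        V[0] += alpha * delta_us   # the CS prediction learns from the US error
        errors.append((delta_cs, delta_us))
    return errors

V = [0.0]
acq = run_trials(50, reward=1.0, V=V)  # acquisition: juice delivered
ext = run_trials(5, reward=0.0, V=V)   # extinction: juice omitted

print(acq[0])   # first trial: no CS response, large error at the juice
print(acq[-1])  # after learning: CS response present, juice error near zero
print(ext[0])   # extinction: negative error (below-baseline dip) at juice time
```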
Outstanding Questions
The Three Dopamine Learning Hypotheses
A set of different but closely related hypotheses
All posit DA mediates learning but in different ways
1. DA signals “stamp in” S-R or S-S associations whenever a reward follows (simplest)
2. DA activation causes new habit learning and enhances habit performance
3. DA systems mediate computational teaching signals via US prediction errors (most sophisticated)
Associative Stamping-In?
Thorndike’s “Law of Effect”
If a response in the presence of a stimulus is followed by a satisfying event, association between the stimulus (S) and the response (R) is strengthened
If a response is followed by an undesirable event, the S-R association is weakened
Reinforcer (O) serves to ‘stamp-in’ the S-R association
Notes:
Motivation for instrumental behaviour:
Activation of the S-R association upon exposure to contextual stimuli (S), in the presence of which the response was previously reinforced
The resulting event is not part of the association
The satisfying or annoying consequence serves to strengthen or weaken the S-R association
No learning about ‘O’ or ‘S-O’ or ‘R-O’
Supporting Evidence
Dopamine appears to support learning through several distinct mechanisms.
S-R Learning
Notes above.
Habit Learning
Implies a more proactive role for dopamine in learning. Habits are formed (a probabilistic display of certain behaviours over others).
Prediction Error Learning Models
The brain tries to predict what will happen next — DA mediates the prediction value carried by a CS previously associated with reward (or fires whenever the reward is surprising).
Evaluating Direct Roles of DA
Is DA A Necessary Cause?
Incentive Salience Hypothesis
Learning translates into behavioural output — dopamine causes this motivational response. If the monkeys in Schultz’s study hadn’t liked the juice, they would still have been able to predict when the juice would be available, but you wouldn’t see the dopamine spike (i.e., the wanting/motivation to get the juice).
DA can make a CS more sought out. If you’re hungry and want food, you’ll pay more attention to your environment for cues that might predict food (i.e., more likely to notice food-related cues).
Initial learning, then reboosting (which happens with each encounter of the CS, making us want the reward a little more with each exposure), then wanting — liking and wanting will often align temporally across the stages of incentive salience.
Stages of the Incentive Salience Hypothesis
Evidence in Favour of the Incentive Salience Hypothesis
The Causal Question
Core Inquiry: What does dopamine (DA) do in the context of reward?
Contains 3 competing explanations:
“Liking”
Learning
“Wanting”
Approaches to answering this question include several experimental methods.
Experimental Approaches
Evaluating specific reward functions lost when DA neurotransmission is suppressed:
Techniques: antagonists, neurotoxins, lesions.
Focus: Necessary causes for reward.
Evaluating reward functions enhanced by increased DA signaling:
Techniques: agonists, brain stimulation, genetically induced hyper-DA mutations.
Focus: Sufficient causes for reward.
Investigating reward functions coded by DA neural activations during reward events:
Focus: Neural coding of function via correlation.
Emphasizes that DA function is multifaceted, and combining these approaches is beneficial.
Possible Answers to DA's Role in Reward
Activation-sensorimotor hypotheses (effort, arousal, and response vigor).
Hedonia hypothesis (pleasure linked to rewards).
Reward learning hypotheses (associative stamping-in, teaching signals, prediction errors).
Incentive salience hypothesis (the “wanting” aspect of rewards).
Activation-Sensorimotor Hypothesis
DA mediates several general functions:
Action generation
Effort
Movement
General arousal or behavioral activation.
This hypothesis is well-supported by substantial evidence but is too broad to explain specific reward mechanisms.
Hedonia Hypothesis
DA in the nucleus accumbens (NAc) is suggested to function as a “pleasure neurotransmitter” that mediates the positive reinforcing effects of reward stimuli.
Interprets “reinforcement” in the hedonic sense:
Suppression of DA leads to anhedonia (the absence of pleasure).
Evidence Against the Hedonia Hypothesis
DA Reduction Evidence:
DA reduction does not decrease “liking” in rats;
Example: 6-OHDA lesions may destroy up to 99% of DA yet have no effect.
Neuroleptic drugs (like pimozide) do not shift reactions towards “disliking.”
DA neurons stop firing to rewards after prediction has been learned in monkeys.
Conclusion: The hedonic impact of reward seems to be mediated without DA signaling.
DA Impact on Hedonic Reactions in Rats
DA activation does not enhance “liking.” Evaluations show:
Hyper-DA mutation (like DAT-KO mice) does not increase “liking.”
Amphetamine microinjection into NAc does not increase hedonic potency.
Sensitization and electrical brain stimulation did not enhance hedonic impact of reward.
DA Impact on Hedonic Reactions in Humans
Patients with Parkinson's Disease (PD):
They demonstrate normal ratings of liking.
However, individuals with DA dysregulation syndrome (DDS) show increased “wanting” ([DDS] characterized by compulsive activities and increased L-DOPA intake).
The advantage of studying individuals with DDS is that it avoids confounds typically seen in drug addicts;
L-DOPA does not induce euphoric effects or dysphoric withdrawal.
Summary of Evidence Against Hedonia Hypothesis
DA does not produce normal “liking” reactions in rats or humans.
Increases in DA activation have not been shown to amplify hedonic impact when “wanting” is separated from “liking.”
DA's main contributions must therefore lie in nonhedonic aspects of reward, which motivates the nonhedonic hypotheses (reward learning, incentive salience).
Reward Learning Hypothesis
Proposes that DA signals modulate synaptic plasticity in target neurons, adjusting synaptic efficacy in relevant learning networks (especially in the neostriatum and NAc).
Psychological implications:
DA acts to “stamp in” associations between stimuli (S-S) or between stimuli and responses (S-R).
Functions as a teaching signal for new learning or a computational prediction generator.
Schultz’s Electrophysiological Studies
DA activation occurs during reward anticipation through conditioned stimuli (CS) indicating that a reward will follow.
The activation of DA neurons correlates with prediction error models.
Activation is contingent on the US reward being surprising.
Fully predicted US rewards do not activate DA neurons as strongly.
Outstanding Questions
There is a general consensus that DA system activation often correlates with prediction error codes. However, the causative question remains:
Does DA activation drive the rest of the brain towards learning?
Does other system learning lead to DA activation?
Is DA crucial in encoding US prediction errors for learning new stimuli?
Is DA an output from learning mechanisms operating elsewhere in the brain?
Dopamine Learning Hypotheses
A framework of several interconnected hypotheses:
DA signals “stamp in” S-R or S-S associations post-reward.
DA activation promotes new habit formation and reinforces habit performance.
DA systems mediate computational teaching signals via US prediction errors.
Associative Stamping-In Hypothesis
A direct route for DA to influence reward; acts like a reinforcement signal that “stamps in” learned associations related to preceding reward stimuli or responses when a US reinforcer is presented (based on Thorndike's Law of Effect).
Thorndike’s Law of Effect states:
If an instrumental response in the presence of a stimulus is followed by a satisfying event, the association between the stimulus (S) and response (R) strengthens; if followed by an undesirable event, the association weakens.
Reinforcers serve to “stamp in” the S-R association.
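The Law of Effect can be sketched as a toy update rule (the stimulus/response names, step size, and update rule are all illustrative assumptions):

```python
# Toy sketch of Thorndike's Law of Effect: a satisfying outcome after an
# S-R pairing strengthens that association; an annoying one weakens it.
# Note the outcome itself is never stored (no S-O or R-O learning).
# The stimulus/response names and step size are illustrative assumptions.

strength = {}  # (stimulus, response) -> association strength

def law_of_effect(stimulus, response, satisfying, step=0.1):
    delta = step if satisfying else -step
    strength[(stimulus, response)] = strength.get((stimulus, response), 0.0) + delta

# A lever press in the presence of a light, usually followed by food:
for _ in range(5):
    law_of_effect("light", "press", satisfying=True)
law_of_effect("light", "press", satisfying=False)  # one annoying outcome

print(strength[("light", "press")])  # net strengthening of the S-R link
```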
Supporting Evidence for Associative Stamping-In
Includes:
Extinction-mimicry data that led to the anhedonia hypothesis.
DA's modulation of mechanisms like long-term potentiation (LTP) and long-term depression (LTD).
DA manipulations shortly after a learning trial can impact memory consolidation.
DA manipulations right before learning can affect new associations' acquisition.
Habit Learning
More specific than stamping-in: DA contributes to the learning of new S-R habits or modulates strength of learned S-R habits.
Definition of stronger habits: persistence in goal-directed responses after the goal becomes devalued (e.g., continuing to eat despite feeling full).
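The devaluation test can be sketched as two decision rules (the rules and values are illustrative assumptions): a habitual S-R agent ignores current outcome value, while a goal-directed agent consults it.

```python
# Sketch of the devaluation test for habits: a goal-directed agent
# consults the current value of the outcome; a habitual (S-R) agent
# responds from association strength alone, so it keeps responding
# even after devaluation. Decision rules are illustrative assumptions.

def goal_directed_respond(sr_strength, outcome_value):
    return sr_strength > 0 and outcome_value > 0

def habitual_respond(sr_strength, outcome_value):
    return sr_strength > 0   # current outcome value is ignored

# After training, the outcome is devalued (e.g., sated on the food):
print(goal_directed_respond(0.9, 0.0))  # stops responding
print(habitual_respond(0.9, 0.0))       # keeps responding: a habit
```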
Supporting Evidence for Habit Learning
There is consensus that DA manipulations can influence performance strength across:
Learned S-R habits.
Non-learned action patterns (APs) - both instinctive and new, stereotyped APs.
However, habit strengthening contributions do not entirely clarify DA's role in reward.
Prediction Error Learning Models
DA is implicated in coding the prediction value associated with conditioned stimuli (CS) linked to rewards and the prediction errors from unconditioned stimuli (US).
Utilizes computational models from associative learning to assign roles to DA's phasic activations.
Prediction error and teaching signal constructs are distinguishing aspects of these models.
Prediction Error Definition
An update in information about a reward received at the moment of reward acquisition.
Positive Prediction Error: True reward impact is greater than expected.
Negative Prediction Error: True reward impact is less than expected.
Strong correlation between prediction errors and DA activation has been documented in various situations, including associative blocking and conditioned inhibition.
Rescorla-Wagner Model
This model delineates the trial-by-trial progression of simple associative learning.
Applied to DA, the model suggests that DA boosts enhance predictions of impending rewards (V) associated with a CS.
An increase in the DA signal elevates the prediction error (λ − V) derived from the hedonic impact of the US (λ) at the moment of reinforcement.
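The trial-by-trial update can be written out as a minimal sketch (the learning-rate parameters α and β and all values here are illustrative assumptions):

```python
# Rescorla-Wagner: on each trial the prediction V carried by the CS moves
# a fraction (alpha * beta) of the way toward the US impact lambda.
# Under the DA-learning reading, a DA boost at reinforcement acts like a
# larger lambda, inflating the error (lambda - V). Parameter values here
# are illustrative assumptions.

def rescorla_wagner(lam, n_trials, alpha=0.5, beta=0.6):
    V = 0.0
    history = []
    for _ in range(n_trials):
        error = lam - V            # prediction error at the moment of the US
        V += alpha * beta * error  # trial-by-trial update toward lambda
        history.append(V)
    return history

h = rescorla_wagner(lam=1.0, n_trials=10)
print(h[0], h[-1])  # V climbs from its first step toward lambda = 1.0
```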
DA and Prediction Errors
Hypothesis: DA acts as a teaching signal that gradually instructs learning systems to make correct predictions, incrementally and on a trial-by-trial basis.
Positions DA as a mediator of specific learning equation parameters.
DA activity at the moment of CS may modulate the learned prediction strength of future rewards.
DA activity following US delivery mediates the prediction error teaching signal, which reflects the contradiction between anticipated and actual rewards.
Application to Addiction
The hypothesis elucidates addiction causation through mechanisms of overlearning:
Addictive substances lead to significant DA release and engender large prediction errors, causing an overlearning phenomenon that eventually leads to overly optimistic expectations regarding future drug-related rewards.
Learned predictions of value (V) cannot adjust to accommodate the unusually high errors produced by DA-triggering substances, creating an expectation bias that persists, compelling addict behavior.
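One way to sketch this overlearning account (an illustrative assumption in the spirit of computational addiction models, not a model given in the reading): the drug adds a fixed DA-driven boost to the prediction error, so the learned value settles at an inflated level the actual reward never justifies.

```python
# Sketch of the overlearning account: a drug adds a fixed DA-driven boost
# to the prediction error, so the error cannot be fully learned away and
# the learned value V settles at a level the true reward never justifies.
# Boost size, learning rate, and trial count are illustrative assumptions.

def learn(reward, n_trials, drug_boost=0.0, rate=0.3):
    V = 0.0
    for _ in range(n_trials):
        error = (reward - V) + drug_boost  # the boost is never predicted away
        V += rate * error
    return V

print(round(learn(1.0, 200), 2))                  # natural reward: V matches the reward
print(round(learn(1.0, 200, drug_boost=0.5), 2))  # drug: V inflated beyond the reward
```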
Evaluating Learning Models
DA has various indirect contributions to learning and performance, involving:
Attention, motivation, cognition, rehearsal, consolidation.
Example: psychostimulants like amphetamines (AMPH) used as performance enhancers.
However, this does not imply that DA serves as a vital teaching signal or a mechanism for forming new reward associations.
Evaluating Direct Roles of DA
Divide the inquiry into separate components for experimental analysis:
Necessary Causation: Is DA necessary for standard reward learning?
Sufficient Causation: Can an increase in DA lead to excessive learning?
Prediction of Future Reward: Can DA cause a previously learned CS to generate exaggerated predictions?
Is DA a Necessary Cause?
If DA is essential for mediating learned associations, eliminating it should impair reward learning.
Recent findings with DA-deficient (DD) mice demonstrate:
These mice lack tyrosine hydroxylase (TH), the enzyme responsible for DA synthesis, but manage to eat and drink under L-DOPA administration before lapsing back into inactivity.
Is DA a Necessary Cause? - Summary
Conclusion: DA appears unnecessary for standard reward learning, indicating it is not a necessary cause; if DA has any role as a teaching facilitator or stamping-in mechanism, it likely serves in a redundant capacity.
Is DA a Sufficient Cause?
Is an increase in DA sufficient to enhance teaching signals for better or faster learning about rewards?
Evidence from DAT-knockdown (mice with reduced DA transporter levels) indicates:
Elevated DA levels (170% above controls) increase “wanting” but not “liking.”
They do not expedite learning of S-S reward predictions or instrumental associations.
They lack strong or persistent S-R habits.
Why Does DA Neuronal Firing Appear as Prediction Error?
Possible that DA neurons reflect learning signal consequences but do not induce learning.
Prediction-error signals may be computed by forebrain structures before the DA neuronal response, which would explain the correlation without DA causing learning.
Incentive Salience Hypothesis
Central premise: Reward is a composite entity with multiple components (wanting, liking, and learning).
DA is solely responsible for the “wanting” component:
It adds incentive salience to reward-related stimuli, activating motivation to obtain the reward associated with those stimuli.
Operates on associations formed by Pavlovian conditioning, which link CSs to various rewards.
What is Incentive Salience Not?
Incentive salience is not hedonic “liking”; it is not merely a component of learning:
“Wanting” needs to be separately assigned to make a reward into a “wanted” stimulus.
Purely predicting a reward does not stimulate motivation to obtain it.
What is Incentive Salience?
A conditioned motivational response, typically triggered and assigned to reward stimuli.
Is more than just sensory representations or learned associations; it transforms neutral representations into motivationally potent stimuli.
Characteristics of Incentive Salience
Generated anew by mesolimbic systems whenever reward stimulus is encountered, hence motivation can fluctuate with current neurobiological conditions and learned associations.
This assigns greater attraction to rewards, turning neutral stimuli into motivational magnets.
Stages Involved in Attributing Incentive Salience
Stage 1: CS “wanting” assignment occurs based on the associated “liked” US; initially, CS is merely perceptual.
Stage 2: CS reboosting occurs where interactions between learning and physiological conditions strengthen IS assignment at later exposures.
Stage 3: Continuous generation of “wanting” to CS relying on learned context, but influenced by relevant physiological states.
Physiological State Inputs Interaction
Can augment the incentive value for natural rewards at all stages of IS attribution.
Learned incentives (CS) interact with physiological states relevant to the reward to determine the motivation generated.
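The idea that “wanting” is regenerated at each CS encounter from learned value and current physiological state can be sketched as a simple computation (the multiplicative form and all values are illustrative assumptions):

```python
# Sketch of incentive salience generated anew at each CS encounter:
# "wanting" is computed on the fly from the learned CS value and the
# current physiological state (modeled here as a multiplicative gain,
# an illustrative assumption). "Liking" plays no part in the computation.

def wanting(learned_cs_value, physiological_gain):
    """Cue-triggered 'wanting', regenerated at each CS encounter."""
    return learned_cs_value * physiological_gain

cs_value = 0.8  # learned from past CS-US pairings

print(wanting(cs_value, 1.0))  # sated state: moderate wanting
print(wanting(cs_value, 2.5))  # hungry state: same learning, stronger wanting
```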
Testing Incentive Salience vs. Learning Hypotheses
Two supporting evidence streams:
Electrophysiological impacts of DA boosts on signals from limbic circuits.
Behavioral effects of DA boosts enhancing cue-triggered “wanting” for rewards in animal models.
DA Coding in the Ventral Pallidum (VP)
Examining coding in the VP clarifies the purpose of DA transmission, since the VP serves as a final common link in mesocorticolimbic reward circuits.
Empirical Impact of DA on Reward
DA dynamics modulate how reward-related behaviors and processes evolve, but do not confirm whether DA neuronal activity directly enhances prediction-error signals.
Conclusion
The role of DA in reward entails:
It influences behavioural activation, sensorimotor function, effort, and the strength of action patterns.
It does not drive “liking” nor directly cause new learning.
It does ascribe incentive salience to reward stimuli, influencing their motivational properties significantly.
Next Lecture: Addiction.