Reward

Definition and Fundamental Processes of Reward

Reward is defined as a natural process during which the brain associates diverse stimuli—which can include substances, situations, events, or activities—with a positive or desirable outcome. This process serves several critical functions in behavioral regulation:

Reward Options and Choices: The brain must evaluate various options to determine which will provide the most benefit.
Assessing Benefit: This involves weighing the potential gains of a stimulus against the effort or risk required to obtain it.
Decision Making: Reward processing facilitates making decisions across different timescales, categorized as:
* Short Term Decisions: Immediate choices regarding current needs or desires.
* Long Term Decisions: Strategic choices that may require delayed gratification for a greater eventual benefit.

Estimating Reward Value and Subjective Assessment

The process of choosing between options requires estimating reward value. This is illustrated through the hypothetical scenario of choosing a pudding at a restaurant. The evaluation process involves multiple variables:

Sensory Properties: Evaluating the physical characteristics of the reward, such as taste, smell, and texture.
Situational Context: External factors that influence value, such as the price of the item or the portion size offered.
Current Internal State: The physiological and psychological state of the individual (e.g., whether the person is still hungry or specifically craving something sweet).
Past Experience: Memories of similar contexts and previous outcomes (e.g., "How much did you enjoy chocolate cake last time?").

The integration of these factors leads to an Estimated Value, which directly informs the final decision.
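The integration step can be sketched as a toy calculation. This is only an illustration, not a model of what the OFC computes: the formula, the 0-1 scales, the `price_weight` parameter, and the menu values are all assumptions made up for the example. It shows how the same two puddings can be ranked differently once the internal state (hunger) changes.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    sensory_appeal: float   # taste, smell, texture on a 0-1 scale (assumed)
    price: float            # situational context, arbitrary currency units
    past_enjoyment: float   # remembered enjoyment on a 0-1 scale (assumed)


def estimated_value(option, hunger, price_weight=0.05):
    """Toy estimate: appeal and memory scaled by internal state, minus cost."""
    benefit = hunger * (option.sensory_appeal + option.past_enjoyment) / 2
    return benefit - price_weight * option.price


def choose(options, hunger):
    """Pick the option with the highest estimated value."""
    return max(options, key=lambda o: estimated_value(o, hunger)).name


# Hypothetical menu; the numbers are invented for illustration.
menu = [
    Option("chocolate cake", sensory_appeal=0.9, price=8.0, past_enjoyment=0.8),
    Option("fruit salad", sensory_appeal=0.6, price=4.0, past_enjoyment=0.5),
]

print(choose(menu, hunger=1.0))  # still hungry
print(choose(menu, hunger=0.2))  # nearly full
```

With these made-up numbers the hungry diner picks the cake, while the nearly full diner finds the cake no longer worth its price and picks the cheaper option, even though the sensory properties of both puddings are unchanged.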

The Orbitofrontal Cortex (OFC) and Subjective Value

The Orbitofrontal Cortex (OFC) is a major brain region responsible for estimating reward value. It acts as a hub for integrating information from various sources:

Medial Temporal Lobe Inputs: The OFC receives inputs from the hippocampus and adjacent regions involved in memory storage and retrieval. This provides the context of Prior Experience.
Dopaminergic Inputs: It receives inputs from reward-related dopamine neurons, which helps the brain form associations among specific objects, actions, and their subsequent consequences.
Sensory Integration: The OFC represents the reward value of food by integrating sensory inputs.
Behavioral Outputs: The OFC sends signals to systems that coordinate and execute decisions regarding behavior.

Evidence for Subjective Value in the OFC

Subjective value refers to how individuals evaluate the worth of an option based on personal preference and context. Research provides direct evidence of the OFC's role in this evaluation:

Monkey Satiety Study: When a monkey that likes peanuts is fed to the point of satiety, the sensory properties of the peanuts remain unchanged. However, the value of the peanuts to the monkey decreases.
Neuronal Firing Rates: This state of satiety is accompanied by a reduction in the firing rates of OFC neurons in response to more peanuts. This demonstrates that the OFC encodes the subjective value of food rather than just its sensory properties.
Human OFC Activity: Similarly, in humans, OFC activity decreases in response to a food that has been eaten to satiety, yet the OFC continues to respond to new, different foods.

Early Evidence for a Reward Circuitry

The concept of a dedicated reward system emerged from behavioral and physiological studies in the mid-20th century:

Olds and Milner (1954): Identified reward centers in the brains of rodents using brain stimulation. Rats with implanted electrodes pressed a lever to deliver electrical stimulation to their own brains (self-stimulation). This approach is known as brain stimulation reward (BSR).
Mapping the Circuitry: By changing the positioning of electrodes, researchers mapped the specific circuitry involved. They found that robust self-stimulation behavior occurred when electrodes were placed along the medial forebrain bundle (MFB), which carries the pathway from the Ventral Tegmental Area (VTA) to the Nucleus Accumbens (NAcc).
Responsive Brain Areas: Self-stimulation was observed in the:
* Ventral Tegmental Area (VTA)
* Nucleus Accumbens (NAcc)
* Cortical structures
Non-Responsive Areas: Electrodes placed in the corpus callosum or the hippocampus did not produce self-stimulation behavior, and rats did not spend time pressing levers for stimulation in these areas.

The Reward System and Dopaminergic Pathways

The reward system is a neural network consisting of multiple interacting circuits that receive and evaluate the rewarding properties of stimuli. Key components of the dopamine system include:

Ventral Tegmental Area (VTA): A primary source of dopamine neurons.
Nucleus Accumbens (NAcc): A major target for VTA projections involved in reward processing.
Orbitofrontal Cortex (OFC): Involved in value estimation and decision-making.

Major Dopamine Pathways

Mesolimbic/Mesocortical Pathway: Originates in the VTA and projects to the NAcc and various cortical regions (including the ventromedial prefrontal cortex).
Nigrostriatal Pathway: Projects from the substantia nigra pars compacta to the striatum, primarily involved in motor control but also interacting with reward systems.

Reward Prediction Error (RPE)

Dopamine neurons do not simply signal "pleasure"; instead, their responses encode Reward Prediction Error (RPE). RPE is the discrepancy between what an individual predicts will happen and what actually occurs.

The Formula: $\text{RPE} = \text{Actual Outcome} - \text{Predicted Reward}$
Positive RPE: If the outcome is better than predicted, the RPE is positive, leading to reinforcement and an increased likelihood of choosing that option in the future.
Negative RPE: If the outcome is worse than predicted, the RPE is negative, reducing the motivation to repeat the behavior.
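A minimal sketch of the formula and its effect on future predictions. The update rule (nudging the prediction by a fraction of the error, in the style of the Rescorla-Wagner learning rule) and the `learning_rate` value are assumptions added for illustration; the notes only state the RPE formula itself.

```python
def reward_prediction_error(actual, predicted):
    """RPE = actual outcome - predicted reward."""
    return actual - predicted


def update_prediction(predicted, actual, learning_rate=0.1):
    """Move the prediction a small step toward the actual outcome.

    The step size is an assumed parameter, not a value from the notes.
    """
    return predicted + learning_rate * reward_prediction_error(actual, predicted)


predicted = 0.5

# Outcome better than predicted: positive RPE, prediction rises.
better = reward_prediction_error(actual=1.0, predicted=predicted)
raised = update_prediction(predicted, actual=1.0)

# Outcome worse than predicted: negative RPE, prediction falls.
worse = reward_prediction_error(actual=0.0, predicted=predicted)
lowered = update_prediction(predicted, actual=0.0)

print(better, raised)
print(worse, lowered)
```

A positive error raises the stored prediction (making the option more attractive next time), a negative error lowers it, and an outcome that exactly matches the prediction leaves it unchanged.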

Dopamine Neuron Activity Patterns

Unexpected Reward: Dopamine cells respond to an unforeseen reward with a sharp burst of activity.
Expected Reward: When a reward repeatedly follows a cue (the conditioned stimulus, CS), the dopamine burst shifts in time to coincide with the cue. Once the reward (R) is fully predicted, there is no dopamine response to the reward itself, only to the cue.
Missing Reward: If a cue is presented but the reward is not delivered, the initial dopamine burst at the cue is followed by a "downward blip" or a pause in activity at the time the reward should have appeared.
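The three patterns above can be reproduced with a small simulation. This sketch assumes a temporal-difference (TD) learning model, which is a common computational account of these recordings but is not named in the notes; the function name, the five time steps between cue and reward, and the learning rate are all hypothetical choices. The cue is treated as arriving unpredictably, so the prediction before the cue is fixed at zero.

```python
def run_trial(values, reward, learning_rate=0.3):
    """Run one cue-to-reward trial and return (cue_error, outcome_error).

    values[i] is the predicted future reward i steps after cue onset.
    The list is updated in place.
    """
    # The pre-cue prediction is assumed to be 0, so the error at cue onset
    # equals the learned value of the first cue state.
    cue_error = values[0] - 0.0
    outcome_error = 0.0
    last = len(values) - 1
    for i in range(len(values)):
        next_value = values[i + 1] if i < last else 0.0  # trial ends after reward time
        r = reward if i == last else 0.0                 # reward arrives at the last step
        error = r + next_value - values[i]               # actual minus predicted
        values[i] += learning_rate * error
        if i == last:
            outcome_error = error
    return cue_error, outcome_error


values = [0.0] * 5                        # nothing learned yet

first = run_trial(values, reward=1.0)     # unexpected reward
for _ in range(500):                      # cue is repeatedly followed by reward
    trained = run_trial(values, reward=1.0)
omitted = run_trial(values, reward=0.0)   # cue shown, reward withheld

print("unexpected:", first)
print("expected:  ", trained)
print("missing:   ", omitted)
```

On the first trial the error appears only at the reward. After training it appears at the cue and is close to zero at the reward. When the reward is withheld, the error at the expected reward time is negative, corresponding to the pause in firing.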

Reward and Food Regulation: Hedonic vs. Homeostatic

While eating is essential for survival and homeostasis (metabolic need), food intake is also heavily influenced by pleasurable effects.

Homeostasis: Eating to satisfy metabolic energy requirements.
Hedonic Feeding: Consuming food for the pleasure it provides, rather than for survival needs.
The ‘Liking’ Circuit: Also known as the hedonic circuit, this system generates the pleasure associated with consumption. Pleasure is not contained in the taste signal itself; the brain attaches it to the sensation of taste as that signal is processed.

Phases of Eating and Reward

Appetitive Phase: Dominated by "Wanting." This involves initiating food procurement or foraging.
Consummatory Phase: Dominated by "Liking." This involves the actual engagement with and consumption of food.
Satiety Phase: Characterized by satiation and the termination of food intake.

Distinguishing ‘Wanting’ from ‘Liking’

Research by Berridge et al. establishes that reward consists of two distinct components: "Wanting" (motivation) and "Liking" (hedonic pleasure).

Measuring ‘Liking’ Reactions

‘Liking’ reactions to taste are evolutionarily conserved and can be quantified through behavioral analysis:

Positive Facial Liking: Relaxed facial expressions and rhythmic tongue protrusions.
Facial Disliking/Aversive Expressions: Gapes (opening the mouth wide), turning away, and head shakes.

The Role of Dopamine: Motivation, Not Pleasure

Contrary to popular belief, dopamine is not the "pleasure chemical."

6-OHDA Studies: Rats treated with the neurotoxin 6-OHDA (which destroys dopamine neurons) become aphagic: they stop eating because they lack the motivation to seek food (Wanting). However, when food is placed in their mouths, they show normal "Liking" reactions to sucrose and normal aversive reactions to quinine.
Human Evidence: Lowering dopamine in humans does not reduce the pleasure of eating. Patients with Parkinson’s disease (characterized by low dopamine) do not report differences in food pleasure. Neuroimaging shows dopamine neurotransmission correlates with "wanting" ratings rather than "liking" ratings.
Transgenic Mice: Mice with increased dopamine show elevated "wanting" for sweets but no increase in "liking" reactions.

Neural Substrates of Liking: Opioids and Endocannabinoids

Pleasure is supported by separate systems involving opioids and endocannabinoids:

Opioid Signaling

Endogenous Opioids: Include neuropeptides such as enkephalins, dynorphins, and endorphins.
Receptor Subtypes: Mu ($\mu$), kappa ($\kappa$), and delta ($\delta$) receptors (all G-protein coupled receptors).
Pharmacology: Morphine (agonist) enhances sucking/liking responses; Naloxone (antagonist) decreases food intake, particularly sucrose.
Mu Opioid Hotspot: A specific area in the NAcc (roughly $10\%$ of the NAcc) where injection of an opioid agonist (DAMGO) specifically increases facial "liking" reactions.

Endocannabinoid Signaling

Endocannabinoids: Endogenous lipid molecules such as anandamide and 2-arachidonoylglycerol (2-AG).
Receptor Subtypes: CB1 and CB2 (G-protein coupled receptors). CB1 is predominant in the CNS.
Interaction: Injection of anandamide into the NAcc increases facial liking. There is an endocannabinoid hotspot in the NAcc that overlaps with the opioid hotspot. CB1 receptors and Mu receptors co-localize on the same neurons.

Hedonic Hotspots and Circuit Activity

Liking is not localized to a single spot but is controlled by a network of interconnected "hotspots" (including the NAcc and other areas).

Functional Connectivity (Fos Plume Approach): To map these hotspots, researchers use Fos plume analysis. After injecting a drug (like DAMGO) into one region, they measure the expression of c-Fos, a protein whose expression marks recent neuronal activation. This shows that the hotspots are functionally connected: activation in one area leads to activity in others.
Circuit Recruitment: To produce the full increase in "liking" responses, the circuit must recruit multiple hotspots.
Disruption Experiment: If opioid signaling is blocked in one hotspot (via Naloxone) while another is stimulated (via DAMGO), the enhanced "liking" response is suppressed. This shows that the function of the circuit depends on the integrity of its interconnected parts.

Clinical Implications and Dysfunctions

Understanding these reward circuits is vital for addressing clinical disorders involving dysfunction in reward processing:

Addiction: Involves the subversion of the "wanting" and "liking" systems, often driven by dopamine pathways and glutamate inputs.
Eating Disorders: Can involve imbalances between metabolic needs (homeostasis) and hedonic drives.
Substance Specifics: Different drugs act at different points in the circuit. Opiates, nicotine, alcohol, cocaine, amphetamines, cannabinoids, and phencyclidine each interact with VTA interneurons, projection neurons, or NAcc spiny neurons.