Exhaustive Guide to Reinforcement, Extinction, and Operant Contingencies

Neuroscience of Habits and the Three-Term Contingency in Biological Systems

Behavior analysts and neuroscientists define habits as automatic behavioral routines established through repeated sequences of "cue–routine–reward."
The behavioral basis for this account is the three-term contingency: $S^D : R \rightarrow S^r$.
In maze-learning experiments with rats, a click (cue) precedes the run to a choice point; if the rat turns left (routine), it finds chocolate (reward).
Early Learning Phase: When first navigating a maze, a rat emits high levels of exploratory responses, such as sniffing and rising. Brain activity, specifically in the basal ganglia (implicated in motor learning), remains high throughout this period.
Habit Formation (Chunking): As trials are repeated, exploratory responses cease. The rat's basal ganglia show a spike in neural activity at the cue, a low phase during the routine, and a second spike during reward consumption. This pattern is termed "chunking" and indicates an automatic neural habit.
Human Application: Routine actions, such as getting out of bed, are automatic and lack mindfulness. Elite athletes are cautioned not to "over-think," as their complex chains of behavior have become neural chunks through practice.

Neurobiology of Operant Learning in Drosophila

Neurobiologists break learning into two distinct components:
- Behavior-Consequence Learning (BCL): Learning about the consequences of behavior ( $R \rightarrow S^r$ ).
- Stimulus-Relation Learning (SRL): Learning about the relation between the discriminative stimulus and reinforcement ( $S^D : S^r$ ).
The Fly Flight Simulator: In the apparatus designed by Dr. Björn Brembs, a fruit fly is tethered to a torque meter inside a cylindrical drum. BCL is isolated by arranging for positive torque values (e.g., right turns) to produce heat punishment in the absence of visual or auditory cues. SRL is studied by making visual signals proportional to the fly's yaw torque.
The Role of the FoxP Gene:
- One form, $FoxP2$, is essential for human speech and motor learning.
- Research using mutations or reduced gene expression indicates that the $dFoxP$ orthologue is necessary specifically for BCL (operant self-learning) but not for SRL.
- Implication: BCL may be an evolutionarily ancestral capacity that was later co-opted (an exaptation) in the evolution of human language.

The Four Basic Contingencies of Reinforcement

Contingencies are defined by the environmental operation (presentation or removal) and the effect on behavior (increase or decrease).
Positive Reinforcement: A stimulus follows a behavior, and as a result, the rate of that behavior increases. (Example: Praise for sharing a toy leads to more frequent sharing).
Negative Reinforcement: An operant removes an event, and the procedure increases the rate of response. (Example: Putting on sunglasses to remove the sun’s glare; picking up a crying baby to stop the crying).
Positive Punishment: An operant produces an event, and the rate of behavior decreases. (Example: Bombing an enemy for attacking an ally stops hostile actions; a mother scolding a child for playing with matches leads to decreased match-play).
Negative Punishment: The removal of a stimulus contingent on behavior results in a decrease in operant frequency. (Example: Turning off the television when children argue leads to less fighting; a student who is sent out of the room for passing notes passes fewer notes).
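The two-by-two definition above can be sketched as a small lookup. This is an illustrative sketch only; the function name `classify_contingency` and the string labels are assumptions made for the example, not terminology from the source.

```python
def classify_contingency(operation: str, effect: str) -> str:
    """Name the contingency for a stimulus operation and a change in response rate."""
    names = {
        ("presentation", "increase"): "positive reinforcement",
        ("removal", "increase"): "negative reinforcement",
        ("presentation", "decrease"): "positive punishment",
        ("removal", "decrease"): "negative punishment",
    }
    key = (operation, effect)
    if key not in names:
        raise ValueError(f"unrecognized operation/effect pair: {key!r}")
    return names[key]


# Examples from this section, coded as (operation, effect).
examples = {
    "praise after sharing, sharing increases": ("presentation", "increase"),
    "sunglasses remove glare, wearing them increases": ("removal", "increase"),
    "scolding after match-play, match-play decreases": ("presentation", "decrease"),
    "television off after arguing, arguing decreases": ("removal", "decrease"),
}
for description, (operation, effect) in examples.items():
    print(f"{description}: {classify_contingency(operation, effect)}")
```

The point of the lookup is that the name depends only on the operation and the measured effect on behavior, never on whether the event seems pleasant or unpleasant.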

Reward and Intrinsic Motivation Meta-Analysis

The Concern: Some social psychologists argue that rewards (experienced as controlling) undermine self-determination and intrinsic motivation (e.g., Deci, Koestner, & Ryan, 1999).
The Objective Analysis: Eisenberger and Cameron (1996) and Cameron et al. (2001) conducted a meta-analysis of 145 experiments.
Key Findings:
- Verbal rewards (praise, positive feedback) increase performance and interest.
- Tangible rewards (money, points) increase interest in tasks that are initially uninteresting or boring.
- Rewards tied to high performance, progressive mastery, or exceeding others' performance maintain or enhance intrinsic interest.
- Rewards loosely tied to simple "showing up" or repetitive jobs can produce slight decreases in motivation.
- Overall, evidence suggests rewards do not have pervasive negative effects when linked to mastery.

Methods for Identifying Reinforcers

The Litmus Test Approach: A consequence is defined as a positive reinforcer only if it is shown through testing to increase behavior (e.g., a $100 payout increasing gambling behavior).
The Premack Principle: Proposed by David Premack (1959), this principle states that access to a higher-frequency behavior will function as reinforcement for a lower-frequency behavior ( $\text{behavior}_{\text{operant}} \rightarrow \text{behavior}_{S^r}$ ).
- Rat Study (1962): When water-deprived rats preferred drinking to running, drinking reinforced running. When the rats had free access to water and preferred running, running reinforced drinking.
- Human Example: Making television watching (high-frequency) contingent on finishing homework (low-frequency).
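The Premack prediction reduces to a comparison of baseline levels, which can be sketched as follows. The function name `premack_predicts_reinforcement` and the baseline values are hypothetical and chosen only to mirror the rat study; they are not Premack's data.

```python
from typing import Mapping


def premack_predicts_reinforcement(
    baseline: Mapping[str, float], operant: str, contingent: str
) -> bool:
    """True if access to `contingent` is predicted to reinforce `operant`.

    `baseline` maps each behavior to its free-access level (for example,
    seconds per session spent in that behavior).
    """
    return baseline[contingent] > baseline[operant]


# Hypothetical baselines in seconds per session (not Premack's data).
water_deprived = {"drinking": 300.0, "running": 60.0}
free_water = {"drinking": 30.0, "running": 300.0}

print(premack_predicts_reinforcement(water_deprived, "running", "drinking"))  # True
print(premack_predicts_reinforcement(free_water, "drinking", "running"))      # True
print(premack_predicts_reinforcement(free_water, "running", "drinking"))      # False
```

The same two behaviors swap roles when the baselines change, which is why the principle is stated in terms of relative frequency rather than fixed reinforcing stimuli.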
Response Deprivation Hypothesis: Proposed by Timberlake and Allison (1974), this suggests reinforcement occurs specifically because access to a behavior is restricted below its baseline level. This is described as Equilibrium Analysis.
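A minimal sketch of the response deprivation check follows. It assumes the commonly cited inequality $I/C > O_i/O_c$, where $I$ and $C$ are the instrumental and contingent amounts set by the schedule and $O_i$ and $O_c$ are their baseline levels; that formalization, the function name, and the numbers are assumptions for illustration and do not come from the text above.

```python
def is_response_deprived(
    instrumental_required: float,
    contingent_allowed: float,
    baseline_instrumental: float,
    baseline_contingent: float,
) -> bool:
    """Check the response deprivation condition I/C > O_i/O_c.

    Cross-multiplied to avoid division; all four quantities must be positive.
    """
    values = (instrumental_required, contingent_allowed,
              baseline_instrumental, baseline_contingent)
    if min(values) <= 0:
        raise ValueError("all quantities must be positive")
    return (instrumental_required * baseline_contingent
            > baseline_instrumental * contingent_allowed)


# Hypothetical baseline: 30 min of homework and 60 min of television per evening.
# Schedule A: 30 min of homework earns 30 min of television.
print(is_response_deprived(30, 30, 30, 60))  # True: television falls below baseline
# Schedule B: 15 min of homework earns 60 min of television.
print(is_response_deprived(15, 60, 30, 60))  # False: baseline television is still reachable
```

Under Schedule A, a child who does the baseline 30 minutes of homework earns only 30 minutes of television, half the baseline amount, so the schedule is predicted to increase homework. Under Schedule B the same 30 minutes earns more television than the baseline, so no deprivation exists.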

Operant Conditioning History and Neuroscience

Thorndike's Law of Effect (1911): Based on puzzle box experiments with cats, dogs, and chicks. He measured latency (time to escape), which decreased over trials. Skinner criticized Thorndike’s focus on "trial and error," preferring rate of response as the primary datum.
In-Vitro Reinforcement (IVR): Research by Stein and Belluzzi (2014) shows that individual neurons (CA1 pyramidal cells) can be operantly conditioned. Bursts of activity are reinforced by dopamine agonists applied for $50\,\text{ms}$.
In-Vivo Conditioning: Research with Japanese monkeys ( $\textit{Macaca fuscata}$ ) shows that neurons in the lateral prefrontal cortex (LPFC) increase firing when reinforced with juice, showing behavioral flexibility at the cellular level.

Procedural Methodology in Operant Analysis

Operant Rate: The basic measure of behavior probability (responses per unit of time).
Free-Operant Method: An apparatus where a response takes little time and leaves the subject ready to respond again, allowing for moment-to-moment frequency changes.
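Operant rate and its moment-to-moment changes can be computed from response timestamps, as in the sketch below. The function name `response_rates`, the one-minute bin width, and the sample timestamps are assumptions for illustration.

```python
import math
from typing import Iterable, List


def response_rates(
    response_times_s: Iterable[float], session_length_s: float, bin_s: float = 60.0
) -> List[float]:
    """Return responses per minute for each successive time bin of a session."""
    n_bins = math.ceil(session_length_s / bin_s)
    counts = [0] * n_bins
    for t in response_times_s:
        if 0 <= t < session_length_s:
            counts[int(t // bin_s)] += 1
    rates = []
    for k, count in enumerate(counts):
        # The last bin may be shorter than bin_s if the session length is not a multiple.
        width = min(bin_s, session_length_s - k * bin_s)
        rates.append(count * 60.0 / width)
    return rates


# Hypothetical lever-press times (seconds) in a 3-minute session.
presses = [5, 20, 50, 70, 130, 140, 150, 170]
print(response_rates(presses, session_length_s=180))  # [3.0, 1.0, 4.0]
```

Shorter bins show finer moment-to-moment changes at the cost of noisier estimates.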
Motivation and Deprivation: Typically, research animals are maintained at $85\%$ of their free-feeding body weight. Deprivation activates energy-related hormones like insulin, leptin, and ghrelin, modulating dopamine circuitry.
Magazine Training: Associating the click of a feeder with food to establish the sound as a conditioned reinforcer.
Operant Class: Defined by the effect of behavior (e.g., any paw press that closes a switch).
Shaping (Successive Approximation): Reinforcing closer and closer approximations of a target behavior while extinguishing previous forms. This makes use of behavioral variability, the "clay" for behavioral selection.
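Shaping can be sketched as a toy simulation. Everything in it is an assumption made for illustration: behavior is reduced to a single number (say, response force), variability is Gaussian, each reinforced response pulls the average halfway toward itself, and the criterion rises by a fixed step after every reinforcement.

```python
import random


def shape(target=5.0, start_criterion=0.5, step=0.5, sd=1.0, trials=500, seed=0):
    """Toy model of shaping by successive approximation (illustrative only)."""
    rng = random.Random(seed)
    mean = 0.0                   # current typical form of the response
    criterion = start_criterion  # current requirement for reinforcement
    history = []
    for _ in range(trials):
        response = rng.gauss(mean, sd)  # behavioral variability
        reinforced = response >= criterion
        if reinforced:
            # Selection: behavior shifts toward the reinforced form.
            mean += 0.5 * (response - mean)
            # Raise the requirement; earlier forms now go unreinforced.
            criterion = min(target, criterion + step)
        history.append((response, reinforced, criterion))
    return mean, criterion, history


final_mean, final_criterion, history = shape()
reinforced_trials = sum(1 for _, reinforced, _ in history if reinforced)
print(f"criterion reached: {final_criterion}, reinforced trials: {reinforced_trials}")
```

Without variability (`sd=0.0`) no response would ever meet the first criterion and nothing could be selected, which is the sense in which variability is the "clay" of shaping.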

Behavioral Neuroscience of Birdsong

Learning Phases:
- Sensory Learning: Nestlings hear tutors; involves mirror-neuron activation.
- Sensorimotor Learning: Young birds hear themselves and fine-tune their song toward an adult "crystallized" version via auditory feedback.
Neural Circuitry: Involves the anterior forebrain pathway (AFP), the high vocal center (HVC), and the lateral magnocellular nucleus of the anterior nidopallium (LMAN). LMAN is associated with correcting errors relative to the "song template."

Reinforcement and Problem Solving

Response Stereotypy: Based on matrix-task studies in which students repeated a single successful order of $4$ left and $4$ right key presses, Barry Schwartz (1982) argued that reinforcement produces rigid, stereotyped behavior.
Response Variability: Allen Neuringer argued that variability is itself an operant dimension of behavior. Under Lag schedules, pigeons were reinforced only if the current $8$ -peck sequence differed from the sequences emitted on a specified number of preceding trials. The birds responded almost randomly when the contingency required it.
Conclusion: "What you reinforce is what you get." Contingencies can generate either rigid stereotypy or novel, creative sequences.
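The Lag contingency can be sketched as a check against a short memory of recent sequences. This sketch assumes that pecks are coded as `"L"` or `"R"`, that the comparison window is the last `lag` trials, and that every emitted sequence enters the window whether or not it was reinforced; the class name `LagSchedule` is hypothetical.

```python
from collections import deque


class LagSchedule:
    """Reinforce a sequence only if it differs from each of the last `lag` sequences."""

    def __init__(self, lag: int, length: int = 8):
        if lag < 0:
            raise ValueError("lag must be non-negative")
        self.length = length
        self.recent = deque(maxlen=lag)  # most recent sequences, oldest dropped first

    def trial(self, sequence: str) -> bool:
        if len(sequence) != self.length or set(sequence) - {"L", "R"}:
            raise ValueError(f"expected {self.length} pecks coded as 'L' or 'R'")
        reinforced = sequence not in self.recent
        self.recent.append(sequence)  # every sequence counts, reinforced or not
        return reinforced


schedule = LagSchedule(lag=2)
session = ["LLLLRRRR", "LLLLRRRR", "LRLRLRLR", "RRRRLLLL", "LLLLRRRR"]
print([schedule.trial(seq) for seq in session])
# [True, False, True, True, True]
```

Repeating the same sequence earns nothing on the second trial, while the first sequence becomes eligible again once two different sequences have intervened. A larger `lag` therefore demands more variability.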

The Process and Effects of Extinction

Definition: Withholding reinforcement for a previously reinforced response, resulting in a zero probability of reinforcement.
Extinction Burst: An initial increase in the rate of response upon the withdrawal of reinforcement.
Operant Variability: Behavior becomes increasingly variable during extinction, which raises the chance of contacting reinforcement again (e.g., in Antonitis's study, rats varied the location of their nose pokes along a $50\,\text{cm}$ slot).
Increased Force: Response force becomes more variable and sometimes increases (e.g., smashing an elevator button when the elevator doesn't arrive).
Emotional Responses and Aggression: Attacks often occur when reinforcement stops (e.g., people hitting vending machines; pigeons attacking "target" birds during extinction).
Discriminated Extinction ( $S^\Delta$ ): A specific stimulus (like an "Out of Order" sign) signals the onset of extinction.
Resistance to Extinction: The number of responses emitted after reinforcement is removed. Resistance increases with the number of reinforced responses and usually reaches its maximum after $50$ to $80$ of them.
Partial Reinforcement Effect (PRE): Intermittent reinforcement schedules generate significantly higher resistance to extinction than continuous reinforcement (CRF), partly due to decreased discriminability between conditions.

Behavioral Persistence and Recovery

Spontaneous Recovery: An increase in response rate above operant level at the start of a new extinction session, often occasioned by stimuli that accompany the start of a session, such as handling and placement in the chamber.
Reinstatement: The recovery of behavior when the reinforcer is presented alone (response-independent) after extinction has occurred.
Renewal: The recovery of responding when the organism is removed from the extinction context (e.g., ABA renewal, in which responding is reinforced in Context A, extinguished in Context B, and returns when the rat is placed back in Context A).
- Practical Implication: Drug relapse may happen when a user leaves a treatment center (extinction context) and returns home (reinforcement context).
Forgetting vs. Extinction: Extinction is a reduction based on non-reinforcement while the opportunity remains; forgetting is a reduction based on the passage of time without the opportunity to behave.

Aversive Control: Punishment and Negative Reinforcement

Aversive Stimuli: Events organisms escape, evade, or avoid. Primary aversive stimuli are phylogeny-based (insect stings, loud noises). Conditioned aversive stimuli ( $S^{ave}$ ) are learned (reprimands, failing grades).
Escape vs. Avoidance: Escape removes an ongoing stimulus, while avoidance prevents it from occurring. They exist on a continuum defined by the Shock-Shock ( $S\text{-}S$ ) and Response-Shock ( $R\text{-}S$ ) intervals.
Sidman (Nondiscriminated) Avoidance: Shocks occur periodically unless a response delays them. There is no warning signal. This is inherently cyclical; if avoidance is effective, the absence of shocks eventually leads to less responding until a shock reinstates the behavior.
Negative Reinforcement in Caregiving: Infant crying acts as a negative reinforcer for caregivers. Parental behavior (rocking, feeding) is negatively reinforced when crying stops.
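The timing rules of Sidman avoidance described above can be sketched as a small scheduler. It assumes that the S-S clock starts with the session, that each response restarts the clock at the R-S interval, and that each shock restarts it at the S-S interval; the function name `sidman_shock_times` and the interval values are hypothetical.

```python
from typing import Iterable, List


def sidman_shock_times(
    response_times: Iterable[float],
    ss_interval: float,
    rs_interval: float,
    session_length: float,
) -> List[float]:
    """Return the times at which shocks are delivered during the session."""
    if ss_interval <= 0 or rs_interval <= 0:
        raise ValueError("intervals must be positive")
    responses = sorted(response_times)
    shocks: List[float] = []
    next_shock = ss_interval  # the S-S clock starts with the session
    i = 0
    while True:
        if i < len(responses) and responses[i] < next_shock:
            # A response before the scheduled shock postpones it by the R-S interval.
            next_shock = responses[i] + rs_interval
            i += 1
        elif next_shock < session_length:
            # No response in time: deliver the shock and restart the S-S clock.
            shocks.append(next_shock)
            next_shock += ss_interval
        else:
            break
    return shocks


# Hypothetical 60-s session with S-S = 5 s and R-S = 20 s.
print(sidman_shock_times([], 5, 20, 60))               # a shock every 5 s
print(sidman_shock_times([4, 19, 34, 49], 5, 20, 60))  # [] (steady responding avoids all shocks)
print(sidman_shock_times([3, 15, 30], 5, 20, 60))      # [50, 55] (responding stops, shocks resume)
```

The third call illustrates the cyclical pattern: once responding lapses, shocks return at the S-S interval until a new response restores the R-S delay.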

Maximizing Punishment Effectiveness

Abrupt Introduction: Introduce punishment at full intensity immediately. Gradual increases create "masochism" or disregard for the punisher.
Intensity: Moderate to high intensity is required for permanent suppression.
Immediacy: Direct contiguity between the response and the punisher is crucial.
Schedule: Continuous punishment (FR 1) is most effective.
Satiation of Positive Reinforcement: Punishment is more effective when the deprivation for the positive reinforcer maintaining the behavior is low.
Response Alternatives: Punishment works best when the subject has an unpunished alternative way to reach reinforcement.

Questions & Discussion

What defines a three-term contingency? It includes the discriminative stimulus ( $S^D$ ), the operant ( $R$ ), and the reinforcing stimulus ( $S^r$ ).
Comparison of BCL and SRL: In Drosophila genetics, BCL ( $R \rightarrow S^r$ ) is found to be biologically distinct from SRL ( $S^D : S^r$ ).
The Premack Principle application: If a child prefers running to drinking water, can running reinforce drinking? Yes. The Premack principle concerns the relative frequencies of behavior, not just discrete stimuli.
Difference between extinction and negative punishment? Extinction involves the failure of a response to produce a reinforcer; negative punishment is the contingency-driven removal of an existing reinforcer.
What is the effect of the FoxP gene? It is essential for Behavior-Consequence Learning (BCL) but not Stimulus-Relation Learning (SRL).