Turn of 20th century: two biological traditions gave rise to empirical learning theory.
Reflex tradition → Ivan Pavlov → classical (Pavlovian) conditioning.
Famous bell–food pairing ➔ dog salivates to bell.
Provided a method for analyzing associative mechanisms.
Comparative-psychology tradition → Edward Thorndike → instrumental/operant conditioning.
Puzzle-box experiments with cats: latch-opening → escape + food.
Led to focus on consequences of behavior.
1920s: John B. Watson demonstrates conditioned emotion in humans ("Little Albert" rat–noise → fear generalization to rabbit, dog, fur coat).
1930s: B. F. Skinner refines Thorndike → free-operant lever-press in rats; establishes operant chamber.
1950s–1960s: Behavior-therapy movement (e.g., Wolpe 1958) + applied behavior analysis; analogy to bench-to-bedside model in medicine.
1970s–present: Broader cognitive & evolutionary perspectives; integration with social psychology; cognitive–behavior therapy (CBT).
Principles of learning constantly operate—"laws of learning are always in effect" (like gravity).
Learning processes contribute to:
Etiology & maintenance of psychiatric disorders (e.g., phobias, substance use, compulsions, maladaptive eating).
Mechanisms of therapeutic change (exposure, skills training, medication compliance).
Even medication management involves learning: acquiring knowledge of effects/side-effects, adherence routines, overcoming resistance.
Two foundational forms:
Pavlovian (classical) conditioning.
Operant (instrumental) conditioning.
Key Pavlovian terminology:
US (Unconditional Stimulus)
CS (Conditional Stimulus)
UR (Unconditional Response)
CR (Conditional Response)
Key Operant terminology:
Operant (response R)
Reinforcer (O; event increasing R)
Positive/negative reinforcement & punishment (Fig 3.3-1 logic).
Shared variables influencing both forms:
Magnitude & immediacy of US/reinforcer.
Extinction when US/reinforcer removed.
Contextual control & relapse phenomena.
CSs can elicit complex response “systems” (not rigid reflexes):
Preparatory physiology: gastric acid, insulin, arousal, temperature rise.
Behavioral approach/withdrawal depending on predicted valence.
Motivation of ongoing operants (Pavlovian-Instrumental Transfer).
Eating-related learning:
Flavor–nutrient preferences; flavor-illness aversion (chemo patients, alcohol sickness).
External cues trigger craving/overeating.
Drug-related learning:
Drug = US; paraphernalia/rooms = CSs.
Compensatory CRs opposite to drug UR (morphine: pain sensitivity ↑; alcohol: body temperature ↑) ⇒ tolerance in the usual drug context; overdose risk in novel contexts where the compensatory CR is absent.
Unpleasant compensatory CRs yield negative reinforcement (“escape” drug-taking).
Not universal—cocaine often produces drug-like CR.
Anxiety/Fear:
CSs paired with shock evoke cardio-resp changes, analgesia, blinks, potentiated startle.
Panic disorder: external & interoceptive CSs to panic symptoms → anticipatory anxiety.
PTSD: CS retrieves both emotional & sensory aspects → flashbacks.
Pavlovian-Instrumental Transfer:
Fear CS increases avoidance responses; drug CS motivates drug seeking.
Conditioning depends on information value, not simple pairing.
Blocking: prior CS predicts US → second CS adds no info → no learning.
Relative contingency: P(US|CS) > P(US|\lnot CS) needed for excitation; reverse for inhibition.
Conditioned Inhibition: CS predicts "no US"; clinically may suppress pathological CR until inhibition lost.
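Learning captured by Rescorla–Wagner rule: \Delta V_{CS}=\alpha\beta(\lambda-\Sigma V), where \alpha, \beta = learning-rate (salience) parameters for CS and US, \lambda = maximum strength the US supports, \Sigma V = summed strength of all CSs present (prediction-error driven; V can be positive = excitation or negative = inhibition).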
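A minimal Python sketch of the Rescorla–Wagner update, added here to make blocking and conditioned inhibition concrete. The function name `rescorla_wagner`, the cue labels, and the parameter values are illustrative assumptions; for simplicity one shared \alpha is used for every CS.

```python
def rescorla_wagner(trials, alpha=0.3, beta=1.0, lam=1.0):
    """Apply Delta V = alpha * beta * (lambda - Sigma V) trial by trial.

    trials: list of (cues_present, us_present), e.g. (("A", "B"), True).
    Returns a dict mapping each cue to its associative strength V.
    Assumption: a single alpha is shared by all cues.
    """
    V = {}
    for cues, us_present in trials:
        for cue in cues:
            V.setdefault(cue, 0.0)
        target = lam if us_present else 0.0
        # One shared prediction error, based on all cues present on the trial.
        error = target - sum(V[c] for c in cues)
        for cue in cues:
            V[cue] += alpha * beta * error
    return V


# Blocking: A is pretrained alone, then A and B are reinforced together.
pretrain = [(("A",), True)] * 50
compound = [(("A", "B"), True)] * 50
blocked = rescorla_wagner(pretrain + compound)
control = rescorla_wagner(compound)  # no pretraining of A

# Conditioned inhibition: A is reinforced alone, A+X is never reinforced.
inhibition = rescorla_wagner([(("A",), True), (("A", "X"), False)] * 100)

print(blocked["B"], control["B"], inhibition["X"])
```

In this sketch B gains almost no strength after A has been pretrained (blocking), B reaches about half of \lambda in the control group, and X ends with a negative V (conditioned inhibition).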
Other modulators:
Salience, novelty (latent inhibition, US pre-exposure), attention, surprise.
Higher-order variants: sensory preconditioning (B→A; then A→US ⇒ B elicits CR), second-order conditioning (A→US; then B→A ⇒ B elicits CR), intra-event associations (onset of panic predicting full attack).
Observational (vicarious) conditioning & evolutionary preparedness (fear conditions more readily to snakes than to flowers).
Extinction: repeated CS without US; basis of exposure therapy.
Counter-conditioning: CS paired with incompatible US (systematic desensitization).
Extinction & counter-conditioning = new inhibitory learning; original memory intact ⇒ relapse phenomena:
Spontaneous recovery (time).
Renewal (context switch), reinstatement (US re-exposure), rapid reacquisition.
Context defined broadly (Table 3.3-1): exteroceptive, internal drug/hormone states, moods, time, recent events.
State-dependent extinction: benzodiazepine- or alcohol-present extinction renews when sober; clinical caution re drug-assisted exposure.
Pharmacological enhancers of extinction (e.g., D-cycloserine acting at NMDA receptors) speed learning but do NOT prevent renewal.
Reconsolidation interference: Reactivate memory → protein-synthesis blocker (anisomycin) can attenuate CR; effects may still show recovery.
Practical direction: accept persistent memory; design multi-context, cue-retrieval, relapse-prevention strategies.
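A toy Python sketch of why extinction can show renewal when it is treated as new, context-bound inhibitory learning rather than erasure. Everything here is an illustrative assumption, not an established model: the class name `ContextGatedCue`, the learning rate, and the extreme simplification that excitation generalizes fully across contexts while inhibition does not generalize at all.

```python
from collections import defaultdict


class ContextGatedCue:
    """Toy CS: one context-free excitatory link, context-specific inhibition."""

    def __init__(self, rate=0.3):
        self.rate = rate
        self.excitation = 0.0                 # CS-US memory, survives extinction
        self.inhibition = defaultdict(float)  # context -> CS-noUS strength

    def response(self, context):
        """CR strength when the CS is presented in `context`."""
        return max(0.0, self.excitation - self.inhibition[context])

    def trial(self, context, us_present):
        """One CS presentation with or without the US."""
        net = self.excitation - self.inhibition[context]
        error = (1.0 if us_present else 0.0) - net
        if error > 0:
            self.excitation += self.rate * error            # acquisition
        else:
            self.inhibition[context] += self.rate * -error  # extinction


cue = ContextGatedCue()
for _ in range(30):
    cue.trial("A", us_present=True)    # conditioning in context A
for _ in range(30):
    cue.trial("B", us_present=False)   # extinction (exposure) in context B

print(cue.response("B"))  # near zero: extinction looks complete in B
print(cue.response("A"))  # near one: renewal on return to A
```

Because the excitatory link is never reduced in this sketch, the CR returns in any context other than B, which is the renewal pattern listed above.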
Extinction parallels classical: lever-press without pellets ⇒ decline, but with spontaneous recovery, renewal, reinstatement, rapid reacquisition.
Resurgence: extinguish old R, teach replacement R, then extinguish replacement ⇒ old R resurfaces.
Schedules of reinforcement:
Ratio (fixed, variable) → behavior contingent on response count; variable ratio yields high rates (slot machines).
Interval (fixed, variable) → first response after time; variable interval ⟶ steady moderate rates (email checking).
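A short Python sketch of the ratio versus interval distinction, added as an illustration. The function names are hypothetical, and `variable_ratio` uses a random-ratio approximation (each response reinforced with probability 1/mean), which is an assumption rather than the only way to build a variable schedule.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response, regardless of timing."""
    count = 0

    def respond(_time):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce each response with probability 1/mean_n (approximation)."""
    def respond(_time):
        return rng.random() < 1.0 / mean_n

    return respond


def fixed_interval(seconds):
    """Reinforce the first response at least `seconds` after the last reinforcer."""
    last = 0.0

    def respond(time):
        nonlocal last
        if time - last >= seconds:
            last = time
            return True
        return False

    return respond


def count_reinforcers(schedule, response_times):
    """Total reinforcers earned for a list of response times (seconds)."""
    return sum(schedule(t) for t in response_times)


slow = [float(t) for t in range(1, 21)]   # 1 response/s for 20 s
fast = [t * 0.5 for t in range(1, 41)]    # 2 responses/s for 20 s

fr_slow = count_reinforcers(fixed_ratio(5), slow)
fr_fast = count_reinforcers(fixed_ratio(5), fast)
fi_slow = count_reinforcers(fixed_interval(10), slow)
fi_fast = count_reinforcers(fixed_interval(10), fast)
print(fr_slow, fr_fast, fi_slow, fi_fast)
```

Doubling the response rate doubles the payoff on the ratio schedule but leaves the interval payoff unchanged, which is one way to see why ratio schedules support higher response rates.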
Quantitative law of effect: B_1 = K\frac{R_1}{R_1+R_O}
Increase behavior by ↑R_1 or ↓R_O.
Protective factor: rich alternative reinforcement (sports, hobbies) lowers drug/alcohol uptake.
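A small Python sketch of the quantitative law of effect, B_1 = K R_1/(R_1+R_O), showing how the target behavior changes with its own reinforcement and with alternative reinforcement. The function name `response_rate` and all numbers are hypothetical and chosen only for easy arithmetic.

```python
def response_rate(r1, r_o, k=100.0):
    """B1 = K * R1 / (R1 + RO).

    r1:  reinforcement rate earned by the target behavior
    r_o: reinforcement rate from all other sources
    k:   asymptotic (maximum) response rate
    """
    if r1 + r_o <= 0:
        raise ValueError("total reinforcement must be positive")
    return k * r1 / (r1 + r_o)


baseline = response_rate(r1=20, r_o=20)              # 50.0
more_target_reward = response_rate(r1=60, r_o=20)    # 75.0
richer_alternatives = response_rate(r1=20, r_o=60)   # 25.0
print(baseline, more_target_reward, richer_alternatives)
```

With the target reinforcer held constant, tripling alternative reinforcement halves the target behavior in this example, which is the logic behind enriching a patient's other sources of reward.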
Choice, delay discounting (Fig 3.3-5): value decreases hyperbolically with delay; impulsivity arises when immediate small reward outweighs larger delayed.
Self-control interventions: commit early (while both rewards are still distant); augment delayed value; add costs to immediate reward.
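A minimal Python sketch of hyperbolic discounting, V = A/(1+kD), illustrating the preference reversal that makes early commitment useful. The function names, reward amounts, delays, and k = 0.1 are illustrative assumptions.

```python
def discounted_value(amount, delay, k=0.1):
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def preferred(small, large, wait, k=0.1):
    """Pick between two (amount, delay) rewards viewed `wait` units in advance."""
    v_small = discounted_value(small[0], small[1] + wait, k)
    v_large = discounted_value(large[0], large[1] + wait, k)
    return "smaller-sooner" if v_small > v_large else "larger-later"


small = (50, 0)    # 50 units available immediately
large = (100, 30)  # 100 units available after 30 time units

choice_now = preferred(small, large, wait=0)      # values: 50.0 vs 25.0
choice_early = preferred(small, large, wait=60)   # values: about 7.1 vs 10.0
print(choice_now, choice_early)
```

When the choice is made at the moment the small reward is available, it wins; when the same choice is made 60 time units in advance, the larger delayed reward is preferred, so a commitment made early can lock in the self-controlled option.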
Skinner’s empirical definition: reinforcer = any consequence that increases response.
Premack Principle:
High-probability activity reinforces low-probability activity.
Preference test discovers idiosyncratic reinforcers.
Deprivation can invert hierarchy.
Token economies, social praise, attention as conditioned reinforcers.
Reinforcer shifts produce contrast effects.
Positive contrast (small→large) ↑ responding.
Negative contrast (large→small) ↓ responding; involves frustration.
Partial-reinforcement extinction effect: intermittent history → greater persistence.
Incentive learning (Balleine 1992): motivational state invigorates action only after experiencing outcome in that state.
Clinical extrapolations: withdrawal must be paired with drug relief to motivate seeking; depressed patients must learn which activities lift their mood.
Pavlovian CS → fear; instrumental R → removes CS/fear (negative reinforcement).
Clinical analogs: compulsive washing, agoraphobic avoidance, bulimic purging.
Species-Specific Defensive Reactions (SSDRs): innate actions (freeze, flee) learned rapidly without explicit reinforcement; show importance of Pavlovian control.
Learned helplessness: uncontrollable vs controllable shock alters future escape learning; currently viewed as stress-modulation by controllability.
Many operants are actually Pavlovian sign-tracking (pigeon key-peck, punished lever predicting shock).
Synthetic associative structure (Fig 3.3-6):
S-O (Pavlovian), R-O (goal-directed), S-R (habit), & S→(R-O) occasion setting.
Reinforcer devaluation (Fig 3.3-7): devalued food suppresses its linked action ⇒ evidence for R-O cognition.
Occasion setting: stimuli specify which R produces which O (noise vs light setting for lever vs chain contingencies).
Categorization studies: multiple exemplars improve generalization—train in varied contexts to maximize transfer.
Habit formation: extensive practice → S-R dominates; accelerated by drugs of abuse; but cognitive routes persist and can re-emerge given context/attention shifts.
Same relapse patterns as Pavlovian (spontaneous recovery, renewal, reinstatement, resurgence).
Extinction context specificity advises conducting therapy across multiple settings and incorporating retrieval cues.
Behavioral laws are value-neutral; evolutionarily adaptive mechanisms can generate maladaptive modern behaviors (overeating, addiction).
Therapists must assess whether problematic acts are respondent-elicited or operant-maintained to choose appropriate antecedent vs consequence interventions.
Pharmacological adjuncts (e.g., NMDA modulators) should be integrated with robust behavioral principles to avoid state-dependent pitfalls.
Prevention: enrich natural reinforcement environments (high R_O) to buffer against emergence of pathological operants.
Law of Effect decision matrix (Fig 3.3-1): conceptual.
Sign-tracking approach/withdrawal matrix (Fig 3.3-2).
Blocking & contingency demonstration (Fig 3.3-3).
Quantitative law of effect curve (Fig 3.3-4) & delay discounting curves (Fig 3.3-5).
Rescorla–Wagner equation: \Delta V_{CS}=\alpha\beta(\lambda-\Sigma V) .
Reinforcement choice equation: B_1 = K \frac{R_1}{R_1+R_O}.
Rescorla–Wagner (1972) foundation of error-driven learning.
Contemporary models include attention, memory priming, surprise (Pearce-Hall, Mackintosh, comparator hypotheses).
NMDA-dependent plasticity underlies extinction (D-cycloserine translational work by Davis et al., 2006).
Corticostriatal circuits differentiate goal-directed vs habitual control (Balleine & O’Doherty, 2010).
Inhibitory-learning model of exposure (Craske et al., 2014) integrates context & expectancy principles.
Contingency-management (Davis et al., 2016) applies quantitative law of effect to substance use treatment.
"ABC" of behavior analysis = Antecedent (Pavlovian S), Behavior (R), Consequence (O).
BLOCKING: "Old predictor blocks new pretender."
PREMACK: “Grandma’s Rule” – first eat veggies (low-prob), then ice-cream (high-prob).
EXTINCTION relapse acronym "SNiRReR": Spontaneous recovery, (context) Novel-renewal, Reinstatement, Rapid reacquisition.
\Delta V_{CS}=\alpha\beta(\lambda-\Sigma V) (Rescorla–Wagner)
B_1 = K \frac{R_1}{R_1+R_O} (Quantitative law of effect)
Hyperbolic delay discounting: V=\frac{A}{1+kD} where A = amount, D = delay, k = discount rate.
Identify if presenting behavior is respondent (CS-elicited) or operant (consequence-maintained).
Map antecedent CSs, operant Rs, outcomes Os, and context factors.
Consider potential blocking/inhibition histories that may influence current learning capacity.
During exposure:
Vary contexts; use expectancy violation; monitor spontaneous recovery.
During reinforcement-based interventions:
Schedule reinforcement (dense reinforcement for acquisition, then thin gradually toward intermittent schedules such as variable ratio to build persistence).
Bolster alternative reinforcers to manipulate R_O.
Use Premack assessments to individualize rewards.
Address habits:
Increase mindfulness/attention to shift S-R back to R-O control.
Introduce costs/punishers to degrade habit value.
Plan relapse-prevention by embedding retrieval cues and practicing in high-risk contexts.