Reinforcement and Operant Conditioning Notes
Operant Learning and Reinforcement
- Operant: a class of behaviour that operates on the environment to produce a common environmental consequence.
- Learning: a change in behaviour due to experience.
- Operant Learning: a change in a class of behaviour as a function of the consequences that followed it.
- Learning = Conditioning.
Effect of Consequences
- Reinforcement increases behaviour when the consequence follows the behaviour.
- Punishment decreases behaviour when the consequence follows the behaviour.
- Descriptions:
- Increase: Reinforce
- Decrease: Punish
Effects of Reinforcing Consequences
- Reinforcement effects on behaviour include:
- Increase in frequency
- Increase in duration
- Increase in intensity
- Increase in quickness (i.e., decrease in latency)
- Increase in variability
- Reinforcement leads to an increase in whatever the reinforcer is contingent on.
Two Ways of Reinforcing (Positive vs Negative)
- Positive Reinforcement: add a stimulus to increase a behaviour.
- Negative Reinforcement: remove a stimulus to increase a behaviour.
- These are the two fundamental routes to reinforcement.
Rewards vs. Reinforcers
- Distinction:
- Reinforcer: a consequence that increases the probability of a behaviour.
- Reward: outcomes that may or may not function as reinforcers; not a technical term in the reinforcement framework by itself.
Reinforcers Maintain Behaviour
- Graphical idea: reinforcement can maintain responding over time; increases in response probability can rise toward an asymptote depending on the reinforcer and conditions.
- Key concept: conditioning trials shape performance toward a learned asymptote based on reinforcement history.
Notes about Reinforcement (Functional Description)
- Reinforcement is a functional description, not a theory.
- It avoids circularity because reinforcer status is identified empirically: we observe that the consequence functioned as a reinforcer for the response; we do not explain the increase by simply calling the consequence "reinforcing."
- Correct Usage examples:
- “The consequence (e.g., food) functioned as a reinforcer for the response (e.g., lever pressing).”
- “The consequence (e.g., food) reinforced the response (e.g., lever pressing).”
- Incorrect usage: saying the behaviour increased "because the consequence was reinforcing" — this restates the observed effect as if it were an explanation.
Types of Reinforcers
- Two main types:
1) Unconditional (Primary) Reinforcer:
- Properties derived from species’ evolutionary history (phylogenetic importance).
- Examples: food, sex, water, sleep, social interaction, certain sensory stimulation, escape from harm (e.g., extreme heat).
- Usually depends on some deprivation; often species-specific.
2) Conditional (Secondary) Reinforcer:
- Neutral stimuli/events that acquire reinforcing power via a contingent relationship with primary reinforcers or other conditioned reinforcers.
- Key idea: secondary reinforcers gain value because they signal or enable access to primary rewards.
Empirical Examples of Reinforcement Concepts
- Liberman et al. (1973): Differential reinforcement of incompatible behaviours
- Incompatible behaviours: e.g., “rational talk” vs “irrational talk.”
- Reinforced rational talk; placed irrational talk on extinction by withholding the reinforcement that had maintained it.
- Sullivan & Leon (1987): conditioned reinforcement in a perceptual/olfactory paradigm
- Pups trained with peppermint odor conditioning showed differences in time spent near peppermint and in olfactory bulb activity (2-DG uptake) depending on conditioning groups.
- Illustrates how neutral odors can acquire reinforcing properties through conditioning.
Delay Reduction Theory (Conditional Reinforcement)
- Core claim: the effectiveness of a stimulus as a conditional reinforcer is determined by the degree to which it is correlated with a reduction in the delay to terminal (primary) reinforcement.
- Note: You do not need to memorize every detail on every slide, but understand the core idea that conditional reinforcers gain strength by predicting shorter delays to obtaining the primary reinforcement.
- Formula (hyperbolic decay form shown in slides):
V = A / (1 + K·D)
- where:
- V = value of the reinforcer (reinforcing effectiveness)
- A = maximum reinforcing value (asymptote)
- K = rate of discounting (how quickly value declines with delay)
- D = delay to terminal reinforcement
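The hyperbolic decay formula above can be sketched directly. This is a minimal illustration with made-up parameter values; the function name and the specific numbers are my own, only the formula V = A / (1 + K·D) comes from the notes.

```python
# Hyperbolic decay of reinforcer value with delay: V = A / (1 + K * D).
def reinforcer_value(A, K, D):
    """Value V of a reinforcer delayed by D time units,
    with asymptote A and discounting rate K (symbols as in the notes)."""
    return A / (1 + K * D)

# Immediate reinforcement retains the full value A; value then falls
# hyperbolically (fast at first, then ever more slowly) with delay.
print(reinforcer_value(A=10.0, K=0.5, D=0.0))   # 10.0
print(reinforcer_value(A=10.0, K=0.5, D=2.0))   # 5.0
print(reinforcer_value(A=10.0, K=0.5, D=18.0))  # 1.0
```

Note the shape this produces: doubling the delay never halves the value exactly, which is what distinguishes hyperbolic from exponential discounting.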
Conditional Reinforcement in a Choice Task
- Example (pigeons):
- Key A provided food 20% of the time plus a conditioned reinforcer (CR).
- Key B provided food 50% of the time but no CR.
- Result: pigeons preferred Key A despite less actual food, due to the presence of the conditioned reinforcer associated with Key A.
- Interpretation: The conditioned reinforcer can drive choice when its predictive value for the primary reinforcement is high enough, illustrating the power of conditional reinforcement.
Variables Affecting Reinforcement
- Contingency: degree of correlation between a behaviour and its consequence.
- Contiguity: nearness in time (temporal contiguity) or space (spatial contiguity) between the operant response and the reinforcer.
- High contiguity strengthens the effectiveness of the reinforcer; longer delays reduce effectiveness.
- Hyperbolic Decay Function (conceptual):
- The value of a reinforcer declines with delay according to a hyperbolic rule, captured by the formula above.
Reinforcer Magnitude and Characteristics
- Magnitude (size) generally increases reinforcing effectiveness, but the relationship is not linear.
- Increases in magnitude yield diminishing returns: bigger reinforcers do not always scale proportionally in effectiveness.
- Unconditional reinforcers often show diminishing effects with greater magnitude due to satiation and other factors.
- In examples of reinforcer magnitude, effectiveness is not a simple linear function of size; context and the organism's state must be considered.
Reinforcer Characteristics, Task Demands, and Motivating Operations
- Specific reinforcer used matters (e.g., chocolate vs sunflower seeds for a given animal).
- Task characteristics matter (e.g., what behaviour is being reinforced—pecking for food vs a different target for a different species).
- Motivating Operations (MOs): influence the effectiveness of reinforcers
- Establishing Operations: increase the effectiveness of a reinforcer (e.g., deprivation increases value).
- Abolishing Operations: decrease the effectiveness of a reinforcer (e.g., satiation reduces value).
- Other considerations: competing contingencies of reinforcement (e.g., choosing between alternatives like study vs watching YouTube).
Premack Principle
- Core idea: the more probable (high-probability) behaviour can reinforce the less probable (low-probability) behaviour.
- Formal statement (Premack, 1965): Of any two responses, the more probable one will reinforce the less probable one.
- Example: If a child prefers playing pinball to eating candy, allowing pinball access after each candy consumption reinforces candy eating.
- Problems/Limitations:
- Does not fully account for conditional reinforcement effects.
- Low-probability behaviour can reinforce high-probability behaviour when deprivation is present (e.g., after deprivation of the low-probability behaviour).
Schedules of Reinforcement
- Schedule of Reinforcement: a rule describing how reinforcement is delivered.
- Schedule effects: different schedules produce unique patterns and rates of behaviour over time; long-term effects are predictable and observed across species.
Cumulative Records and Data Representation
- Cumulative record: a plot of cumulative responses (y-axis) over time (x-axis).
- Slope of the cumulative line indicates the rate of responding; steeper slope corresponds to higher response rates.
- Comparing frequency vs cumulative frequency:
- Frequency: number of responses in a given interval.
- Cumulative frequency: total number of responses up to each time point; useful for visualizing rate changes over time.
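The frequency vs. cumulative-frequency distinction can be made concrete with a small calculation. The response counts below are hypothetical, purely for illustration:

```python
from itertools import accumulate

# Responses counted in successive 1-minute bins (hypothetical data).
per_minute = [2, 3, 5, 5, 0, 0, 4, 6]

# Cumulative record: running total of responses at each time point.
cumulative = list(accumulate(per_minute))
print(cumulative)  # [2, 5, 10, 15, 15, 15, 19, 25]

# The slope between two points on the cumulative record is the mean
# response rate over that span; a flat segment (minutes 5-6 here)
# means no responding at all, e.g. a post-reinforcement pause.
rate_minutes_1_to_4 = cumulative[3] / 4                    # 3.75 responses/min
rate_minutes_5_to_6 = (cumulative[5] - cumulative[3]) / 2  # 0.0 responses/min
```

Because the running total can never decrease, the cumulative line only rises or stays flat, which is why its slope is a clean visual index of response rate.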
Types of Schedules (CRF and Intermittent)
- Continuous Reinforcement (CRF): reinforcement is delivered after every instance of the target behaviour.
- Pros: rapid acquisition and clear learning signal.
- Cons: rare in natural environments; leads to faster extinction if reinforcement stops.
- Intermittent (Partial) Reinforcement: reinforcement is delivered on some occasions only.
- Four main types:
- Fixed-Ratio (FR): reinforcement after a fixed number of responses (e.g., FR-120).
- Generates a Post-Reinforcement Pause (PRP).
- As ratio increases, PRP tends to increase; following PRP, run rates become steady.
- Variable-Ratio (VR): reinforcement after a varying number of responses around an average (e.g., VR-360).
- Ratios can be a list such as 1, 10, 20, 30, 60, 100, 180, 240, 300, 360, 420, 480, 540, 600, 660, 690, 690, 720, 739.
- Mean ratio ≈ 360.
- Characteristics: high and constant response rates; strong resistance to extinction; common in natural environments and gambling games.
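The "VR-360" label above just names the mean of the programmed ratios, which is easy to verify from the list given:

```python
# Ratio list quoted for the VR example in the notes; a VR schedule
# is named after the mean of its programmed ratios.
ratios = [1, 10, 20, 30, 60, 100, 180, 240, 300, 360,
          420, 480, 540, 600, 660, 690, 690, 720, 739]

mean_ratio = sum(ratios) / len(ratios)
print(mean_ratio)  # 360.0 -> hence "VR-360"
```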
- Fixed-Interval (FI): the first response after a fixed interval has elapsed since the last reinforcement is reinforced.
- Produces a scalloped response pattern; responding increases gradually as the interval ends.
- Not common in natural environments.
- Variable-Interval (VI): the first response after an interval is reinforced, with intervals varying around an average (e.g., VI-3 min).
- PRPs are rare and short; produces steady, moderate response rates.
- Not as high as VR; common in natural environments.
- Other schedules: Duration schedules (Fixed/Variable Duration) deliver reinforcement contingent on continuous performance for a period (e.g., practicing guitar for 30 minutes); often used in practice settings, though sustained performance does not guarantee the behaviour contacts an effective reinforcer in every case.
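The delivery rules behind the basic schedules can be sketched as small decision functions. This is my own illustration, not from the slides; each function answers "does this response produce reinforcement?" under the named schedule:

```python
import random

def fixed_ratio(response_count, ratio):
    # FR-n: reinforce every n-th response.
    return response_count % ratio == 0

def fixed_interval(response_time, interval):
    # FI: reinforce the first response emitted after `interval`
    # time units have elapsed since the last reinforcer.
    return response_time >= interval

def variable_ratio(mean_ratio, rng=random.random):
    # VR: reinforce each response with probability 1/mean_ratio,
    # yielding `mean_ratio` responses per reinforcer on average.
    return rng() < 1.0 / mean_ratio

print(fixed_ratio(120, 120))   # True  (the 120th response on FR-120)
print(fixed_ratio(119, 120))   # False
print(fixed_interval(45, 60))  # False (interval has not yet elapsed)
```

Note how the FR rule depends only on the response count and the FI rule only on elapsed time, which is exactly why the two schedules generate such different response patterns.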
Observing Schedules: Bed Making Example
- Bed Making data show frequency and slope changes across days under different conditions.
- Slopes (e.g., 0.35 vs 0.75) indicate different rates of responding under different reinforcement contexts.
Operant Extinction
- Extinction: withholding reinforcers that maintain a behaviour.
- A data example (cumulative responses) shows a period of no responding followed by responses and occasional resets.
- Spontaneous Recovery: extinguished behaviour can reappear in similar situations after a period without reinforcement; multiple extinction sessions across settings reduce this effect.
- Reinstatement: recovery of an extinguished behaviour when the reinforcer is presented alone (independently of responding) after extinction.
- Extinction Burst: a temporary increase in the previously reinforced behaviour when reinforcement is first withdrawn.
- Extinction and Variability: increase in operant variability during extinction; organism may contact other sources of reinforcement.
- Resistance to Extinction (PRE): partial reinforcement effect
- Schedules that reinforce more intermittently tend to take longer to extinguish than those that reinforce less intermittently.
- E.g., FR-100 vs. continuous reinforcement (CRF): under FR-100, long runs of unreinforced responses are normal, so many more responses occur during extinction before responding declines than under CRF.
Final Note on Extinction
- Extinction is new learning, not forgetting.
- Spontaneous recovery and reinstatement show that original learning persists in some form.
- The core idea: learning is a change in behaviour due to experience; when reinforcement stops, the probability of the response decreases, reflecting new learning rather than simply forgetting.
Connections to Foundational Principles and Real-World Relevance
- Reinforcement principles underpin many educational, clinical, and organizational practices:
- Shaping behaviours through successive approximations using CRF and gradually changing schedules.
- Using Premack and motivating operations to increase adherence to desired behaviours.
- Understanding and leveraging contingencies and contiguity to design effective behavioural interventions.
- Recognizing extinction dynamics in behavior change programs and how to minimize relapse (extinction bursts, spontaneous recovery).
- Ethical and practical implications:
- Use of reinforcement should respect welfare and autonomy; avoid coercive or harmful reinforcement strategies.
- Consider individual differences in primary vs secondary reinforcement and motivational states (deprivation, satiation).
- Be mindful of unintended reinforcement of undesired behaviours when designing schedules (e.g., reinforcing avoidance or avoidance learning).