Reinforcement and Operant Conditioning Notes

Operant Learning and Reinforcement

  • Operant: a class of behaviour that operates on the environment to produce a common environmental consequence.
  • Learning: a change in behaviour due to experience.
  • Operant Learning: a change in a class of behaviour as a function of the consequences that followed it.
  • Learning = Conditioning.

Effect of Consequences

  • Reinforcement increases behaviour when the consequence follows the behaviour.
  • Punishment decreases behaviour when the consequence follows the behaviour.
  • Descriptions:
    • Increase: Reinforce
    • Decrease: Punish

Effects of Reinforcing Consequences

  • Reinforcement effects on behaviour include:
    • Increase in frequency
    • Increase in duration
    • Increase in intensity
    • Increase in quickness (i.e., decrease in latency)
    • Increase in variability
  • Reinforcement leads to an increase in whatever the reinforcer is contingent on.

Two Ways of Reinforcing (Positive vs Negative)

  • Positive Reinforcement: add a stimulus to increase a behaviour.
  • Negative Reinforcement: remove a stimulus to increase a behaviour.
  • These are the two fundamental routes to reinforcement.

Rewards vs. Reinforcers

  • Distinction:
    • Reinforcer: a consequence that increases the probability of a behaviour.
    • Reward: an outcome that may or may not function as a reinforcer; not a technical term in the reinforcement framework by itself.

Reinforcers Maintain Behaviour

  • Graphical idea: reinforcement can maintain responding over time; increases in response probability can rise toward an asymptote depending on the reinforcer and conditions.
  • Key concept: conditioning trials shape performance toward a learned asymptote based on reinforcement history.
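The rise toward an asymptote can be sketched with a simple exponential-approach model. This is an assumed illustration only; the asymptote and rate parameters are not from the notes:

```python
import math

def response_probability(trial, asymptote=0.9, k=0.3):
    """Negatively accelerated learning curve rising toward an asymptote.

    An assumed exponential-approach model (not from the notes):
    p(trial) = asymptote * (1 - e^(-k * trial)).
    """
    return asymptote * (1 - math.exp(-k * trial))

# Early trials produce large gains; later trials approach the asymptote.
for n in [1, 5, 20]:
    print(f"trial {n:>2}: p = {response_probability(n):.2f}")
```

The exact functional form matters less here than the qualitative shape: gains per conditioning trial shrink as performance approaches the learned asymptote.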

Notes about Reinforcement (Functional Description)

  • Reinforcement is a functional description, not a theory.
  • It avoids circularity when used correctly: we describe the consequence as having functioned as a reinforcer for the response; we do not explain the increase in behaviour by saying the consequence was "reinforcing."
  • Correct Usage examples:
    • “The consequence (e.g., food) functioned as a reinforcer for the response (e.g., lever pressing).”
    • “The consequence (e.g., food) reinforced the response (e.g., lever pressing).”
  • Incorrect usage would imply the consequence inherently increased probability without specifying its function as a reinforcer.

Types of Reinforcers

  • Two main types:
    1) Unconditional (Primary) Reinforcer:
      • Properties derived from the species’ evolutionary history (phylogenetic importance).
      • Examples: food, water, sex, sleep, social interaction, certain sensory stimulation, escape from harm (e.g., extreme heat).
      • Effectiveness usually depends on some deprivation; often species-specific.
    2) Conditional (Secondary) Reinforcer:
      • Formerly neutral stimuli/events that acquire reinforcing power via a contingent relationship with primary reinforcers or other conditioned reinforcers.
  • Key idea: secondary reinforcers gain value because they signal or enable access to primary reinforcers.

Empirical Examples of Reinforcement Concepts

  • Liberman et al. (1973): Differential reinforcement of incompatible behaviours
    • Incompatible behaviours: e.g., “rational talk” vs “irrational talk.”
    • Rational talk was reinforced; irrational talk was placed on extinction (reinforcement withheld) and decreased.
  • Sullivan & Leon (1987): conditioned reinforcement in a perceptual/olfactory paradigm
    • Pups trained with peppermint odor conditioning showed differences in time spent near peppermint and in olfactory bulb activity (2-DG uptake) depending on conditioning groups.
    • Illustrates how neutral odors can acquire reinforcing properties through conditioning.

Delay Reduction Theory (Conditional Reinforcement)

  • Translation: The effectiveness of a stimulus as a conditional reinforcer is determined by the degree to which it is correlated with a reduction in the delay to terminal reinforcement.
  • Note: You do not need to memorize every detail on every slide, but understand the core idea that conditional reinforcers gain strength by predicting shorter delays to obtaining the primary reinforcement.
  • Formula (hyperbolic decay form shown in slides): V = A / (1 + K·D)
    • where:
      • V = value of the reinforcer (reinforcing effectiveness)
      • A = maximum reinforcing value (asymptote)
      • K = rate of discounting (how quickly value declines with delay)
      • D = delay to terminal reinforcement
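The decay can be sketched numerically. The parameter values below are illustrative assumptions, not values from the slides:

```python
def reinforcer_value(A, K, D):
    """Hyperbolic decay of reinforcing value with delay: V = A / (1 + K*D)."""
    return A / (1 + K * D)

# Illustrative parameters (assumed):
A = 100.0  # maximum reinforcing value at zero delay
K = 0.5    # discounting rate

# Value drops steeply at short delays, then levels off (hyperbolic shape).
for D in [0, 1, 5, 20]:
    print(f"D = {D:>2}: V = {reinforcer_value(A, K, D):.1f}")
```

Note the key property of a hyperbola: at D = 0 the value equals the asymptote A, and early increments of delay cost far more value than later ones.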

Conditional Reinforcement in a Choice Task

  • Example (pigeons):
    • Key A provided food 20% of the time plus a conditioned reinforcer (CR).
    • Key B provided food 50% of the time but no CR.
    • Result: pigeons preferred Key A despite less actual food, due to the presence of the conditioned reinforcer associated with Key A.
  • Interpretation: The conditioned reinforcer can drive choice when its predictive value for the primary reinforcement is high enough, illustrating the power of conditional reinforcement.

Variables Affecting Reinforcement

  • Contingency: degree of correlation between a behaviour and its consequence.
  • Contiguity: nearness in time (temporal contiguity) or space (spatial contiguity) between the operant response and the reinforcer.
    • High contiguity strengthens the effectiveness of the reinforcer; longer delays reduce effectiveness.
  • Hyperbolic Decay Function (conceptual):
    • The value of a reinforcer declines with delay according to a hyperbolic rule, captured by the formula above.

Reinforcer Magnitude and Characteristics

  • Magnitude (size) generally increases reinforcing effectiveness, but the relationship is not linear.
  • Increases in magnitude yield diminishing returns: bigger reinforcers do not always scale proportionally in effectiveness.
  • Unconditional reinforcers often show diminishing effects with greater magnitude due to satiation and other factors.
  • In short: reinforcer magnitude is not a simple linear function of effectiveness; context and the organism’s motivational state must be considered.

Reinforcer Characteristics, Task Demands, and Motivating Operations

  • Specific reinforcer used matters (e.g., chocolate vs sunflower seeds for a given animal).
  • Task characteristics matter (e.g., what behaviour is being reinforced—pecking for food vs a different target for a different species).
  • Motivating Operations (MOs): influence the effectiveness of reinforcers
    • Establishing Operations: increase the effectiveness of a reinforcer (e.g., deprivation increases value).
    • Abolishing Operations: decrease the effectiveness of a reinforcer (e.g., satiation reduces value).
  • Other considerations: competing contingencies of reinforcement (e.g., choosing between alternatives like study vs watching YouTube).

Premack Principle

  • Core idea: the more probable (high-probability) behaviour can reinforce the less probable (low-probability) behaviour.
  • Formal statement (Premack, 1965): Of any two responses, the more probable one will reinforce the less probable one.
  • Example: If a child prefers playing pinball to eating candy, allowing pinball access after each candy consumption reinforces candy eating.
  • Problems/Limitations:
    • Does not fully account for conditional reinforcement effects.
    • Low-probability behaviour can reinforce high-probability behaviour when deprivation is present (e.g., after deprivation of the low-probability behaviour).
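The Premack rule can be sketched as a simple comparison of baseline response probabilities. The behaviours and probability values below are hypothetical:

```python
def premack_reinforcer(baseline):
    """Given baseline probabilities (relative time freely allocated to each
    behaviour), return (reinforcer, target) pairs predicted by the Premack
    principle: the more probable behaviour can reinforce the less probable one.
    """
    pairs = []
    for hi, p_hi in baseline.items():
        for lo, p_lo in baseline.items():
            if p_hi > p_lo:
                pairs.append((hi, lo))
    return pairs

# Hypothetical free-operant baseline for a child:
baseline = {"pinball": 0.6, "candy": 0.3, "homework": 0.1}
# Pinball can reinforce candy or homework; candy can reinforce homework.
print(premack_reinforcer(baseline))
```

Note that this sketch captures only the basic Premack ordering; as the limitations above indicate, deprivation can reverse these predictions.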

Schedules of Reinforcement

  • Schedule of Reinforcement: a rule describing how reinforcement is delivered.
  • Schedule effects: different schedules produce unique patterns and rates of behaviour over time; long-term effects are predictable and observed across species.

Cumulative Records and Data Representation

  • Cumulative record: a plot of cumulative responses (y-axis) over time (x-axis).
  • Slope of the cumulative line indicates the rate of responding; steeper slope corresponds to higher response rates.
  • Comparing frequency vs cumulative frequency:
    • Frequency: number of responses in a given interval.
    • Cumulative frequency: total number of responses up to each time point; useful for visualizing rate changes over time.
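A minimal sketch of building a cumulative record from response timestamps and reading the rate off its slope (the session data are hypothetical):

```python
def cumulative_record(timestamps):
    """Return (time, cumulative count) pairs: the y-value steps up by 1
    at each response, as in a cumulative record plot."""
    return [(t, i + 1) for i, t in enumerate(sorted(timestamps))]

def rate(timestamps, t_start, t_end):
    """Responses per second in [t_start, t_end): the slope of the
    cumulative record over that window."""
    n = sum(t_start <= t < t_end for t in timestamps)
    return n / (t_end - t_start)

# Hypothetical session: fast responding early, slower responding later.
times = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
print(cumulative_record(times)[-1])        # final point: (50, 10)
print(rate(times, 0, 5), rate(times, 5, 50))  # steep early slope, shallow later slope
```

A steeper segment of the cumulative line corresponds to a higher local response rate, which is exactly what the slope comparison above computes.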

Types of Schedules (CRF and Intermittent)

  • Continuous Reinforcement (CRF): reinforcement is delivered after every instance of the target behaviour.
    • Pros: rapid acquisition and clear learning signal.
    • Cons: rare in natural environments; leads to faster extinction if reinforcement stops.
  • Intermittent (Partial) Reinforcement: reinforcement is delivered on some occasions only.
    • Four main types:
    • Fixed-Ratio (FR): reinforcement after a fixed number of responses (e.g., FR-120).
      • Generates a Post-Reinforcement Pause (PRP).
      • As ratio increases, PRP tends to increase; following PRP, run rates become steady.
    • Variable-Ratio (VR): reinforcement after a varying number of responses around an average (e.g., VR-360).
      • Ratios can be a list such as 1, 10, 20, 30, 60, 100, 180, 240, 300, 360, 420, 480, 540, 600, 660, 690, 690, 720, 739.
      • Mean ratio ≈ 360.
      • Characteristics: high and constant response rates; strong resistance to extinction; common in natural environments and gambling games.
    • Fixed-Interval (FI): reinforcement after a fixed amount of time has passed since the last reinforcement.
      • Produces a scalloped response pattern; responding increases gradually as the interval ends.
      • Not common in natural environments.
    • Variable-Interval (VI): intervals vary around an average (e.g., VI-3 min).
      • PRPs are rare and short; produces steady, moderate response rates.
      • Not as high as VR; common in natural environments.
    • Other schedules: Duration schedules (Fixed/Variable Duration) make reinforcement contingent on continuous performance for a set period (e.g., practicing guitar for 30 minutes); often used in practice settings, although such schedules may not deliver a reinforcer on every occasion.
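The ratio schedules above can be sketched as simple counting rules. The uniform draw used for the VR requirement is an assumption chosen only so that the average requirement equals the stated mean:

```python
import random

def simulate_fr(n_reinforcers, ratio):
    """FR schedule: every reinforcer requires exactly `ratio` responses."""
    return n_reinforcers * ratio  # total responses emitted

def simulate_vr(n_reinforcers, mean_ratio, rng):
    """VR schedule: each reinforcer requires a random number of responses,
    here drawn uniformly from 1..(2*mean_ratio - 1) so the average
    requirement is mean_ratio (an assumed distribution for illustration)."""
    return sum(rng.randint(1, 2 * mean_ratio - 1) for _ in range(n_reinforcers))

rng = random.Random(0)
print(simulate_fr(10, 120))                   # FR-120: exactly 1200 responses
print(simulate_vr(10_000, 360, rng) / 10_000)  # mean requirement, near 360
```

The FR requirement is perfectly predictable (hence the post-reinforcement pause), while the VR requirement varies around its mean, which is one reason VR responding is high and steady.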

Observing Schedules: Bed Making Example

  • Bed Making data show frequency and slope changes across days under different conditions.
  • Slopes (e.g., 0.35 vs 0.75) indicate different rates of responding under different reinforcement contexts.

Operant Extinction

  • Extinction: withholding reinforcers that maintain a behaviour.
  • Data example (cumulative responses) show a period of no responding followed by responses and occasional resets.
  • Spontaneous Recovery: extinguished behaviour can reappear in similar situations after a period without reinforcement; multiple extinction sessions across settings reduce this effect.
  • Reinstatement: recovery of an extinguished behaviour when the reinforcer is presented again on its own, independently of responding.
  • Extinction Burst: a temporary increase in the reinforced behaviour when reinforcement is withdrawn.
  • Extinction and Variability: increase in operant variability during extinction; organism may contact other sources of reinforcement.
  • Resistance to Extinction (PRE): partial reinforcement effect
    • Schedules that reinforce more intermittently tend to take longer to extinguish than those that reinforce less intermittently.
    • E.g., FR-100 vs. continuous reinforcement (CRF): during extinction, a behaviour previously reinforced on FR-100 persists through many more unreinforced responses than one previously on CRF, because long runs without reinforcement were already typical under FR-100.

Final Note on Extinction

  • Extinction is new learning, not forgetting.
  • Spontaneous recovery and reinstatement show that original learning persists in some form.
  • The core idea: learning is a change in behaviour due to experience; when reinforcement stops, the probability of the response decreases, reflecting new learning rather than simply forgetting.

Connections to Foundational Principles and Real-World Relevance

  • Reinforcement principles underpin many educational, clinical, and organizational practices:
    • Shaping behaviours through successive approximations using CRF and gradually changing schedules.
    • Using Premack and motivating operations to increase adherence to desired behaviours.
    • Understanding and leveraging contingencies and contiguity to design effective behavioural interventions.
    • Recognizing extinction dynamics in behavior change programs and how to minimize relapse (extinction bursts, spontaneous recovery).
  • Ethical and practical implications:
    • Use of reinforcement should respect welfare and autonomy; avoid coercive or harmful reinforcement strategies.
    • Consider individual differences in primary vs secondary reinforcement and motivational states (deprivation, satiation).
    • Be mindful of unintended reinforcement of undesired behaviours when designing schedules (e.g., inadvertently reinforcing avoidance behaviours).