1.3. Instrumental Learning and Operant Conditioning (Thorndike & Skinner)

Thorndike and Skinner: Instrumental Learning and Operant Conditioning

  • Edward Thorndike (1874–1949)

    • Key figure in American behaviorism; expanded on classical conditioning by studying more complex behaviors.
    • Core claim: animals learn predominantly by trial and error, not by solving problems through conscious reasoning.
    • Influenced by Darwin’s ideas about animal intelligence; sought to study learning rigorously through controlled experiments.
    • Central idea: learning from past successes or failures guides future behavior.
  • Instrumental Learning and Operant Conditioning (Overview)

    • Learning proceeds by trial and error, which builds an association between actions and their outcomes.
    • The concept contrasts with insight-based problem solving in humans.

The Law of Effect (Thorndike)

  • Based on experiments with hungry cats placed in a cage, with food outside and a latch that opens the door.
    • Initially, random actions attempted (e.g., reaching through bars).
    • By accident, the cat hit the latch and obtained food; over repeated trials, it began to go straight for the latch.
    • Conclusion: reward (food) strengthens the behavior that led to it; incorrect actions are weakened.
  • Law of Effect: Behaviors followed by satisfying or pleasant outcomes are likely to be repeated; those followed by unpleasant outcomes are less likely to be repeated.
    • In simple terms: rewards reinforce actions; punishments or bad outcomes discourage them.
  • Relation to Skinner: Skinner extended the idea, emphasizing prediction and control of behavior via environmental consequences.
    • Basic takeaway: If a behavior is rewarded, it’s more likely to recur; if punished, less likely to recur.
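
The cat experiment above can be sketched as a toy simulation. This is only an illustrative model, not Thorndike's procedure: the action names, the starting strengths, and the update sizes (`step`, the 0.1 floor) are all assumptions chosen for the example. It shows the core idea of the Law of Effect: an action followed by a reward gains strength and is chosen more often, while unrewarded actions fade.

```python
import random


def simulate_puzzle_box(trials=200, step=0.5, seed=0):
    """Toy model of the Law of Effect: rewarded actions gain strength."""
    rng = random.Random(seed)
    # Hypothetical action repertoire; only "press_latch" opens the door.
    strengths = {
        "reach_through_bars": 1.0,
        "scratch_floor": 1.0,
        "press_latch": 1.0,
    }
    for _ in range(trials):
        actions = list(strengths)
        # Choose an action with probability proportional to its strength.
        weights = [strengths[a] for a in actions]
        action = rng.choices(actions, weights=weights)[0]
        if action == "press_latch":
            # Satisfying outcome (food): strengthen the action.
            strengths[action] += step
        else:
            # No reward: weaken the action slightly, with a small floor.
            strengths[action] = max(0.1, strengths[action] - 0.1 * step)
    return strengths


strengths = simulate_puzzle_box()
total = sum(strengths.values())
# Share of total strength per action after training.
print({action: round(s / total, 2) for action, s in strengths.items()})
```

After training, nearly all of the strength sits on the latch press, which mirrors the cat going straight for the latch on later trials.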

The Skinner Box and Operant Conditioning

  • Skinner conducted experiments with hungry rats in a Skinner box.
    • Lever press could yield food (positive reinforcement) or avoid/terminate a shock (negative reinforcement).
    • A light could warn about an impending shock, encouraging the rat to adjust behavior to avoid the shock.
  • Positive reinforcement: reward following a behavior increases its future probability.
    • Example: Lever press → food reward → lever pressing increases.
  • Negative reinforcement: removal of an unpleasant stimulus increases the probability of the behavior.
    • Example: Lever press stops a shock; cue (light) predicts shock, leading to lever pressing to avoid the shock.
  • Key takeaway: Reinforcers and punishers shape behavior; the presence or absence of stimuli following behavior alters future likelihood.

Primary and Secondary Reinforcement and Punishment

  • Primary (unconditioned) reinforcers/punishers are natural and do not require learning to be valued.
    • Primary reinforcers: food, water, physical comfort.
    • Primary punishers: pain, extreme cold, discomfort.
  • Secondary (conditioned) reinforcers/punishers are learned values.
    • Secondary reinforcers: money, praise (learned to be valuable).
    • Secondary punishers: fines, bad grades (learned negative value).
  • Both types can be used to change behavior: reinforce desired actions or punish unwanted ones, depending on outcomes.
  • Simple definitions:
    • Reinforcement = increases behavior
    • Punishment = decreases behavior
  • Example dog scenario:
    • Begs at dinner; given food → begging is reinforced.
    • Begs and is told "no" → begging decreases.

Positive vs Negative Reinforcement and Punishment (Practical Checklist)

  • Two main types of reinforcement: positive and negative (words refer to adding or removing something, not good/bad value).

    • Positive reinforcement: add something pleasant after a behavior to increase its probability.

    • Example: Studying hard yields a high grade; the grade reinforces studying next time.

    • Formula-like idea:
      $\text{Positive Reinforcement} \rightarrow \text{increase in behavior (B)} \text{ because } +R \text{ follows } B.$

    • Negative reinforcement: remove something unpleasant after a behavior to increase its probability.

    • Example: Studying ends nagging; removal of nagging reinforces studying.

    • Punishment decreases behavior:

    • Positive punishment: add something unpleasant to reduce a behavior.

      • Example: Teasing after studying hard may reduce studying.
      • $\text{Positive Punishment} \rightarrow \text{decrease in } B \text{ because } +P \text{ follows } B.$
    • Negative punishment: remove something pleasant to reduce a behavior.

      • Example: If studying means missing out on time with friends, studying may decrease.
      • $\text{Negative Punishment} \rightarrow \text{decrease in } B \text{ because } -P \text{ follows } B.$
  • Simple checklist:

    • Step 1: What is the goal? Reinforcement to increase, Punishment to decrease.
    • Step 2: Is it positive or negative? Positive = add; Negative = take away.
    • Summary mapping:
    • Positive reinforcement = add something good → behavior increases.
    • Negative reinforcement = take away something bad → behavior increases.
    • Positive punishment = add something bad → behavior decreases.
    • Negative punishment = take away something good → behavior decreases.
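
The two-step checklist above can be written as a small lookup. This is a minimal sketch; the function name and the string labels ("add", "remove", "increase", "decrease") are assumptions made for the example.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name the operant procedure from the two checklist questions.

    stimulus_change: "add" or "remove"        (positive vs. negative)
    behavior_change: "increase" or "decrease" (reinforcement vs. punishment)
    """
    sign = {"add": "positive", "remove": "negative"}[stimulus_change]
    kind = {"increase": "reinforcement", "decrease": "punishment"}[behavior_change]
    return f"{sign} {kind}"


print(classify_consequence("add", "increase"))     # positive reinforcement
print(classify_consequence("remove", "increase"))  # negative reinforcement
print(classify_consequence("add", "decrease"))     # positive punishment
print(classify_consequence("remove", "decrease"))  # negative punishment
```

Note that the function never asks whether the stimulus is "good" or "bad": the label follows only from what was added or removed and what happened to the behavior.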

Schedules of Reinforcement

  • How rewards are delivered affects both learning and persistence.
  • Continuous Reinforcement (CRF): reward every time the behavior occurs.
    • Response rate: low to moderate (every response is rewarded, so there is little pressure to respond faster; the behavior itself is acquired quickly).
    • Extinction rate: high (when rewards stop, behavior stops quickly).
  • Fixed-Ratio (FR-n): reward after a set number of actions (n).
    • Example: food after every 5 lever presses.
    • Response rate: high (more pressing leads to faster rewards).
    • Extinction rate: moderate (some persistence after rewards stop).
  • Fixed-Interval (FI-t): reward after a fixed time interval, provided the behavior occurs.
    • Example: reward every 15 minutes if lever pressed.
    • Response rate: moderate and uneven; responding typically drops right after a reward and picks up again as the end of the interval approaches.
    • Extinction rate: moderate.
  • Variable-Ratio (VR-n): reward after a random number of actions, with an average of n.
    • Response rate: very high (uncertainty keeps pressing).
    • Extinction rate: very low (difficult to detect when rewards stop).
    • Note: Highly resistant to extinction; often linked to gambling behavior.
  • Variable-Interval (VI-t): reward after a random amount of time, only if the behavior occurs.
    • Response rate: moderate to high and steady.
    • Extinction rate: low (reward timing is unpredictable).
  • Key findings (Ferster & Skinner, 1957; Skinner, 1965; cited in sources):
    • Variable-ratio reinforcement tends to produce the highest persistence and slowest extinction.
    • Continuous reinforcement leads to the fastest extinction when rewards stop.
  • Summary relationship: higher baseline response rates with ratio schedules; slower extinction with variable schedules.
  • Practical implication: the schedule type affects how durable a learned behavior is and how much effort is invested before extinction occurs.
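
The schedules above differ only in the rule that decides whether a given response earns a reward. The sketch below expresses each rule as a small class. It is an illustration under stated assumptions, not a model from the cited sources: the class names, the `respond(now)` interface, and the uniform distributions used for the variable schedules (chosen so that the average equals n or t) are all hypothetical. CRF is simply `FixedRatio(1)`.

```python
import random


class FixedRatio:
    """FR-n: reward every n-th response. FixedRatio(1) is CRF."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, now=None):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: reward after a random number of responses averaging n."""

    def __init__(self, n, seed=0):
        self.n = n
        self.rng = random.Random(seed)
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1..2n-1, which has mean n.
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, now=None):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI-t: reward the first response made at least t after the last reward."""

    def __init__(self, t):
        self.t = t
        self.last_reward = 0.0

    def respond(self, now):
        if now - self.last_reward >= self.t:
            self.last_reward = now
            return True
        return False


class VariableInterval:
    """VI-t: like FI, but the required wait is random with average t."""

    def __init__(self, t, seed=0):
        self.t = t
        self.rng = random.Random(seed)
        self.last_reward = 0.0
        self.wait = self._draw()

    def _draw(self):
        # Assumption: uniform on [0, 2t], which has mean t.
        return self.rng.uniform(0, 2 * self.t)

    def respond(self, now):
        if now - self.last_reward >= self.wait:
            self.last_reward = now
            self.wait = self._draw()
            return True
        return False


fr5 = FixedRatio(5)
# Ten lever presses on FR-5: only the 5th and 10th are rewarded.
print([fr5.respond() for _ in range(10)])
# [False, False, False, False, True, False, False, False, False, True]
```

On the ratio schedules, more responses bring rewards sooner, while on the interval schedules extra responses before the interval has elapsed earn nothing. That difference is one way to read the higher response rates reported for ratio schedules.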

Response Rate and Extinction Rate (Definitions and Examples)

  • Response rate: how often a behavior is repeated (per unit time).
    • Example: If a rat presses a lever many times in a short period, it has a high response rate.
  • Extinction rate: how quickly the behavior stops when reward ceases.
    • Example: If a rat stops lever pressing soon after rewards stop, extinction rate is high.
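
The two measures can be computed from a list of response timestamps. The sketch below is one possible operationalization, not a definition from the sources: the function names and the sample data are made up, and extinction is measured here as the time between the end of rewards and the last recorded response (a shorter time means a higher extinction rate).

```python
def response_rate_per_minute(response_times_s, session_length_s):
    """Responses per minute over the whole session."""
    return 60 * len(response_times_s) / session_length_s


def time_to_extinction(response_times_s, reward_stop_s):
    """Seconds from the end of rewards to the last response (0 if none)."""
    later = [t for t in response_times_s if t > reward_stop_s]
    return max(later) - reward_stop_s if later else 0


# Hypothetical lever-press timestamps (seconds) in a 60-second session.
presses = [2, 5, 9, 14, 20, 27, 35, 44, 50, 58]

print(response_rate_per_minute(presses, 60))  # 10.0
print(time_to_extinction(presses, 30))        # 28
```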

Behavior Modification and Shaping

  • Behavior modification is a practical application of operant conditioning to reduce undesired behaviors and promote desirable ones.
    • Concept: alter the surrounding rewards/consequences to gradually change behavior.
    • Positive strategies: rewards, praise, privileges.
    • Negative strategies: removing rewards or applying mild punishments.
  • Shaping: reinforce successive approximations toward a final target behavior.
    • Dog training example: reward lying down, then reward turning slightly, then rolling over.
    • Only steps in the desired direction are reinforced; other actions are ignored.
  • Skinner’s view: learning is driven largely by rewards and observable behaviors, with less emphasis on internal mental states.
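
Shaping, as described above, can be sketched as a loop that reinforces only the behavior matching the current approximation and then raises the criterion. This is a deliberately simplified illustration: the behavior labels are hypothetical, and each step is assumed to need just one reinforcement before moving on, whereas real training usually repeats each step several times.

```python
def shape(observed_behaviors, approximations):
    """Reinforce successive approximations in order.

    Returns (log, reached_target), where log holds one
    (behavior, reinforced) pair per observed behavior.
    """
    step = 0
    log = []
    for behavior in observed_behaviors:
        # Only the current approximation is reinforced; everything else is ignored.
        reinforced = step < len(approximations) and behavior == approximations[step]
        if reinforced:
            step += 1  # Criterion met: move on to the next approximation.
        log.append((behavior, reinforced))
    return log, step == len(approximations)


approximations = ["lie_down", "turn_slightly", "roll_over"]
observed = ["sit", "lie_down", "bark", "lie_down", "turn_slightly", "roll_over"]

log, reached_target = shape(observed, approximations)
for behavior, reinforced in log:
    print(f"{behavior}: {'reward' if reinforced else 'ignore'}")
print("target reached:", reached_target)
```

The second "lie_down" is ignored because the criterion has already moved on to "turn_slightly". Raising the bar like this is what pushes behavior toward the final target.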

Cognitive Contributions and Latent Learning

  • Later research showed thinking and cognitive processes also contribute to learning.
  • Latent learning (Tolman & Honzik, 1930): learning can occur without obvious rewards and only becomes evident when a reason to demonstrate it arises.
    • Classic example: Rats exploring a maze without rewards form a cognitive map; when cheese is later placed at the end, they find it quickly because they already learned the layout.
  • Implication: cognitive processes can influence learning even in operant conditioning paradigms.

Real-World Ethics and Implications

  • Punishment and child-rearing:
    • German law supports raising children without violence (physical or emotional).
    • Surveys show that physical punishment nevertheless persists:
      • 40% of respondents admitted to some physical punishment in the past year,
      • 10% had slapped their child, and 4% had used spanking.
    • Psychologically, physical punishment can be harmful and may only temporarily suppress bad behavior without long-term change.
    • Potential negative outcomes: learned aggression, fear of punishing parent, anxiety, depression, and feelings of helplessness.
  • Learned helplessness (Martin Seligman, 1972):
    • Repeated exposure to uncontrollable aversive events can lead to passivity and resignation.
    • Effects include hopelessness, reduced confidence, and diminished recognition of success when it occurs.
    • Relevance to education and therapy: emphasize rewarding desirable behaviors to avoid learned helplessness and focus on constructive reinforcement.

Notable References and Historical Context

  • Ferster and Skinner (1957): different reinforcement patterns affect motivation and duration of behavior.
  • Nolen-Hoeksema et al. (2009); Skinner (1965): referenced in discussions of extinction and reinforcement schedules.
  • Seligman (1972): learned helplessness. Latent learning and cognitive maps are classically attributed to Tolman (Tolman & Honzik, 1930).
  • Skinner’s broader claim: learning largely from reward-based contingencies; mind and internal thoughts are less central to learning than behavior and outcomes.

Quick Reference: Terminology Mapping

  • Reinforcement = increases behavior
  • Punishment = decreases behavior
  • Positive = adding something
  • Negative = taking away something
  • Positive reinforcement = add good → behavior increases
  • Negative reinforcement = remove bad → behavior increases
  • Positive punishment = add bad → behavior decreases
  • Negative punishment = remove good → behavior decreases
  • FR-n = reward after every n responses
  • FI-t = reward after fixed time t if behavior occurs
  • VR-n = reward after a random number of responses with average n
  • VI-t = reward after a random time interval with average t, if behavior occurs
  • CRF = continuous reinforcement (reward every time)
  • Extinction = disappearance of a previously learned behavior when rewards stop

Quick Numerical Notes

  • German punishment statistics (contextual example):
    • 40% admitted to physical punishment in the past year
    • 10% had slapped their child
    • 4% used spanking
  • Latent learning experiment takeaway: learning can occur without immediate reward; becomes evident when incentives arise
  • Example of a simple reinforcement pattern:
    • If a dog begs and is rewarded with food every time (CRF), the begging may extinguish quickly once rewards stop (high extinction rate under CRF).
  • Response rate (RR) vs. extinction rate (ER) under common schedules:
    • VR-n: RR high, ER very low
    • FR-n: RR high, ER moderate
    • FI-t: RR moderate/inconsistent, ER moderate
    • VI-t: RR moderate to high, ER low

References to Concepts in Practice

  • Shaping is widely used in dog training, education, and behavioral therapy to break complex tasks into achievable steps.
  • Latent learning suggests that exposing individuals to environments without immediate rewards can still build cognitive maps, useful for design of educational spaces and navigation training.
  • Ethical considerations in applying operant conditioning emphasize minimizing harm and avoiding punitive approaches that can cause long-term psychological harm.