1.3. Instrumental Learning and Operant Conditioning (Thorndike & Skinner)
Edward Thorndike (1874–1949)
- Key figure in American behaviorism; expanded on classical conditioning by studying more complex behaviors.
- Core claim: animals learn predominantly by trial and error, not by solving problems through conscious reasoning.
- Influenced by Darwin’s ideas about animal intelligence; sought to study learning through controlled experiments.
- Central idea: learning from past successes or failures guides future behavior.
Instrumental Learning and Operant Conditioning (Overview)
- Learning is based on trial and error, which builds associations between actions and their outcomes.
- The concept contrasts with insight-based problem solving in humans.
The Law of Effect (Thorndike)
- Observation from experiments with hungry cats placed in a cage (a "puzzle box"), with food outside and a latch that opens the door.
- At first, the cat tried random actions (e.g., reaching through the bars).
- By accident, the cat hit the latch and obtained food; over repeated trials, it began to go straight for the latch.
- Conclusion: reward (food) strengthens the behavior that led to it; incorrect actions are weakened.
- Law of Effect: Behaviors followed by satisfying or pleasant outcomes are likely to be repeated; those followed by unpleasant outcomes are less likely to be repeated.
- In simple terms: rewards reinforce actions; punishments or bad outcomes discourage them.
- Relation to Skinner: Skinner extended the idea, emphasizing prediction and control of behavior via environmental consequences.
- Basic takeaway: If a behavior is rewarded, it’s more likely to recur; if punished, less likely to recur.
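The Law of Effect can be illustrated with a toy simulation. This is a minimal sketch, not Thorndike's procedure: the action names, the starting weights, and the update sizes are all assumptions chosen for illustration. Each action has a weight; the rewarded action gains weight and unrewarded actions lose a little.

```python
import random

def simulate_puzzle_box(trials=200, step=0.5, seed=0):
    """Toy trial-and-error learner: rewarded actions gain weight, others lose a little."""
    rng = random.Random(seed)
    # Hypothetical action set; only "press_latch" opens the door and yields food.
    weights = {"reach_through_bars": 1.0, "scratch_floor": 1.0, "press_latch": 1.0}
    for _ in range(trials):
        actions = list(weights)
        # Choose an action with probability proportional to its current weight.
        action = rng.choices(actions, weights=[weights[a] for a in actions])[0]
        if action == "press_latch":
            weights[action] += step  # satisfying outcome: strengthen the action
        else:
            # No food: weaken the action, but never below a small floor.
            weights[action] = max(0.1, weights[action] - 0.1 * step)
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

probs = simulate_puzzle_box()
print(probs)
```

After enough trials the latch press should carry most of the probability, which mirrors the cat going straight for the latch.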
The Skinner Box and Operant Conditioning
- Skinner conducted experiments with hungry rats in a Skinner box.
- Lever press could yield food (positive reinforcement) or avoid/terminate a shock (negative reinforcement).
- A light could warn about an impending shock, encouraging the rat to adjust behavior to avoid the shock.
- Positive reinforcement: reward following a behavior increases its future probability.
- Example: Lever press → food reward → lever pressing increases.
- Negative reinforcement: removal of an unpleasant stimulus increases the probability of the behavior.
- Example: Lever press stops a shock; cue (light) predicts shock, leading to lever pressing to avoid the shock.
- Key takeaway: Reinforcers and punishers shape behavior; the presence or absence of stimuli following behavior alters future likelihood.
Primary and Secondary Reinforcement and Punishment
- Primary (unconditioned) reinforcers/punishers are natural and do not require learning to be valued.
- Primary reinforcers: food, water, physical comfort.
- Primary punishers: pain, extreme cold, discomfort.
- Secondary (conditioned) reinforcers/punishers are learned values.
- Secondary reinforcers: money, praise (learned to be valuable).
- Secondary punishers: fines, bad grades (learned negative value).
- Both types can be used to change behavior: reinforce desired actions or punish unwanted ones, depending on outcomes.
- Simple definitions:
- Reinforcement = increases behavior
- Punishment = decreases behavior
- Example dog scenario:
- Begs at dinner; given food → begging is reinforced.
- Begs and is told "no" → begging decreases.
Positive vs Negative Reinforcement and Punishment (Practical Checklist)
Two main types of reinforcement: positive and negative (words refer to adding or removing something, not good/bad value).
Positive reinforcement: add something pleasant after a behavior to increase its probability.
Example: Studying hard yields a high grade; the grade reinforces studying next time.
Formula-like idea: $\text{Positive Reinforcement} \rightarrow \text{increase in behavior } (B) \text{ because } +R \text{ follows } B$
Negative reinforcement: remove something unpleasant after a behavior to increase its probability.
Example: Studying ends nagging; removal of nagging reinforces studying.
Punishment decreases behavior:
Positive punishment: add something unpleasant to reduce a behavior.
- Example: Being teased after studying hard may reduce studying.
- $\text{Positive Punishment} \rightarrow \text{decrease in } B \text{ because } +P \text{ follows } B$
Negative punishment: remove something pleasant to reduce a behavior.
- Example: If studying means missing out on time with friends, studying may decrease.
- $\text{Negative Punishment} \rightarrow \text{decrease in } B \text{ because } -P \text{ follows } B$
Simple checklist:
- Step 1: What is the goal? Reinforcement to increase, Punishment to decrease.
- Step 2: Is it positive or negative? Positive = add; Negative = take away.
- Summary mapping:
- Positive reinforcement = add something good → behavior increases.
- Negative reinforcement = take away something bad → behavior increases.
- Positive punishment = add something bad → behavior decreases.
- Negative punishment = take away something good → behavior decreases.
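The two-step checklist amounts to a small lookup. The sketch below encodes it; the function name and the string labels ("increase", "decrease", "add", "remove") are illustrative choices, not standard terminology.

```python
def classify_consequence(behavior_change: str, stimulus_change: str) -> str:
    """Map the two checklist answers onto one of the four operant terms.

    behavior_change: "increase" or "decrease" (Step 1: what is the goal?)
    stimulus_change: "add" or "remove"        (Step 2: positive or negative?)
    """
    kind = {"increase": "reinforcement", "decrease": "punishment"}[behavior_change]
    sign = {"add": "positive", "remove": "negative"}[stimulus_change]
    return f"{sign} {kind}"

# Studying ends the nagging: something is removed and studying increases.
print(classify_consequence("increase", "remove"))  # negative reinforcement
```

Note that the function never asks whether the stimulus is "good" or "bad": the direction of the behavior change and the add/remove distinction are enough to name the quadrant.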
Schedules of Reinforcement
- How rewards are delivered affects both learning and persistence.
- Continuous Reinforcement (CRF): reward every time the behavior occurs.
- Response rate: low to moderate (every response is rewarded, so there is no incentive to respond faster).
- Extinction rate: high (when rewards stop, behavior stops quickly).
- Fixed-Ratio (FR-n): reward after a set number of actions (n).
- Example: food after every 5 lever presses.
- Response rate: high (more pressing leads to faster rewards).
- Extinction rate: moderate (some persistence after rewards stop).
- Fixed-Interval (FI-t): reward after a fixed time interval, provided the behavior occurs.
- Example: reward every 15 minutes if lever pressed.
- Response rate: moderate and often inconsistent; animals may wait as timing approaches.
- Extinction rate: moderate.
- Variable-Ratio (VR-n): reward after a random number of actions, with an average of n.
- Response rate: very high (uncertainty keeps pressing).
- Extinction rate: very low (difficult to detect when rewards stop).
- Note: Highly resistant to extinction; often linked to gambling behavior.
- Variable-Interval (VI-t): reward after a random amount of time averaging t, only if the behavior occurs.
- Response rate: moderate to high and steady.
- Extinction rate: low (reward timing is unpredictable).
- Key findings (Ferster & Skinner, 1957; Skinner, 1965; cited in sources):
- Variable-ratio reinforcement tends to produce the highest persistence and slowest extinction.
- Continuous reinforcement leads to the fastest extinction when rewards stop.
- Summary relationship: higher baseline response rates with ratio schedules; slower extinction with variable schedules.
- Practical implication: the schedule type affects how durable a learned behavior is and how much effort is invested before extinction occurs.
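The five schedules differ only in the rule that decides whether a given response is rewarded. The sketch below makes those rules concrete. It is an illustration under assumptions: the class names are invented here, and the variable schedules draw from uniform distributions (1 to 2n-1 responses, 0 to 2t time units) simply because their means are n and t. CRF is the special case `RatioSchedule(1)`.

```python
import random

class RatioSchedule:
    """FR-n (variable=False) or VR-n (variable=True): reward depends on the response count."""
    def __init__(self, n, variable=False, seed=0):
        self.n, self.variable = n, variable
        self.rng = random.Random(seed)
        self.count = 0
        self.required = self._next_requirement()

    def _next_requirement(self):
        # VR: uniform over 1..2n-1 (mean n), an assumed distribution; FR: always n.
        return self.rng.randint(1, 2 * self.n - 1) if self.variable else self.n

    def respond(self):
        """Register one response; return True if it is rewarded."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()
            return True
        return False


class IntervalSchedule:
    """FI-t (variable=False) or VI-t (variable=True): the first response after the interval is rewarded."""
    def __init__(self, t, variable=False, seed=0):
        self.t, self.variable = t, variable
        self.rng = random.Random(seed)
        self.available_at = self._next_interval()

    def _next_interval(self):
        # VI: uniform over 0..2t (mean t), an assumed distribution; FI: always t.
        return self.rng.uniform(0, 2 * self.t) if self.variable else self.t

    def respond(self, now):
        """Register a response at time `now`; reward it only if the interval has elapsed."""
        if now >= self.available_at:
            self.available_at = now + self._next_interval()
            return True
        return False


fr5 = RatioSchedule(5)
fr5_rewards = [fr5.respond() for _ in range(10)]
print(fr5_rewards)  # rewards on the 5th and 10th press

fi15 = IntervalSchedule(15)
fi15_rewards = [fi15.respond(now) for now in (5, 14, 15, 20, 31)]
print(fi15_rewards)  # [False, False, True, False, True]
```

Under the ratio schedules, responding faster brings rewards sooner, whereas under the interval schedules extra responses before the interval elapses earn nothing. This is one way to see why ratio schedules are associated with higher response rates.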
Response Rate and Extinction Rate (Definitions and Examples)
- Response rate: how often a behavior is repeated (per unit time).
- Example: If a rat presses a lever many times in a short period, it has a high response rate.
- Extinction rate: how quickly the behavior stops when reward ceases.
- Example: If a rat stops lever pressing soon after rewards stop, extinction rate is high.
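Both quantities can be computed from a list of response timestamps. The sketch below is one possible operationalization, not a standard formula from the sources: response rate is responses per unit time, and extinction speed is approximated by counting the responses emitted after the last reward (fewer responses means faster extinction). The function names and the sample data are made up for illustration.

```python
def response_rate(response_times, window):
    """Responses per unit time over an observation window of length `window`."""
    return len(response_times) / window

def responses_after_last_reward(response_times, last_reward_time):
    """Count responses emitted once rewards have stopped; a small count means fast extinction."""
    return sum(1 for t in response_times if t > last_reward_time)

# Hypothetical lever-press times (in seconds); the last reward was delivered at t = 5.
presses = [1, 2, 3, 4, 5, 6, 8, 11, 15]
rate = response_rate(presses, window=20)
persistence = responses_after_last_reward(presses, last_reward_time=5)
print(rate, persistence)  # 0.45 presses per second, 4 presses after rewards stopped
```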
Behavior Modification and Shaping
- Behavior modification is a practical application of operant conditioning to reduce undesired behaviors and promote desirable ones.
- Concept: alter the surrounding rewards/consequences to gradually change behavior.
- Positive strategies: rewards, praise, privileges.
- Negative strategies: removing rewards or applying mild punishments.
- Shaping: reinforce successive approximations toward a final target behavior.
- Dog training example: reward lying down, then reward turning slightly, then rolling over.
- Only steps in the desired direction are reinforced; other actions are ignored.
- Skinner’s view: learning is driven largely by rewards and observable behaviors, with less emphasis on internal mental states.
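A toy simulation can show why shaping works where waiting for the finished behavior does not. In this sketch every detail is an assumption made for illustration: behavior is reduced to a single number (how close it is to the goal), responses vary randomly around what is currently typical, and a reward pulls typical behavior toward the rewarded response.

```python
import random

def trials_to_target(shaping, target=8.0, max_trials=1000, seed=0):
    """Return the trial on which typical behavior reaches `target`, or None if it never does."""
    rng = random.Random(seed)
    typical = 0.0  # hypothetical 1-D measure of how close behavior is to the goal
    for trial in range(1, max_trials + 1):
        response = typical + rng.gauss(0, 1)  # behavior varies around what is typical
        # With shaping, any improvement over typical behavior earns a reward;
        # without it, only the finished target behavior does.
        criterion = typical if shaping else target
        if response > criterion:
            typical += 0.2 * (response - typical)  # reward shifts typical behavior
        # Responses that miss the criterion are simply ignored.
        if typical >= target:
            return trial
    return None

with_shaping = trials_to_target(shaping=True)
without_shaping = trials_to_target(shaping=False)
print(with_shaping, without_shaping)
```

With shaping the learner reaches the target well within the trial limit. Without it, the target behavior is so far from the starting behavior that it is never emitted, so it is never rewarded and the function returns `None`.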
Cognitive Contributions and Latent Learning
- Later research showed thinking and cognitive processes also contribute to learning.
- Latent learning (Tolman & Honzik, 1930): learning can occur without obvious rewards and only becomes evident when a reason to demonstrate it arises.
- Classic example: Rats exploring a maze without rewards form a cognitive map; when cheese is later placed at the end, they find it quickly because they already learned the layout.
- Implication: cognitive processes can influence learning even in operant conditioning paradigms.
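The maze example can be sketched as two separate phases: unrewarded exploration that stores a map, and a later search over that map once a goal exists. This is only an analogy in code, not a model of how rats represent space. The maze layout, the junction names, and the use of breadth-first search are all assumptions.

```python
from collections import deque

# Hypothetical maze: each junction lists the junctions reachable from it.
MAZE = {
    "start": ["A", "B"],
    "A": ["start", "C"],
    "B": ["start", "dead_end"],
    "C": ["A", "goal"],
    "dead_end": ["B"],
    "goal": ["C"],
}

def explore(maze, start):
    """Unrewarded exploration: visit every reachable junction and record its neighbours."""
    cognitive_map, stack = {}, [start]
    while stack:
        place = stack.pop()
        if place not in cognitive_map:
            cognitive_map[place] = list(maze[place])
            stack.extend(maze[place])
    return cognitive_map

def shortest_route(cognitive_map, start, goal):
    """Breadth-first search over the stored map; returns the shortest route or None."""
    queue, came_from = deque([start]), {start: None}
    while queue:
        place = queue.popleft()
        if place == goal:
            route = []
            while place is not None:  # walk back from the goal to the start
                route.append(place)
                place = came_from[place]
            return route[::-1]
        for nxt in cognitive_map.get(place, []):
            if nxt not in came_from:
                came_from[nxt] = place
                queue.append(nxt)
    return None

cognitive_map = explore(MAZE, "start")  # phase 1: no reward anywhere
route = shortest_route(cognitive_map, "start", "goal")  # phase 2: cheese appears at "goal"
print(route)  # ['start', 'A', 'C', 'goal']
```

The map is built before any reward exists, and the direct route appears only once there is a reason to use it, which is the pattern the latent learning experiment describes.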
Real-World Ethics and Implications
- Punishment and child-rearing:
- German law supports raising children without violence (physical or emotional).
- Surveys of parents show that physical punishment persists nonetheless:
- 40% admitted to some physical punishment in the past year,
- 10% had slapped their child, and 4% had used spanking.
- Psychologically, physical punishment can be harmful and may only temporarily suppress bad behavior without long-term change.
- Potential negative outcomes: learned aggression, fear of the punishing parent, anxiety, depression, and feelings of helplessness.
- Learned helplessness (Martin Seligman, 1972):
- Repeated exposure to uncontrollable aversive events can lead to passivity and resignation.
- Effects include hopelessness, reduced confidence, and diminished recognition of success when it occurs.
- Relevance to education and therapy: emphasize rewarding desirable behaviors to avoid learned helplessness and focus on constructive reinforcement.
Notable References and Historical Context
- Ferster and Skinner (1957): different reinforcement patterns affect motivation and duration of behavior.
- Nolen-Hoeksema et al. (2009); Skinner (1965): referenced in discussions of extinction and reinforcement schedules.
- Seligman (1972): learned helplessness.
- Tolman & Honzik (1930): latent learning.
- Skinner’s broader claim: learning largely from reward-based contingencies; mind and internal thoughts are less central to learning than behavior and outcomes.
Quick Reference: Terminology Mapping
- Reinforcement = increases behavior
- Punishment = decreases behavior
- Positive = adding something
- Negative = taking away something
- Positive reinforcement = add good → behavior increases
- Negative reinforcement = remove bad → behavior increases
- Positive punishment = add bad → behavior decreases
- Negative punishment = remove good → behavior decreases
- FR-n = reward after every n responses
- FI-t = reward after fixed time t if behavior occurs
- VR-n = reward after a random number of responses with average n
- VI-t = reward after a random time interval averaging t, if behavior occurs
- CRF = continuous reinforcement (reward every time)
- Extinction = disappearance of a previously learned behavior when rewards stop
Quick Numerical Notes
- German punishment statistics (contextual example):
- 40% admitted to physical punishment in the past year
- 10% had slapped their child
- 4% used spanking
- Latent learning experiment takeaway: learning can occur without immediate reward; becomes evident when incentives arise
- Example of a simple reinforcement pattern:
- If a dog begs and is rewarded with food every time (CRF), the dog’s begging behavior may extinguish quickly if rewards stop (high extinction rate under CRF).
- Response rate (RR) vs. extinction rate (ER) under common schedules:
- VR-n: RR high, ER very low
- FR-n: RR high, ER moderate
- FI-t: RR moderate/inconsistent, ER moderate
- VI-t: RR moderate to high, ER low
References to Concepts in Practice
- Shaping is widely used in dog training, education, and behavioral therapy to break complex tasks into achievable steps.
- Latent learning suggests that exposing individuals to environments without immediate rewards can still build cognitive maps, useful for design of educational spaces and navigation training.
- Ethical considerations in applying operant conditioning emphasize minimizing harm and avoiding punitive approaches that can cause long-term psychological harm.