1.3. Instrumental Learning and Operant Conditioning (Thorndike & Skinner)

Edward Thorndike (1874–1949)
- Key figure in the development of American behaviorism; went beyond the reflexive responses studied in classical conditioning to examine more complex, voluntary behaviors.
- Core claim: animals learn predominantly by trial and error, not by solving problems through conscious reasoning.
- Influenced by Darwin’s ideas about intelligent animal behavior; sought to test such ideas with controlled experiments rather than anecdotes.
- Central idea: learning from past successes or failures guides future behavior.
Instrumental Learning and Operant Conditioning (Overview)
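- Trial-and-error learning builds an association between an action and its outcome; the action is "instrumental" in producing the outcome, hence the name.
- This contrasts with insight-based problem solving, where a solution appears suddenly through understanding rather than through gradual trial and error.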

The Law of Effect (Thorndike)

Thorndike’s puzzle-box experiments: a hungry cat was placed in a cage with food visible outside and a latch that opened the door.
- At first, the cat tried various random actions (e.g., reaching through the bars).
- Eventually it hit the latch by accident and obtained the food; over repeated trials it escaped faster and faster, until it went straight for the latch.
- Conclusion: reward (food) strengthens the behavior that led to it; incorrect actions are weakened.
Law of Effect: behaviors followed by satisfying outcomes become more likely to be repeated; behaviors followed by unpleasant outcomes become less likely to be repeated.
- In simple terms: rewards reinforce actions; punishments or bad outcomes discourage them.
Relation to Skinner: Skinner extended the idea, emphasizing prediction and control of behavior via environmental consequences.
- Basic takeaway: If a behavior is rewarded, it’s more likely to recur; if punished, less likely to recur.
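To make the Law of Effect concrete, here is a minimal Python simulation of the puzzle box. It is an illustrative model, not Thorndike’s own procedure: the action names, the equal starting strengths, and the simple add/subtract update rule (`strengthen`, `weaken`, and a small `floor`) are all hypothetical assumptions. Actions are chosen in proportion to their strength; the rewarded action gains strength and unrewarded actions lose it.

```python
import random

# Hypothetical repertoire of actions available to the cat (illustrative only).
ACTIONS = ["reach_through_bars", "scratch_walls", "push_door", "press_latch"]


def run_puzzle_box(trials: int = 30, strengthen: float = 1.0, weaken: float = 0.2,
                   floor: float = 0.05, seed: int = 0) -> tuple[list[int], dict[str, float]]:
    """Simulate trial-and-error learning under a simple Law-of-Effect update rule.

    Returns the number of attempts needed to escape on each trial and the final
    strength of every action.
    """
    rng = random.Random(seed)
    strength = {action: 1.0 for action in ACTIONS}  # all actions start equally likely
    attempts_per_trial = []
    for _ in range(trials):
        attempts = 0
        while True:
            attempts += 1
            # Pick an action with probability proportional to its current strength.
            action = rng.choices(ACTIONS, weights=[strength[a] for a in ACTIONS])[0]
            if action == "press_latch":
                strength[action] += strengthen  # satisfying outcome (food): strengthen
                break
            # No escape, no food: weaken the action, but keep a small floor.
            strength[action] = max(floor, strength[action] - weaken)
        attempts_per_trial.append(attempts)
    return attempts_per_trial, strength


if __name__ == "__main__":
    attempts, final_strength = run_puzzle_box()
    print("first 5 trials:", attempts[:5])
    print("last 5 trials:", attempts[-5:])
```

Comparing the first and last entries of `attempts` typically shows many attempts early on and near-immediate escapes later. The floor is a design choice that keeps unrewarded actions from vanishing completely.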

The Skinner Box and Operant Conditioning

Skinner conducted experiments with hungry rats in a Skinner box.
- Lever press could yield food (positive reinforcement) or avoid/terminate a shock (negative reinforcement).
- A light could signal an impending shock; the rat learned to press the lever when the light came on and so avoided the shock altogether.
Positive reinforcement: reward following a behavior increases its future probability.
- Example: Lever press → food reward → lever pressing increases.
Negative reinforcement: removal of an unpleasant stimulus increases the probability of the behavior.
- Example: Pressing the lever turns off a shock; once the light reliably predicts the shock, the rat presses the lever as soon as the light appears and avoids the shock entirely.
Key takeaway: reinforcers and punishers shape behavior; adding or removing a stimulus after a behavior changes how likely that behavior is to recur.

Primary and Secondary Reinforcement and Punishment

Primary (unconditioned) reinforcers/punishers are natural and do not require learning to be valued.
- Primary reinforcers: food, water, physical comfort.
- Primary punishers: pain, extreme cold, discomfort.
Secondary (conditioned) reinforcers/punishers are learned values.
- Secondary reinforcers: money, praise (learned to be valuable).
- Secondary punishers: fines, bad grades (learned negative value).
Both types can be used to change behavior: reinforce desired actions and punish unwanted ones.
Simple definitions:
- Reinforcement = increases behavior
- Punishment = decreases behavior
Example dog scenario:
- The dog begs at dinner and is given food → begging is reinforced.
- The dog begs and is told "no" → begging decreases.

Positive vs Negative Reinforcement and Punishment (Practical Checklist)

Reinforcement comes in two main types, positive and negative (the words refer to adding or removing something, not to good or bad value).
- Positive reinforcement: add something pleasant after a behavior to increase its probability.
  - Example: Studying hard yields a high grade; the grade reinforces studying next time.
  - $\text{Positive Reinforcement} \rightarrow \text{increase in behavior } (B) \text{ because } +R \text{ follows } B.$
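- Negative reinforcement: remove something unpleasant after a behavior to increase its probability.
  - Example: Studying ends the nagging; removing the nagging reinforces studying.
  - $\text{Negative Reinforcement} \rightarrow \text{increase in } B \text{ because } -R \text{ follows } B.$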
Punishment decreases behavior and likewise comes in positive and negative forms:
- Positive punishment: add something unpleasant to reduce a behavior.
  - Example: Teasing after studying hard may reduce studying.
  - $\text{Positive Punishment} \rightarrow \text{decrease in } B \text{ because } +P \text{ follows } B.$
- Negative punishment: remove something pleasant to reduce a behavior.
  - Example: Studying hard means missing out on time with friends; losing that time may reduce studying.
  - $\text{Negative Punishment} \rightarrow \text{decrease in } B \text{ because } -P \text{ follows } B.$
Simple checklist:
- Step 1: What is the goal? Reinforcement to increase, Punishment to decrease.
- Step 2: Is it positive or negative? Positive = add; Negative = take away.
- Summary mapping:
  - Positive reinforcement = add something good → behavior increases.
  - Negative reinforcement = take away something bad → behavior increases.
  - Positive punishment = add something bad → behavior decreases.
  - Negative punishment = take away something good → behavior decreases.
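The two-step checklist can also be run in reverse: given what happens after a behavior, name the procedure. A minimal sketch, where the function name `classify_consequence` and its two boolean inputs are hypothetical:

```python
def classify_consequence(stimulus_added: bool, stimulus_pleasant: bool) -> tuple[str, str]:
    """Name the operant procedure from what happens after a behavior.

    stimulus_added:    True if something is added, False if something is taken away.
    stimulus_pleasant: True if that something is pleasant ("good"), False if unpleasant ("bad").
    """
    # Positive = adding something; negative = taking something away.
    sign = "Positive" if stimulus_added else "Negative"
    # Adding good or removing bad makes the behavior more likely (reinforcement);
    # adding bad or removing good makes it less likely (punishment).
    increases = stimulus_added == stimulus_pleasant
    kind = "reinforcement" if increases else "punishment"
    effect = "behavior increases" if increases else "behavior decreases"
    return f"{sign} {kind}", effect


if __name__ == "__main__":
    # Studying ends the nagging: an unpleasant stimulus is removed.
    print(classify_consequence(stimulus_added=False, stimulus_pleasant=False))
    # ('Negative reinforcement', 'behavior increases')
```

The single comparison `stimulus_added == stimulus_pleasant` captures the whole mapping: reinforcement whenever the two answers agree, punishment whenever they differ.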

Schedules of Reinforcement

How rewards are delivered (the schedule of reinforcement) affects how quickly a behavior is learned and how long it persists once rewards stop.
Continuous Reinforcement (CRF): reward every time the behavior occurs.
- Response rate: low to moderate (the animal pauses to consume each reward); however, CRF produces the fastest initial learning of a new behavior.
- Extinction rate: high (when rewards stop, behavior stops quickly).
Fixed-Ratio (FR-n): reward after a set number of actions (n).
- Example: food after every 5 lever presses.
- Response rate: high (more pressing leads to faster rewards).
- Extinction rate: moderate (some persistence after rewards stop).
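Fixed-Interval (FI-t): the first response after a fixed time interval t has elapsed is rewarded.
- Example: the first lever press after 15 minutes is rewarded; presses before then earn nothing.
- Response rate: moderate and uneven; responding drops right after a reward and speeds up as the end of the interval approaches (a "scalloped" pattern).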
- Extinction rate: moderate.
Variable-Ratio (VR-n): reward after an unpredictable number of responses that averages n.
- Response rate: very high (any response might be the one that pays off, so the animal keeps responding).
- Extinction rate: very low (it is difficult to detect that rewards have stopped).
- Note: often linked to gambling behavior, such as playing slot machines.
Variable-Interval (VI-t): the first response after an unpredictable amount of time (averaging t) is rewarded.
- Response rate: moderate to high and steady.
- Extinction rate: low (reward timing is unpredictable).
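The five schedules differ only in the rule that decides whether a given response earns a reward. The sketch below encodes each rule as a small Python class with a `respond(t)` method (all class and method names are hypothetical). Two details are assumptions rather than part of the definitions above: the variable schedules draw their requirement uniformly so that it averages n or t, and the fixed-interval timer restarts at each reward.

```python
import random


class CRF:
    """Continuous reinforcement: every response is rewarded."""
    def respond(self, t: float) -> bool:
        return True


class FixedRatio:
    """FR-n: every n-th response is rewarded."""
    def __init__(self, n: int):
        self.n, self.count = n, 0

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: the number of responses required varies around an average of n."""
    def __init__(self, n: int, rng: random.Random):
        self.n, self.rng, self.count = n, rng, 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: requirement drawn uniformly from 1..2n-1, whose mean is n.
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count == self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False


class FixedInterval:
    """FI-t: the first response after t time units is rewarded; the timer then restarts."""
    def __init__(self, interval: float):
        self.interval, self.available_at = interval, interval

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False  # responding early earns nothing


class VariableInterval:
    """VI-t: like FI, but the waiting time varies around an average of t."""
    def __init__(self, mean_interval: float, rng: random.Random):
        self.mean, self.rng = mean_interval, rng
        self.available_at = self._draw()

    def _draw(self) -> float:
        # Assumption: waiting time drawn uniformly from [0, 2t], whose mean is t.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


if __name__ == "__main__":
    fr5 = FixedRatio(5)
    print([fr5.respond(t) for t in range(10)])
    # [False, False, False, False, True, False, False, False, False, True]
```

Note that ratio schedules ignore time entirely while interval schedules ignore how many responses were made, which is why responding faster pays off only under ratio schedules.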
Key findings (Ferster & Skinner, 1957; Skinner, 1965):
- Variable-ratio reinforcement tends to produce the highest persistence and slowest extinction.
- Continuous reinforcement leads to the fastest extinction when rewards stop.
Summary relationship: ratio schedules tend to produce higher response rates than interval schedules; variable schedules tend to produce slower extinction than fixed ones.
Practical implication: the schedule type affects how durable a learned behavior is and how much effort is invested before extinction occurs.

Response Rate and Extinction Rate (Definitions and Examples)

Response rate: how often a behavior is repeated (per unit time).
- Example: If a rat presses a lever many times in a short period, it has a high response rate.
Extinction rate: how quickly the behavior stops when reward ceases.
- Example: If a rat stops lever pressing soon after rewards stop, extinction rate is high.
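Both rates can be computed from a log of response times. In this sketch, `response_rate` and `time_to_extinction` are hypothetical helpers, and the extinction criterion (the first pause of at least `quiet_period` after rewards stop) is an assumption chosen for illustration. A shorter time to extinction corresponds to a higher extinction rate.

```python
from typing import Optional


def response_rate(response_times: list[float], duration: float) -> float:
    """Responses per unit time over an observation window of length `duration`."""
    return len(response_times) / duration


def time_to_extinction(response_times: list[float], rewards_stop: float,
                       session_end: float, quiet_period: float) -> Optional[float]:
    """Time from the moment rewards stop until responding ceases.

    Hypothetical criterion: responding has ceased at the last response before the
    first pause of at least `quiet_period`. Returns None if no such pause occurs
    before `session_end`.
    """
    last = rewards_stop
    for t in sorted(r for r in response_times if r >= rewards_stop):
        if t - last >= quiet_period:
            return last - rewards_stop  # a long pause began after `last`
        last = t
    if session_end - last >= quiet_period:
        return last - rewards_stop  # responding stopped and the session ran on
    return None  # still responding when the session ended


if __name__ == "__main__":
    # 12 lever presses in 60 seconds
    print(response_rate([float(s) for s in range(0, 60, 5)], 60.0))  # 0.2
    # rewards stop at t=100; presses continue briefly, then a long pause
    print(time_to_extinction([101, 102, 104, 107, 130], rewards_stop=100,
                             session_end=200, quiet_period=20))  # 7
```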

Behavior Modification and Shaping

Behavior modification is a practical application of operant conditioning to reduce undesired behaviors and promote desirable ones.
- Concept: alter the surrounding rewards/consequences to gradually change behavior.
- Positive strategies: rewards, praise, privileges.
- Negative strategies: removing rewards or applying mild punishments.
Shaping: reinforce successive approximations toward a final target behavior.
- Dog training example: reward lying down, then lying down plus a slight turn onto the side, then a full roll-over.
- Only steps in the desired direction are reinforced; other actions are ignored.
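Shaping can be sketched as a loop that keeps moving the reward criterion toward the target. The model is hypothetical: the dog’s behavior is reduced to a single number (degrees of roll, 0 = lying down, 180 = full roll-over), each response varies randomly around the dog’s current typical response, and a rewarded response pulls that typical response toward itself. Only responses slightly beyond current behavior are rewarded; everything else is ignored.

```python
import random


def shape(target: float = 180.0, margin: float = 10.0, spread: float = 20.0,
          max_trials: int = 500, seed: int = 1) -> tuple[list[float], int]:
    """Shape a roll-over by reinforcing successive approximations.

    Hypothetical model: behavior is one number (degrees of roll; 0 = lying down,
    180 = full roll-over). Returns the dog's typical response after each reward
    and the number of trials used.
    """
    rng = random.Random(seed)
    typical = 0.0                      # the dog starts out just lying down
    trace = [typical]
    for trial in range(1, max_trials + 1):
        # Ask for a little more than the dog currently does, never more than the target.
        criterion = min(target, typical + margin)
        response = max(0.0, rng.gauss(typical, spread))  # natural variation
        if response >= criterion:
            # Step in the desired direction: reward it, and the rewarded
            # variant pulls the dog's typical response toward itself.
            typical += 0.5 * (response - typical)
            trace.append(typical)
            if typical >= target:
                return trace, trial    # the final target behavior is established
        # Responses below the criterion are simply ignored (no reward).
    return trace, max_trials


if __name__ == "__main__":
    trace, trials_used = shape()
    print(f"reached {trace[-1]:.0f} degrees after {trials_used} trials")
```

Because the criterion is set relative to current behavior, every reward moves the typical response forward, which is the core idea of reinforcing successive approximations.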
Skinner’s view: learning is driven largely by rewards and observable behaviors, with less emphasis on internal mental states.

Cognitive Contributions and Latent Learning

Later research showed thinking and cognitive processes also contribute to learning.
Latent learning (Tolman & Honzik, 1930): learning can occur without obvious rewards and becomes evident only when there is a reason to demonstrate it.
- Classic example: Rats exploring a maze without rewards form a cognitive map; when cheese is later placed at the end, they find it quickly because they already learned the layout.
Implication: cognitive processes can influence learning even in operant conditioning paradigms.
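Latent learning can be illustrated with a toy maze. In this sketch, the maze layout, the function names, and the idea of representing the cognitive map as a graph of remembered passages are all illustrative assumptions, not part of the original experiment: unrewarded exploration only records which junctions connect, and once food appears, a breadth-first search over that stored map yields the shortest route immediately.

```python
import random
from collections import deque
from typing import Optional

# Hypothetical maze: each junction lists the junctions reachable from it.
MAZE = {
    "start": ["A", "B"],
    "A": ["start", "C", "dead_end_1"],
    "B": ["start", "dead_end_2"],
    "C": ["A", "goal"],
    "dead_end_1": ["A"],
    "dead_end_2": ["B"],
    "goal": ["C"],
}


def explore(maze: dict[str, list[str]], start: str, steps: int,
            seed: int = 0) -> dict[str, set[str]]:
    """Unrewarded exploration: a random walk that records every passage used."""
    rng = random.Random(seed)
    cognitive_map = {node: set() for node in maze}
    here = start
    for _ in range(steps):
        nxt = rng.choice(maze[here])
        cognitive_map[here].add(nxt)
        cognitive_map[nxt].add(here)  # passages can be walked both ways
        here = nxt
    return cognitive_map


def route_to_food(cognitive_map: dict[str, set[str]], start: str,
                  food: str) -> Optional[list[str]]:
    """Once food appears, plan the shortest known route with breadth-first search."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == food:
            path = []
            while node is not None:  # walk back from the food to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in cognitive_map.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the food location was never visited during exploration


if __name__ == "__main__":
    cmap = explore(MAZE, "start", steps=1000)
    print(route_to_food(cmap, "start", "goal"))
```

A rat with no exploration (`steps=0`) has an empty map, so `route_to_food` returns `None` and it would have to search from scratch.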

Real-World Ethics and Implications

Punishment and child-rearing:
- German law grants children the right to an upbringing free of violence; physical punishment and emotional harm are prohibited.
- Surveys nonetheless show that physical punishment persists:
  - 40% admitted to some physical punishment in the past year;
  - 10% had slapped their child, and 4% had used spanking.
- Psychologically, physical punishment can be harmful and may only temporarily suppress bad behavior without long-term change.
- Potential negative outcomes: learned aggression, fear of the punishing parent, anxiety, depression, and feelings of helplessness.
Learned helplessness (Martin Seligman, 1972):
- Repeated exposure to uncontrollable aversive events can lead to passivity and resignation.
- Effects include hopelessness, reduced confidence, and diminished recognition of success when it occurs.
- Relevance to education and therapy: emphasize rewarding desirable behaviors to avoid learned helplessness and focus on constructive reinforcement.

Notable References and Historical Context

Ferster and Skinner (1957): different reinforcement patterns affect motivation and duration of behavior.
Nolen-Hoeksema et al. (2009); Skinner (1965): referenced in discussions of extinction and reinforcement schedules.
Seligman (1972): learned helplessness.
Tolman & Honzik (1930): latent learning.
Skinner’s broader claim: learning largely from reward-based contingencies; mind and internal thoughts are less central to learning than behavior and outcomes.

Quick Reference: Terminology Mapping

Reinforcement = increases behavior
Punishment = decreases behavior
Positive = adding something
Negative = taking away something
Positive reinforcement = add good → behavior increases
Negative reinforcement = remove bad → behavior increases
Positive punishment = add bad → behavior decreases
Negative punishment = remove good → behavior decreases
FR-n = reward after every n responses
FI-t = reward for the first response after a fixed time t
VR-n = reward after a random number of responses with average n
VI-t = reward for the first response after a random time interval averaging t
CRF = continuous reinforcement (reward every time)
Extinction = disappearance of a previously learned behavior when rewards stop

Quick Numerical Notes

German punishment statistics (contextual example):
- 40% admitted to physical punishment in the past year
- 10% had slapped their child
- 4% used spanking
Latent learning experiment takeaway: learning can occur without immediate reward; becomes evident when incentives arise
Example of a simple reinforcement pattern:
- If a dog begs and is rewarded with food every time (CRF), its begging may extinguish quickly once rewards stop (high extinction rate under CRF).
Response rate (RR) vs extinction rate (ER) under common schedules:
- VR-n: RR high, ER very low
- FR-n: RR high, ER moderate
- FI-t: RR moderate/inconsistent, ER moderate
- VI-t: RR moderate to high, ER low

References to Concepts in Practice

Shaping is widely used in dog training, education, and behavioral therapy to break complex tasks into achievable steps.
Latent learning suggests that exposing individuals to environments without immediate rewards can still build cognitive maps, which is useful for the design of educational spaces and for navigation training.
Ethical considerations in applying operant conditioning emphasize minimizing harm and avoiding punitive approaches that can cause long-term psychological harm.