Module Notes: Learning, Reinforcement, and Observational Learning

Pigeon-Guided Missile Prototype

Historical context: early attempts to improve targeting in bombing missions before missiles could autonomously track targets.
Concept: use live animals to guide weapons, with the animal’s behavior tied through feedback to the weapon’s controls.
Skinner’s pigeon-guided missile:
- Thousands of pigeons trained to peck at pictures of targets (e.g., tanks, ships).
- Front of missile had windows (three circles) for the pigeon to view the target.
- The pigeon’s pecks controlled fins on the back of the missile, steering it toward the target.
- Basic mechanism: peck = steer toward target; misalignment leads to overcorrection; repeated pecking directs missile toward target.
- Practical issues: decisions about target identity (enemy vs friendly) are difficult for the pigeon; one-way trip for the pigeon; reliability concerns.
- Outcome: prototype used to illustrate early attempts to solve targeting problems via psychology, not deployed in combat.
Significance: an early integration of behavioral psychology and technology; illustrates limits and ethical concerns of using animals for warfare.

Positive vs Negative in Psychology (Key Terminology)

Positive vs negative do not mean good vs bad in this context:
- Positive = additive; something is added to the environment or situation.
- Negative = subtractive; something is removed from the environment or situation.
Examples in psychopathology:
- Positive symptoms of schizophrenia: additions to experience (e.g., hallucinations). These are additions, not "good".
- Negative symptoms of schizophrenia: reductions in typical functions (e.g., flat affect; reduced social/affective processing). These are losses: not "bad" in a value sense, but an absence of typical functioning.
Consequences and reinforcement:
- A consequence can reinforce (increase behavior) or punish (decrease behavior).
- What counts as reinforcing or punishing depends on the learner’s perspective.
Practical takeaway: always consider the learner’s viewpoint when evaluating what counts as reinforcement or punishment.

Reinforcement and Punishment: Core Concepts

Reinforcement: a consequence that increases the likelihood of the behavior occurring again.
- Positive reinforcement: add a desirable stimulus to increase a behavior.
- Negative reinforcement: remove an aversive stimulus to increase a behavior.
Punishment: a consequence that decreases the likelihood of the behavior repeating.
- Positive punishment: add an aversive stimulus to decrease a behavior.
- Negative punishment: remove a desirable stimulus to decrease a behavior.
Perspectives matter: reinforcing/punishing is defined relative to the learner’s experience and perception.
Example illustrating perspective:
- A parent yells at a child for delinquent behavior; even though yelling might be intended as punishment, the child’s perception and broader context determine whether the behavior decreases.
- Similarly, chocolate chip cookies vary as a reinforcer by learner (some love them, others may dislike them).
Practical takeaway: use learner-centered thinking to determine what constitutes reinforcement or punishment.
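
The four terms above come from answering two questions: was something added or removed, and did the behavior go up or down afterward? A minimal Python sketch of that 2x2 lookup follows. The names `Change`, `Effect`, and `classify` are made up for illustration, and the effect is assumed to be judged from the learner's side, as the notes stress.

```python
from enum import Enum


class Change(Enum):
    ADDED = "positive"    # a stimulus is added to the situation
    REMOVED = "negative"  # a stimulus is removed from the situation


class Effect(Enum):
    INCREASES = "reinforcement"  # the behavior becomes more likely
    DECREASES = "punishment"     # the behavior becomes less likely


def classify(change: Change, effect: Effect) -> str:
    """Name a consequence from the two questions: added/removed, more/less behavior."""
    return f"{change.value} {effect.value}"


if __name__ == "__main__":
    # All four cells of the 2x2 grid.
    for change in Change:
        for effect in Effect:
            print(f"{change.name:8} + {effect.name:10} -> {classify(change, effect)}")
```

Note that the function never asks whether the stimulus is "nice" or "nasty": if yelling is followed by more of the behavior, `classify(Change.ADDED, Effect.INCREASES)` labels it positive reinforcement, whatever the parent intended.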

Reinforcement: Detailed Types and Examples

Positive reinforcement: add something pleasant to increase the behavior.
- Example: In the morning, giving a reward (e.g., a small treat or extra screen time) for getting shoes, coats, and backpacks by the front door.
- Conceptual formula: when a behavior occurs, a desired outcome is added, increasing the probability of the behavior in the future.
Negative reinforcement: remove something aversive to increase the behavior.
- Example: If a child places belongings by the door, the family might remove the burden of extra chores the next day.
Primary reinforcers: meet basic biological needs without learning history.
- Examples: $\text{Food}, \text{water}, \text{sleep}, \text{sex}$
Secondary reinforcers: acquire reinforcing value through learning.
- Example: money, praise, tokens, badges.
- Function: can be exchanged for primary reinforcers or privileges.
Token economy: a structured form of secondary reinforcement.
- Steps:
  1) List target behaviors to reward.
  2) Choose a tangible token (e.g., poker chips or stickers).
  3) Define how tokens can be exchanged for rewards.
- Applications: classrooms, workplaces; can be effective with kids but less so with adults.
Practical example with a classroom or household context:
- Physical tokens used as reinforcement; tokens exchanged for privileges or objects.
- Token economies rely on the power of secondary reinforcers to shape behavior over time.
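
The three setup steps for a token economy can be sketched as a small Python class. This is only an illustration under stated assumptions: the class name `TokenEconomy`, the behaviors, the rewards, and all token values are invented for the example and do not come from the notes.

```python
from dataclasses import dataclass, field


@dataclass
class TokenEconomy:
    # Step 1: target behaviors and the tokens each one earns (illustrative values).
    behaviors: dict[str, int]
    # Step 3: rewards and their prices in tokens (illustrative values).
    rewards: dict[str, int]
    # Step 2: the "token" is modeled as a simple per-learner counter.
    balances: dict[str, int] = field(default_factory=dict)

    def record(self, learner: str, behavior: str) -> int:
        """Award tokens for a listed behavior; unlisted behaviors earn nothing."""
        earned = self.behaviors.get(behavior, 0)
        self.balances[learner] = self.balances.get(learner, 0) + earned
        return self.balances[learner]

    def redeem(self, learner: str, reward: str) -> bool:
        """Exchange tokens for a reward if the learner has enough of them."""
        price = self.rewards[reward]
        if self.balances.get(learner, 0) < price:
            return False
        self.balances[learner] -= price
        return True


if __name__ == "__main__":
    economy = TokenEconomy(
        behaviors={"shoes by the door": 1, "backpack by the door": 1},
        rewards={"extra screen time": 2},
    )
    economy.record("Sam", "shoes by the door")
    print(economy.redeem("Sam", "extra screen time"))  # False: only 1 token so far
    economy.record("Sam", "backpack by the door")
    print(economy.redeem("Sam", "extra screen time"))  # True: 2 tokens spent
```

The tokens have no value of their own in this sketch; they matter only because `redeem` turns them into something the learner wants, which is what makes them secondary reinforcers.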

Punishment: Types, Considerations, and Ethical Notes

Positive punishment: add something unpleasant to reduce a behavior.
- Example: adding chores for kicking off shoes in the living room.
Negative punishment: remove something pleasant to reduce a behavior.
- Example: taking away a preferred device or activity when disruptive behavior occurs.
Cautions:
- Punishment can induce fear; frequent or harsh punishment may damage relationship bonds.
- Visible punishment (e.g., spanking) sends messages about aggression as a solution; consider long-term impacts.
- Punishment must be used thoughtfully, particularly in parenting, to avoid unintended negative consequences.

Partial Reinforcement and Schedules of Reinforcement

Continuous reinforcement vs partial reinforcement:
- Continuous reinforcement: reinforcement after every occurrence of the behavior; stronger immediate learning but easier to extinguish.
- Partial reinforcement: reinforcement only after some instances; more resistant to extinction, but slower to acquire the behavior initially.
Two fundamental questions for partial reinforcement:
  1) Ratio vs Interval: is reinforcement based on the number of responses (ratio) or on elapsed time (interval)?
     - Ratio: reinforce after a number of successful responses.
     - Interval: reinforce after a fixed or variable amount of time has passed.
  2) Fixed vs Variable: is the schedule predictable or unpredictable?
     - Fixed: the requirement is constant (same number of responses or same time interval).
     - Variable: the requirement varies around an average (unpredictable schedule).
Four common schedules:
- Fixed Ratio (FR): reinforcement after a fixed number of responses.
  - Example: $FR(n)$ where reinforcement occurs after every $n$ responses.
- Variable Ratio (VR): reinforcement after an unpredictable number of responses.
  - Example: gambling; you might win after a variable number of plays.
- Fixed Interval (FI): reinforcement after a fixed amount of time has passed.
  - Example: paychecks every two weeks; reinforcement is time-based and predictable.
- Variable Interval (VI): reinforcement after variable time intervals.
  - Example: checking social media for likes; reinforcement comes at unpredictable times.
Real-world illustrations from the transcript:
- Piecework in a factory: pay per shirt ($4 per shirt) illustrates a Fixed Ratio setup; higher response rate with a post-reinforcement pause.
- Slot machines (gambling): classic Variable Ratio example; reinforcement after an unpredictable number of plays.
- Paychecks: example of a Fixed Interval schedule (FI) with a predictable time frame.
- Unannounced quizzes or checking on a car repair’s status: illustrate Variable Interval, because the payoff becomes available at unpredictable times (e.g., whenever the mechanic finishes).
Quick table reference (conceptual):
- Fixed Ratio (FR): after fixed number of responses; high response rate with post-reinforcement pauses.
- Variable Ratio (VR): after unpredictable number of responses; high and steady response rate.
- Fixed Interval (FI): after fixed time period; scalloped response pattern around reinforcement time.
- Variable Interval (VI): after varying time periods; steady, moderate response rate.
Tip for study: organizing a one-page cheat sheet with FR, VR, FI, VI definitions and examples helps keep schedules straight during exams.
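
To see how the two questions (ratio vs interval, fixed vs variable) combine, the four schedules can be simulated in a few lines of Python. This is a rough sketch, not a standard model: the class names, the `respond(now)` interface, and the choice of uniform random draws for the variable schedules are all assumptions made for illustration.

```python
import random


class FixedRatio:
    """FR(n): reinforce every n-th response."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self, now):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR(n): reinforce after a random number of responses averaging n."""
    def __init__(self, n, rng):
        self.n, self.rng, self.count = n, rng, 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform draw on 1..2n-1, which has mean n.
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, now):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False


class FixedInterval:
    """FI(t): reinforce the first response at least t time units after the last reinforcer."""
    def __init__(self, t):
        self.t, self.last = t, 0.0

    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False


class VariableInterval:
    """VI(t): like FI, but the required wait varies around an average of t."""
    def __init__(self, t, rng):
        self.t, self.rng, self.last = t, rng, 0.0
        self.wait = self._draw()

    def _draw(self):
        # Assumption: uniform draw on 0..2t, which has mean t.
        return self.rng.uniform(0, 2 * self.t)

    def respond(self, now):
        if now - self.last >= self.wait:
            self.last, self.wait = now, self._draw()
            return True
        return False


def count_reinforcers(schedule, times):
    """Count how many of the responses made at the given times are reinforced."""
    return sum(schedule.respond(t) for t in times)


if __name__ == "__main__":
    slow = range(1, 21)                        # one response per time unit
    fast = [0.5 * k for k in range(1, 41)]     # two responses per time unit
    print(count_reinforcers(FixedRatio(5), slow), count_reinforcers(FixedRatio(5), fast))        # 4 8
    print(count_reinforcers(FixedInterval(5), slow), count_reinforcers(FixedInterval(5), fast))  # 4 4
```

In this toy setup, responding twice as fast doubles the payoff under FR but leaves it unchanged under FI, which is one way to remember why ratio schedules tend to produce higher response rates than interval schedules.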

Observational Learning and Bandura’s Model

Observational learning (modeling): learning by watching others, without direct reinforcement or punishment.
Four essential elements (Bandura):
- Attention: need to pay attention to the model.
- Retention: must be able to remember what was observed.
- Motor reproduction: the observer must be physically capable of performing the observed action.
- Reinforcement or incentive conditions: the observer must believe that similar outcomes are possible for them.
Everyday example from transcript: observing a sibling’s curfew consequences teaches the learner to comply with curfew in the future.
Practical takeaway: modeling can be effective even without direct reinforcement, provided the observer attends, retains, can reproduce the behavior, and expects similar outcomes.

Cognitive, Motivational, and Individual-Difference Perspectives on Learning

Classical and operant conditioning explain a lot, but not all learning.
Tolman and purposive behavior: learning can be goal-directed and motivated by expected outcomes, not just reinforcement history.
Expectancy learning: learners anticipate outcomes and adjust behavior accordingly.
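Latent or implicit learning: learning that occurs without explicit reinforcement, simply through exposure or experience, and that may show up only once there is a reason to use it (e.g., finding your way to the pizza in the lobby after being told there would be free pizza, using a layout you had picked up without ever being rewarded for it).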
Motivations and goals: learners may engage in activities for internal satisfaction or curiosity, even without external rewards.
Instinctive drift and preparedness:
- Some species have innate prewired tendencies that shape which behaviors are easily learned.
- Language in humans is a prime example of preparedness and rapid acquisition in early development.
Language and preparedness example:
- A Sicilian-English-French multilingual environment in Montreal demonstrated the need to adapt language skills to social context and survival requirements; language learning is affected by cultural and environmental factors.

Connections to Real-World Relevance and Ethics

Reinforcement and punishment strategies have broad applications in parenting, education, workplaces, and therapy.
Token economies illustrate how secondary reinforcers can shape behavior across settings, particularly with children.
Observational learning underscores the power of role models and social norms in shaping behavior and policy.
Understanding cognitive components and latent learning helps explain why people engage in activities without obvious external rewards (e.g., hobbies, music, plant care).
Ethical considerations: when using punishment or aversive methods, weigh potential long-term harms, fear, and relationship impact; prefer reinforcement-based strategies where feasible.

Summary and Key Takeaways

Early attempts to solve targeting challenges used behavioral psychology, e.g., Skinner’s pigeon-guided missile prototype; highlights both ingenuity and practical limits.
Positive/negative terminology in psychology refers to addition/removal, not moral valence; context is crucial for determining reinforcement or punishment.
Reinforcement increases the likelihood of a behavior; punishment decreases it; the learner’s perspective matters.
Primary reinforcers satisfy biological needs; secondary reinforcers acquire value through learning (e.g., money, tokens).
Token economies leverage secondary reinforcers to shape behavior; effectiveness varies with age and context.
Partial reinforcement schedules (FR, VR, FI, VI) influence how and when reinforcement is delivered; variable schedules generally produce more persistent behavior than fixed ones.
Observational learning involves attention, retention, motor reproduction, and expected consequences; learning can occur without direct reinforcement.
Cognitive and motivational factors (Tolman, expectancy, latent learning) enrich our understanding of learning beyond strict behaviorism.
Preparedness and instinctive drift remind us that biology and evolution shape what and how we learn, such as language acquisition in humans.

$FR: \text{Fixed Ratio} \\ VR: \text{Variable Ratio} \\ FI: \text{Fixed Interval} \\ VI: \text{Variable Interval}$

Example equations and values from the transcript:
- Pay-per-item example:
  - If each shirt yields $\boxed{4}$ dollars, then for $n$ shirts, earnings $= 4n$.
- Reinforcement relationships: a behavior followed by a consequence that increases its future probability is reinforcement; a consequence that reduces it is punishment.
- Basic reinforcer categories:
  - Primary: $\text{Food}, \text{Water}, \text{Sleep}, \text{Sex}$.
  - Secondary: $\text{Money}, \text{Tokens}, \text{Praise}$.
Practice prompt: identify whether the following is a positive/negative reinforcement or punishment for the learner (yourself):
- You study and then you get a grade boost (reinforcement, positive).
- You miss a deadline and lose access to a favorite app (punishment, negative).
- You complete chores and avoid extra chores tomorrow (reinforcement, negative).