Module Notes: Learning, Reinforcement, and Observational Learning

Pigeon-Guided Missile Prototype

  • Historical context: early attempts to improve targeting in bombing missions before missiles could autonomously track targets.

  • Concept: use live animals trained through operant conditioning to guide weapons, with the animal's responses wired to the weapon's steering controls.

  • Skinner’s pigeon-guided missile:

    • Thousands of pigeons trained to peck at pictures of targets (e.g., tanks, ships).

    • Front of missile had windows (three circles) for the pigeon to view the target.

    • The pigeon’s pecks controlled fins on the back of the missile, steering it toward the target.

    • Basic mechanism: the pigeon pecked at the target's image. If the missile drifted off course, the image moved off center, and pecks at the off-center image adjusted the fins to correct the course. Continued pecking kept the missile headed toward the target.

    • Practical issues: the pigeon cannot judge target identity (enemy vs friendly); the mission is a one-way trip for the pigeon; and the system raised reliability concerns.

    • Outcome: the prototype was never deployed in combat; it is remembered as an early attempt to solve targeting problems through psychology.

  • Significance: an early integration of behavioral psychology and technology; illustrates limits and ethical concerns of using animals for warfare.

Positive vs Negative in Psychology (Key Terminology)

  • Positive vs negative do not mean good vs bad in this context:

    • Positive = additive; something is added to the environment or situation.

    • Negative = subtractive; something is removed from the environment or situation.

  • Examples in psychopathology:

    • Positive symptoms of schizophrenia: additions to experience (e.g., hallucinations). These are additions, not "good".

    • Negative symptoms of schizophrenia: reductions in typical functions (e.g., flat affect, reduced social and emotional engagement). These are losses of typical functioning; "negative" here does not mean "bad" in a value sense.

  • Consequences and reinforcement:

    • A consequence can reinforce (increase behavior) or punish (decrease behavior).

    • What counts as reinforcing or punishing depends on the learner’s perspective.

  • Practical takeaway: always consider the learner’s viewpoint when evaluating what counts as reinforcement or punishment.

Reinforcement and Punishment: Core Concepts

  • Reinforcement: a consequence that increases the likelihood of the behavior occurring again.

    • Positive reinforcement: add a desirable stimulus to increase a behavior.

    • Negative reinforcement: remove an aversive stimulus to increase a behavior.

  • Punishment: a consequence that decreases the likelihood of the behavior repeating.

    • Positive punishment: add an aversive stimulus to decrease a behavior.

    • Negative punishment: remove a desirable stimulus to decrease a behavior.

  • Perspectives matter: reinforcing/punishing is defined relative to the learner’s experience and perception.

  • Example illustrating perspective:

    • A parent yells at a child for delinquent behavior. Even if the yelling is intended as punishment, the child's perception and the broader context determine whether the behavior decreases (for a child seeking attention, yelling can even reinforce the behavior).

    • Similarly, chocolate chip cookies work as a reinforcer only for some learners (some love them, others dislike them).

  • Practical takeaway: use learner-centered thinking to determine what constitutes reinforcement or punishment.
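The four combinations of reinforcement and punishment can be captured in a small lookup. This is a minimal Python sketch, not a standard tool; the names `Stimulus`, `Effect`, and `classify_consequence` are hypothetical. The "effect" input is the observed change in the learner's behavior, which is what keeps the classification learner-centered rather than based on the intent of whoever delivers the consequence.

```python
from enum import Enum


class Stimulus(Enum):
    """Whether the consequence adds or removes something (positive vs negative)."""
    ADDED = "positive"
    REMOVED = "negative"


class Effect(Enum):
    """Observed change in how often the learner performs the behavior afterward."""
    INCREASES = "reinforcement"
    DECREASES = "punishment"


def classify_consequence(stimulus: Stimulus, effect: Effect) -> str:
    """Name a consequence from what happened to the stimulus and to the behavior.

    The label depends on the observed effect on the learner's behavior,
    not on what the person delivering the consequence intended.
    """
    return f"{stimulus.value} {effect.value}"


# A parent yells (a stimulus is added). If the misbehavior then decreases,
# the yelling worked as positive punishment; if it increases (e.g., the child
# wanted attention), the same yelling acted as positive reinforcement.
print(classify_consequence(Stimulus.ADDED, Effect.DECREASES))    # positive punishment
print(classify_consequence(Stimulus.ADDED, Effect.INCREASES))    # positive reinforcement
print(classify_consequence(Stimulus.REMOVED, Effect.INCREASES))  # negative reinforcement
```

The same event can land in different cells for different learners, which is why the function takes the observed effect as an input instead of inferring it from the stimulus.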

Reinforcement: Detailed Types and Examples

  • Positive reinforcement: add something pleasant to increase the behavior.

    • Example: in the morning, giving a reward (e.g., a small treat or extra screen time) for putting shoes, coats, and backpacks by the front door.

    • Conceptual formula: when a behavior occurs, a desired outcome is added, increasing the probability of the behavior in the future.

  • Negative reinforcement: remove something aversive to increase the behavior.

    • Example: If a child places belongings by the door, the family might remove the burden of extra chores the next day.

  • Primary reinforcers: meet basic biological needs and are reinforcing without any learning history.

    • Examples: food, water, sleep, sex.

  • Secondary reinforcers: acquire reinforcing value through learning.

    • Examples: money, praise, tokens, badges.

    • Function: can be exchanged for primary reinforcers or privileges.

  • Token economy: a structured form of secondary reinforcement.

    • Steps:
      1) List target behaviors to reward.
      2) Choose a tangible token (e.g., poker chips, stickers).
      3) Define how tokens can be exchanged for rewards.

    • Applications: classrooms, workplaces; can be effective with kids but less so with adults.

  • Practical example with a classroom or household context:

    • Physical tokens used as reinforcement; tokens exchanged for privileges or objects.

    • Token economies rely on the power of secondary reinforcers to shape behavior over time.
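The three token-economy steps map onto a simple data structure. This is a minimal Python sketch, assuming tokens can be tracked as a running count. The class name `TokenEconomy`, its methods, and the behaviors and prices in the usage lines are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenEconomy:
    """Hypothetical token economy following the three steps.

    target_behaviors: step 1, behaviors to reward -> tokens earned.
    rewards: step 3, what tokens can buy -> token price.
    Step 2 (the physical token) is represented here as a simple counter.
    """
    target_behaviors: dict[str, int]
    rewards: dict[str, int]
    balance: int = 0
    log: list[str] = field(default_factory=list)

    def record(self, behavior: str) -> int:
        """Give tokens for a listed target behavior; unlisted behaviors earn nothing."""
        earned = self.target_behaviors.get(behavior, 0)
        self.balance += earned
        self.log.append(f"{behavior}: +{earned}")
        return earned

    def exchange(self, reward: str) -> bool:
        """Trade tokens for a reward if the balance covers its price.

        Raises KeyError if the reward is not on the exchange menu.
        """
        price = self.rewards[reward]
        if self.balance < price:
            return False
        self.balance -= price
        self.log.append(f"{reward}: -{price}")
        return True


# Illustrative household setup (behaviors and prices are made up).
economy = TokenEconomy(
    target_behaviors={"shoes by the door": 1, "homework done": 2},
    rewards={"extra screen time": 3},
)
economy.record("shoes by the door")
economy.record("homework done")
print(economy.exchange("extra screen time"))  # True
print(economy.balance)                         # 0
```

Keeping the earning rules (step 1) and the exchange menu (step 3) in separate tables makes it easy to change what is rewarded without changing what the tokens buy.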

Punishment: Types, Considerations, and Ethical Notes

  • Positive punishment: add something unpleasant to reduce a behavior.

    • Example: adding chores for kicking off shoes in the living room.

  • Negative punishment: remove something pleasant to reduce a behavior.

    • Example: taking away a preferred device or activity when disruptive behavior occurs.

  • Cautions:

    • Punishment can induce fear; frequent or harsh punishment may damage relationship bonds.

    • Physical punishment (e.g., spanking) models aggression as a way to solve problems; consider the long-term impact.

    • Punishment must be used thoughtfully, particularly in parenting, to avoid unintended negative consequences.

Partial Reinforcement and Schedules of Reinforcement

  • Continuous reinforcement vs partial reinforcement:

    • Continuous reinforcement: reinforcement after every occurrence of the behavior. The behavior is learned faster, but it extinguishes more quickly once reinforcement stops.

    • Partial reinforcement: reinforcement only after some instances; more resistant to extinction, but slower to acquire the behavior initially.

  • Two fundamental questions for partial reinforcement:

    1) Ratio vs Interval: is reinforcement based on the number of responses (ratio) or on elapsed time (interval)?

      • Ratio: reinforce after a set number of responses.

      • Interval: reinforce the first response made after a fixed or variable amount of time has passed.

    2) Fixed vs Variable: is the schedule predictable or unpredictable?

      • Fixed: the requirement is constant (same number of responses or same time interval).

      • Variable: the requirement varies around an average (unpredictable schedule).

  • Four common schedules:

    • Fixed Ratio (FR): reinforcement after a fixed number of responses.

      • Notation: FR n means reinforcement after every n responses (e.g., FR 10).

    • Variable Ratio (VR): reinforcement after an unpredictable number of responses.

      • Example: gambling; you might win after a variable number of plays.

    • Fixed Interval (FI): reinforcement for the first response after a fixed amount of time has passed.

      • Example: paychecks every two weeks; reinforcement is time-based and predictable.

    • Variable Interval (VI): reinforcement for the first response after a variable, unpredictable amount of time.

      • Example: checking social media for likes; reinforcement comes at unpredictable times.

  • Real-world illustrations from the transcript:

    • Piecework in a factory: pay per shirt ($4 per shirt) illustrates a Fixed Ratio setup, producing a high response rate with a pause after each reinforcement.

    • Slot machines (gambling): classic Variable Ratio example; reinforcement after an unpredictable number of plays.

    • Paychecks: example of a Fixed Interval schedule (FI) with a predictable time frame.

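    • Quizzes given at unpredictable times, or calling to check whether a car repair is done: both illustrate Variable Interval, because the payoff depends on when the quiz happens or when the mechanic finishes, not on how many times you check.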

  • Quick table reference (conceptual):

    • Fixed Ratio (FR): after fixed number of responses; high response rate with post-reinforcement pauses.

    • Variable Ratio (VR): after unpredictable number of responses; high and steady response rate.

    • Fixed Interval (FI): after fixed time period; scalloped response pattern, with slow responding right after reinforcement that speeds up as the next reinforcement time approaches.

    • Variable Interval (VI): after varying time periods; steady, moderate response rate.

  • Tip for study: organizing a one-page cheat sheet with FR, VR, FI, VI definitions and examples helps keep schedules straight during exams.
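The four schedules differ only in the rule that decides whether a given response is reinforced. The Python sketch below simulates each rule. The function names and the uniform distributions used for the variable schedules are assumptions for illustration, not a standard from the literature. Ratio schedules count responses. Interval schedules reinforce the first response after a waiting period, which is timed here from the last reinforcement.

```python
import random


def fixed_ratio(n_responses: int, ratio: int) -> list[int]:
    """FR: reinforce every `ratio`-th response. Returns reinforced response numbers (1-based)."""
    return [r for r in range(1, n_responses + 1) if r % ratio == 0]


def variable_ratio(n_responses: int, mean_ratio: int, rng: random.Random) -> list[int]:
    """VR: the number of responses required is redrawn after each reinforcement.

    Assumption: each requirement is drawn uniformly from 1..2*mean_ratio-1,
    so it averages `mean_ratio` but is unpredictable.
    """
    reinforced, count = [], 0
    requirement = rng.randint(1, 2 * mean_ratio - 1)
    for r in range(1, n_responses + 1):
        count += 1
        if count == requirement:
            reinforced.append(r)
            count = 0
            requirement = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


def fixed_interval(response_times: list[float], interval: float) -> list[float]:
    """FI: reinforce the first response made once `interval` time units
    have passed since the last reinforcement (or since the start)."""
    reinforced, available_at = [], interval
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval
    return reinforced


def variable_interval(response_times: list[float], mean_interval: float,
                      rng: random.Random) -> list[float]:
    """VI: like FI, but each waiting period is redrawn.

    Assumption: waits are drawn uniformly from [0, 2*mean_interval].
    """
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


print(fixed_ratio(9, 3))                              # [3, 6, 9]
print(fixed_interval([1, 5, 11, 12, 25, 26], 10))     # [11, 25]
```

In the fixed-interval run, the responses at times 1, 5, 12, and 26 earn nothing because the interval has not yet elapsed. Responding right after a reinforcement does not pay off on FI, which is where the scalloped pattern comes from. The variable versions take a seeded `random.Random` so that a run can be repeated.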

Observational Learning and Bandura’s Model

  • Observational learning (modeling): learning by watching others, without direct reinforcement or punishment.

  • Four essential elements (Bandura):

    • Attention: the observer must pay attention to the model.

    • Retention: must be able to remember what was observed.

    • Motor reproduction: the observer must be physically capable of performing the observed action.

    • Reinforcement or incentive conditions: the observer must believe that similar outcomes are possible for them.

  • Everyday example from the transcript: seeing the consequences a sibling faces for breaking curfew teaches the learner to follow curfew in the future.

  • Practical takeaway: modeling can be effective even without direct reinforcement, provided the observer attends, retains, can reproduce the behavior, and expects similar outcomes.
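Bandura's four elements act as joint requirements. If any one is missing, imitation is unlikely. The Python sketch below treats each element as a simple yes/no, which is a simplification because attention and retention come in degrees. The `Observation` class and `missing_elements` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What an observer brings to one modeling episode (illustrative fields)."""
    attended: bool                 # Attention: watched the model
    retained: bool                 # Retention: remembers what was seen
    can_reproduce: bool            # Motor reproduction: physically able to do it
    expects_similar_outcome: bool  # Reinforcement/incentive: believes the outcome applies to them


def missing_elements(obs: Observation) -> list[str]:
    """List which of Bandura's four elements are absent.

    An empty list means all the conditions for imitating the model are in place.
    """
    checks = {
        "attention": obs.attended,
        "retention": obs.retained,
        "motor reproduction": obs.can_reproduce,
        "reinforcement/incentive": obs.expects_similar_outcome,
    }
    return [name for name, present in checks.items() if not present]


# Curfew example: the learner watched, remembers, can come home on time,
# and expects the same consequences, so nothing is missing.
print(missing_elements(Observation(True, True, True, True)))   # []
# Same observer, but physically unable to perform the modeled action.
print(missing_elements(Observation(True, True, False, True)))  # ['motor reproduction']
```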

Cognitive, Motivational, and Individual-Difference Perspectives on Learning

  • Classical and operant conditioning explain a lot, but not all learning.

  • Tolman and purposive behavior: learning can be goal-directed and motivated by expected outcomes, not just reinforcement history.

  • Expectancy learning: learners anticipate outcomes and adjust behavior accordingly.

  • Latent learning: learning that occurs without explicit reinforcement, simply through exposure or experience, and shows up only when there is a reason to use it (e.g., once told there is free pizza in the lobby, you can find it because you already learned the way to the lobby).

  • Motivations and goals: learners may engage in activities for internal satisfaction or curiosity, even without external rewards.

  • Instinctive drift and preparedness:

    • Instinctive drift: trained behaviors tend to drift back toward a species' innate, instinctive behaviors.

    • Preparedness: species are biologically prewired to learn some behaviors more easily than others.

    • Language in humans is a prime example of preparedness, with rapid acquisition in early development.

  • Language and preparedness example:

    • A Sicilian-English-French multilingual environment in Montreal illustrates how language skills must adapt to social context and practical survival needs; language learning is shaped by cultural and environmental factors.

Connections to Real-World Relevance and Ethics

  • Reinforcement and punishment strategies have broad applications in parenting, education, workplaces, and therapy.

  • Token economies illustrate how secondary reinforcers can shape behavior across settings, particularly with children.

  • Observational learning underscores the power of role models and social norms in shaping behavior and policy.

  • Understanding cognitive components and latent learning helps explain why people engage in activities without obvious external rewards (e.g., hobbies, music, plant care).

  • Ethical considerations: when using punishment or aversive methods, weigh potential long-term harms, fear, and relationship impact; prefer reinforcement-based strategies where feasible.

Summary and Key Takeaways

  • Early attempts to solve targeting challenges used behavioral psychology, e.g., Skinner’s pigeon-guided missile prototype; highlights both ingenuity and practical limits.

  • Positive/negative terminology in psychology refers to addition/removal, not moral valence; context is crucial for determining reinforcement or punishment.

  • Reinforcement increases the likelihood of a behavior; punishment decreases it; the learner’s perspective matters.

  • Primary reinforcers satisfy biological needs; secondary reinforcers acquire value through learning (e.g., money, tokens).

  • Token economies leverage secondary reinforcers to shape behavior; effectiveness varies with age and context.

  • Partial reinforcement schedules (FR, VR, FI, VI) influence how and when reinforcement is delivered; variable schedules generally produce more persistent behavior than fixed ones.

  • Observational learning involves attention, retention, motor reproduction, and expected consequences; learning can occur without direct reinforcement.

  • Cognitive and motivational factors (Tolman, expectancy, latent learning) enrich our understanding of learning beyond strict behaviorism.

  • Preparedness and instinctive drift remind us that biology and evolution shape what and how we learn, such as language acquisition in humans.

Abbreviations: FR = Fixed Ratio; VR = Variable Ratio; FI = Fixed Interval; VI = Variable Interval.

  • Example equations and values from the transcript:

    • Pay-per-item example:

      • If each shirt pays $4, then n shirts earn 4n dollars (e.g., 10 shirts earn $40).

    • Reinforcement relationships: a behavior followed by a consequence that increases its future probability is reinforcement; a consequence that reduces it is punishment.

    • Basic reinforcer categories:

      • Primary: food, water, sleep, sex.

      • Secondary: money, tokens, praise.

  • Practice prompt: identify whether each of the following is positive or negative reinforcement or punishment, with yourself as the learner:

    • You study and then you get a grade boost (reinforcement, positive).

    • You miss a deadline and lose access to a favorite app (punishment, negative).

    • You complete chores and avoid extra chores tomorrow (reinforcement, negative).