Psych Lecture 6: Operant Conditioning & Observational Learning
Overview of learning theories covered
Focus on operant conditioning (OC) and observational learning (OL) / social cognitive theory (SCT).
Emphasis on observable behavior and how consequences shape future behavior; connects to classical conditioning through foundational ideas about stimulus–response and learning, but introduces reinforcement/punishment and cognitive processes in OL/SCT.
Practical applications discussed: classroom behavior management, ABA (Applied Behavior Analysis), token economies, shaping, extinction, spontaneous recovery, generalization, discrimination, Premack principle, and desensitization for phobias.
Ethical and real-world considerations: effectiveness of positive reinforcement, criticisms of punishment, debates about mass-media exposure and aggression, and the role of cognitive factors in learning.
Key terms to master: positive/negative reinforcement, positive/negative punishment, reinforcement/punishment schedules (continuous vs. intermittent), extinction, spontaneous recovery, discrimination, generalization, discriminative stimulus, shaping, token economy, ABA, observational learning, modeling, attention, retention, reproduction, motivation, vicarious reinforcement, desensitization.
Operant Conditioning: core concepts and terminology
Positive reinforcement (PR): adding a desirable stimulus to increase a behavior.
Example: giving cookies (desirable outcome) after a behavior (e.g., sitting at dinner) to increase compliant behavior.
Negative reinforcement (NR): removing an aversive stimulus to increase a behavior.
Example: stopping nagging once the desired behavior occurs, thus reinforcing the behavior by removing discomfort.
Positive punishment (PP): adding an aversive stimulus to decrease a behavior.
Example: scolding or spanking to discourage an unwanted behavior.
Negative punishment (NP): removing a desirable stimulus to decrease a behavior.
Example: taking away privileges (free time, screen access) to reduce an unwanted behavior.
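The four terms above reduce to two questions: is a stimulus added or removed, and does the behavior increase or decrease? A minimal Python sketch of that lookup follows. The function name and the string labels are hypothetical study aids, not standard terminology beyond the four definitions given above.

```python
# Hypothetical study helper: maps the two questions onto the four OC terms.
QUADRANTS = {
    ("add", "increase"): "positive reinforcement (PR)",
    ("remove", "increase"): "negative reinforcement (NR)",
    ("add", "decrease"): "positive punishment (PP)",
    ("remove", "decrease"): "negative punishment (NP)",
}


def classify_consequence(stimulus_change, behavior_change):
    """Return the OC term for a consequence.

    stimulus_change: "add" or "remove" (what happens to the stimulus)
    behavior_change: "increase" or "decrease" (effect on the behavior)
    """
    key = (stimulus_change, behavior_change)
    if key not in QUADRANTS:
        raise ValueError("expected ('add' | 'remove', 'increase' | 'decrease')")
    return QUADRANTS[key]


# Nagging stops once the behavior occurs, and the behavior becomes more likely.
print(classify_consequence("remove", "increase"))
```

Note that "positive" and "negative" describe only whether something is added or removed, not whether the consequence is pleasant.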
Continuous vs. intermittent (schedule) reinforcement/punishment
Continuous punishment: applied every time the behavior occurs; the behavior is suppressed faster, but it tends to return quickly if the punishment stops.
In practice: continuous punishment is recommended to ensure consistency and to avoid teaching that the behavior might occasionally be tolerated.
Consistency and communication in punishment
Always explain why the behavior is wrong, what the consequences are, and explicitly state which behaviors are acceptable.
Premack principle (a.k.a. grandma's rule)
Statement: If behavior H has a higher baseline probability than behavior L, then access to H, made contingent on performing L, can reinforce L (i.e., the preferred, high-frequency activity becomes the reward for the less frequent target behavior).
Common example: “First eat your dinner (less preferred), then you may have a cookie (more preferred).”
Formal idea: if P(H) > P(L), then H reinforces L. $P(H) > P(L) \rightarrow \text{H reinforces L}$
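To apply the rule to a scenario, compare baseline probabilities and pick out the activities that are more probable than the target behavior. The Python sketch below does only that comparison. The function name and the baseline numbers are made up for illustration, not measured values.

```python
def candidate_reinforcers(baseline, target):
    """List activities whose baseline probability exceeds the target's.

    baseline: dict mapping activity name -> probability of the activity
              occurring when the person is free to choose (assumed values)
    target:   the low-probability behavior L we want to increase
    """
    target_p = baseline[target]
    higher = [name for name, p in baseline.items() if p > target_p]
    # Most preferred activity first.
    return sorted(higher, key=baseline.get, reverse=True)


# Illustrative numbers only.
baseline = {"eat dinner": 0.2, "eat cookie": 0.9, "watch TV": 0.7}
print(candidate_reinforcers(baseline, "eat dinner"))  # ['eat cookie', 'watch TV']
```

Either listed activity could be offered after dinner is eaten; an activity that is less probable than the target returns nothing useful as a reward.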
Extinction (in OC context)
If a behavior is no longer reinforced, it may gradually disappear (extinction).
Discriminative stimulus (OC): signals that reinforcement is available; helps organisms learn when a behavior will or will not be rewarded.
Generalization: tendency to respond similarly to stimuli that resemble the original discriminative stimulus.
Discrimination: learning that only a specific stimulus will lead to reinforcement.
Shaping (successive approximations)
Process: target behavior not initially present; reinforce successive, closer approximations toward the target behavior.
Example: teacher rewards small steps toward a correct answer (e.g., first answering a hint, then answering a few words, then full correct answers), gradually raising the criterion.
Pros: effective for teaching new, complex behaviors; time-consuming but powerful, especially with children on the autism spectrum or with behavioral challenges.
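The "gradually raising the criterion" step can be sketched as a small loop in Python. This is a toy model under stated assumptions: responses are scored from 0 to 1, the criterion values are arbitrary, and the rule "raise the criterion after two reinforced responses in a row" is a hypothetical choice, not something specified in the lecture.

```python
def shape(responses, criteria, successes_needed=2):
    """Reinforce responses that meet the current criterion, then raise it.

    responses: quality scores for successive attempts (0 to 1, assumed scale)
    criteria:  increasing thresholds, the last one being the target behavior
    Returns a list of (response, criterion_in_force, reinforced) tuples.
    """
    step = 0      # index of the criterion currently in force
    streak = 0    # consecutive reinforced responses at this criterion
    log = []
    for response in responses:
        criterion = criteria[step]
        reinforced = response >= criterion
        log.append((response, criterion, reinforced))
        streak = streak + 1 if reinforced else 0
        # Move to a closer approximation once the current one is reliable.
        if streak >= successes_needed and step < len(criteria) - 1:
            step += 1
            streak = 0
    return log


attempts = [0.3, 0.35, 0.4, 0.6, 0.65, 0.5, 0.9]
for response, criterion, reinforced in shape(attempts, [0.3, 0.6, 0.9]):
    print(response, criterion, "reinforce" if reinforced else "withhold")
```

The attempt scored 0.4 is withheld even though it beats earlier reinforced attempts, because the criterion has already moved up; that is the core of successive approximation.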
Token economy
Use of tangible tokens (stickers, points) earned for good behavior; tokens can be exchanged for rewards.
Common in classrooms and ABA contexts; example in transcript: line leader, raffle tickets, special gifts; reward systems motivate and reinforce positive behavior.
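The bookkeeping behind a token economy (earn tokens, exchange them for a backup reward) can be sketched in a few lines of Python. The class name, student name, and prices are hypothetical; only the "line leader" reward comes from the example above.

```python
class TokenEconomy:
    """Toy ledger: tokens are earned for target behaviors and spent on rewards."""

    def __init__(self, prices):
        self.prices = dict(prices)   # reward name -> token cost (assumed values)
        self.balances = {}           # student name -> tokens held

    def award(self, student, tokens=1):
        """Give tokens right after the desired behavior occurs."""
        self.balances[student] = self.balances.get(student, 0) + tokens

    def redeem(self, student, reward):
        """Exchange tokens for a reward; returns False if the balance is too low."""
        price = self.prices[reward]
        if self.balances.get(student, 0) < price:
            return False
        self.balances[student] -= price
        return True


economy = TokenEconomy({"line leader": 3})
economy.award("Ana", 2)
print(economy.redeem("Ana", "line leader"))  # False: only 2 of 3 tokens
economy.award("Ana")
print(economy.redeem("Ana", "line leader"))  # True: balance drops to 0
```

The tokens have no value on their own; they work as conditioned reinforcers only because they can be exchanged for the reward.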
Applied Behavior Analysis (ABA)
ABA is grounded in OC principles; widely used with children on the autism spectrum or with developmental/behavioral challenges.
Emphasizes positive reinforcement; criticism arises when over-reliance on punishment or overly mechanical reinforcement reduces intrinsic motivation.
Often uses token economies and shaping as core tools.
OC in education and real-world settings
Behavior modification is common in classrooms; psychologists and educators implement OC-based strategies, including token economies and shaping.
ABA is a prominent framework; its foundations are in OC (and sometimes linked to desensitization processes for phobias).
OC and phobias/desensitization
Techniques from OC can support desensitization efforts by reinforcing progress during exposure steps or relaxation.
Example: gradual exposure to fear stimuli (e.g., spiders) paired with controlled relaxation, with reinforcement for progressing through exposure steps.
Observational Learning (Social Cognitive Theory): core ideas
What is observational learning (OL)?
Learning a new behavior by watching a model perform it; can be desirable or undesirable behaviors.
Albert Bandura’s evolution to Social Cognitive Theory (SCT)
Early OL emphasized modeling; SCT adds cognitive factors and processes that influence whether imitation occurs.
Key cognitive processes in SCT
Attention: must notice the behavior to learn it. $\text{Attention} \neq 0$
Retention: must store the observed behavior in memory.
Reproduction: must be able to reproduce the observed behavior.
Motivation: must want to perform the behavior; depends on anticipated consequences and perceived rewards.
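One way to remember the four processes is as a chain of gates: if any one is missing, the observed behavior is not performed. The Python sketch below is a deliberately simplified, all-or-nothing model. Treating attention plus retention as "learned" and all four together as "performed" is an assumption made here to illustrate learning without immediate behavior change; real processes are matters of degree.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Which of the four SCT processes are in place for one observer."""
    attention: bool
    retention: bool
    reproduction: bool
    motivation: bool

    def learned(self):
        # Assumed simplification: the behavior is acquired once it has been
        # noticed and stored in memory.
        return self.attention and self.retention

    def performed(self):
        # Performance additionally needs the ability and the desire to act.
        return self.learned() and self.reproduction and self.motivation


# Watched and remembered the model, able to copy it, but expects punishment.
child = Observation(attention=True, retention=True,
                    reproduction=True, motivation=False)
print(child.learned(), child.performed())  # True False
```

The example shows a behavior that has been learned but stays hidden until the observer has a reason to perform it.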
Bobo doll experiment (classic OL demonstration)
Children observed an adult model punching, kicking, and shouting aggressive phrases at a Bobo doll.
After observing, children were placed in a room with a Bobo doll and typically imitated the aggressive behaviors and language observed.
Demonstrated that aggression can be learned via observation and that consequences can influence imitation (vicarious reinforcement/punishment).
OL vs. behaviorism (classic OC/CC differences)
OL does not require an immediate change in behavior; learning can occur without being visible right away.
SCT acknowledges cognitive mediators (attention, memory) beyond simple stimulus–response links.
Consequences and vicarious learning
Observing model consequences influences imitation: if the model is rewarded, imitation is more likely; if punished, imitation may decrease (vicarious reinforcement/punishment).
Real-world examples and discussion prompts from the session
Publicized mass murder coverage: debates about whether frequent exposure increases prevalence via modeling or desensitization.
Factors influencing imitation from models: similarity to the observer; prestige or status of the model; accessibility and visibility of the model; whether the model is personally relevant (family, peers) or media-based (celebrities).
Observing family members or peers: if a sibling cheats or behaves aggressively and is rewarded, the observer may imitate; if punished, may avoid imitation.
Prosocial modeling: OL can also promote helping and cooperation when models are rewarded for prosocial acts.
Modeling determinants and practical implications
Similarity: observers are more likely to imitate models who resemble them in gender, race, age, etc. $\text{Similarity} \rightarrow \text{Imitation likelihood}$
Prestige/attractiveness: celebrities or prestigious figures can be more influential role models.
Observability and frequency: repeated exposure and clear demonstration of the behavior increase imitation odds.
Family/peer influence: close, familiar models can be highly impactful due to ongoing exposure and relevance.
Practical: differences and similarities between OC and SCT in applied settings
OC emphasizes observable change due to consequences; OL/SCT emphasizes cognitive processing and potential for change without immediate observable behavior.
Both frameworks acknowledge reinforcement and punishment; OC centers on actual consequences to the actor, while SCT includes indirect reinforcement via observing consequences to others (vicarious reinforcement).
Intersections, contrasts, and big-picture takeaways
Core similarities
Both OC and OL/SCT stress the role of consequences in shaping behavior (direct consequences for the actor or observed consequences via a model).
Both acknowledge that reinforcement and punishment influence the likelihood of future behavior.
Core differences
OC focuses on observable behavior and direct reinforcement/punishment effects; emphasis on emission of the behavior after consequences.
SCT emphasizes cognitive factors (attention, retention, reproduction, motivation) and the potential for learning without immediate behavioral display; it includes vicarious learning through observation of others’ outcomes.
In OC, modeling can be less central; in SCT, modeling and cognitive appraisal of models are central to learning and imitation.
Modeling and learning dynamics
Observational learning can produce learning without immediate behavior change; practice and reinforcement (direct or vicarious) may be needed for production.
In shaping and token economies, observed behaviors can be shaped through incremental reinforcement, which aligns with OC principles and can be enhanced by modeling and cognitive rehearsal (SCT) when combined.
Applications and practical implications
In education and therapy
OC tools: shaping, token economies, continuous reinforcement for new behaviors, extinction of undesired behaviors, and Premack-based routines.
ABA: a structured OC-based approach frequently used with children on the autism spectrum; emphasizes positive reinforcement and functional behavior analyses; ethical considerations include balancing reinforcement with meaningful, intrinsically engaging activities.
Observational learning in classrooms and media literacy: teaching students to critically assess modeled behaviors and their consequences; encouraging prosocial modeling.
In clinical settings
Desensitization for phobias (systematic desensitization) can involve OC elements (reward progress through exposure steps) and OL components (models demonstrating calm exposure and successful coping).
Ethical considerations
Heavy reliance on punishment can have negative side effects (fear, avoidance, damaged self-esteem); contemporary practice favors positive reinforcement and humane behavior modification.
The depiction of violence in media and its potential to influence behavior remains contested and context-dependent; desensitization may reduce emotional reactivity but could normalize aggression in some observers.
Real-world implications for exam and clinical practice
Be prepared to identify whether a described scenario reflects PR, NR, PP, or NP, and to explain the expected outcome on target behavior.
Be able to explain and apply Premack principle in a scenario and identify the high vs. low probability behaviors.
Distinguish when shaping, extinction, or token economies are appropriate and how to implement them.
Recognize the cognitive steps of observational learning (Attention, Retention, Reproduction, Motivation) and how these steps influence learning without immediate behavioral change.
Thoughtful questions for reflection or discussion
How does mass-media exposure to violence potentially influence behavior through OL/SCT and through desensitization? What moderating factors (e.g., motive, context, individual differences) might change outcomes?
When is punishment justified, and how can it be implemented ethically to minimize harm and maximize learning of acceptable behaviors?
In which contexts might OC-based strategies be less effective, and how can cognitive factors from SCT be leveraged to enhance learning and retention?
Quick reference: key definitions and formulas (LaTeX)
Positive reinforcement: adding a stimulus to increase a behavior.
Negative reinforcement: removing a stimulus to increase a behavior.
Positive punishment: adding a stimulus to decrease a behavior.
Negative punishment: removing a stimulus to decrease a behavior.
Premack principle: If a higher-probability behavior H occurs more frequently than a lower-probability behavior L, then performing H can reinforce L. $P(H) > P(L) \rightarrow \text{H reinforces L}$
Extinction: when reinforcement for a previously reinforced behavior is stopped, the behavior gradually decreases.
Spontaneous recovery: after extinction, a previously extinguished behavior may reappear following a rest period.
Discriminative stimulus (SD): a cue that signals reinforcement is available for a given behavior.
Generalization: responding similarly to stimuli that are similar to the original SD.
Discrimination: learning to respond differently to distinct SDs.
Shaping: reinforcing successive approximations toward a final target behavior.
Token economy: using tokens as conditioned reinforcers that can be exchanged for rewards.
Observational learning (OL) / Social Cognitive Theory (SCT): learning by watching others, with cognitive mediators such as Attention, Retention, Reproduction, and Motivation.
Bobo doll experiment: classic OL demonstration showing imitation of observed aggression.
Vicarious reinforcement/punishment: learning about consequences by observing others' outcomes and adjusting behavior accordingly.
Desensitization: gradual exposure to feared stimuli paired with coping strategies to reduce fear or anxiety.
Quick study prompts
Differentiate positive/negative reinforcement from positive/negative punishment with your own classroom examples.
Explain Premack principle using a question about a student’s preferences (e.g., preferred activities vs. required tasks).
Describe the four cognitive processes in SCT and why attention is the initial gatekeeper for observational learning.
Summarize the Bobo doll experiment and its significance for viewing aggression as learnable through observation.
List practical steps for implementing a token economy in a classroom and discuss potential limitations.
Compare and contrast OC and SCT in terms of when you would expect observable behavior change and when you would expect learning to occur without immediate change.