Psychology 101: Week 6 - Operant Conditioning, Reinforcement, and Social Learning

Edward L. Thorndike and the Foundations of Operant Conditioning

The Puzzle Box Paradigm * Edward L. Thorndike conducted experiments in which cats were placed inside specially designed "puzzle boxes." * Food was placed outside the box to serve as a motivator. * The cats initially struggled and engaged in random behaviors to escape. * Eventually, the cats would accidentally step on a lever, which opened the door and allowed them access to the food. * Learning Curve: Thorndike observed that with each repeated trial, the time it took for the cats to reach the food decreased, indicating gradual trial-and-error learning rather than sudden insight.
The Law of Effect * Thorndike formulated the "Law of Effect," which states that successful behaviors are likely to be repeated. * Satisfying State of Affairs: Behaviors that lead to a positive or satisfying outcome are more likely to occur again in the future. * Annoying State of Affairs: Behaviors that lead to a negative or annoying outcome are unlikely to be repeated.

B.F. Skinner and Behaviorism

The Pioneer of Behaviorism * B.F. Skinner was a major pioneer of the behaviorist school of thought, emphasizing that all responses should be scientifically and objectively measured.
The Skinner Box (Operant Chamber) * Skinner developed the Operant Chamber, commonly known as the Skinner box. * This apparatus was used to shape complex behaviors in animals by providing controlled stimuli and consequences.

Core Principles of Operant Conditioning

Definition of Operant Conditioning * It is the process of learning an association between a specific behavior and a consequence. * A critical requirement is that the consequence must follow the behavior.
Key Terminology * Operant: An action performed on an environment that produces consequences. * Reinforcement: Any event that occurs after a response and increases the likelihood that the behavior will be repeated.
Types of Reinforcement * Positive Reinforcement: The addition of a desirable stimulus following a behavior to increase that behavior. * Examples: Petting a dog that comes when called; paying an employee for work completed; receiving an A grade. * Negative Reinforcement: The removal of an unpleasant stimulus following a behavior to increase the likelihood of that behavior. * Examples: Taking painkillers to end a headache; fastening a seatbelt to stop a loud beeping sound.
Primary vs. Secondary Reinforcers * Primary Reinforcers: These are innately satisfying and tied to basic survival needs. Examples include food, physical contact, social interaction, warmth, pride, contentment, and feelings of safety. * Secondary (Conditioned) Reinforcers: These gain their power through association with primary reinforcers via classical conditioning. On their own, they mean nothing. Examples include money, tokens, good grades, and pleasantries.
Nuance in Reinforcement (The Parent-Child Example) * If a parent gives a child candy to stop a tantrum: * The parent is negatively reinforced because their action (giving candy) successfully stopped the unpleasant stimulus (the tantrum). * The child is positively reinforced because their action (throwing a tantrum) resulted in a desirable stimulus (candy), making them more likely to throw tantrums in the future.

Punishment and Its Consequences

Definition of Punishment * Any consequence that decreases the likelihood of a behavior occurring again.
Types of Punishment * Positive Punishment: The addition of an unpleasant stimulus to decrease a behavior. * Examples: Spraying water on a barking dog; issuing a traffic ticket for speeding. * Negative Punishment: The removal of a desirable stimulus to decrease a behavior. * Examples: Revoking driving privileges; banning a rude person from a chat room.
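The two types of reinforcement and the two types of punishment form a two-by-two grid: a stimulus is either added or removed, and the behavior either increases or decreases. A minimal Python sketch of that grid follows. The function name and the string labels are illustrative choices, not standard terminology.

```python
def classify_consequence(stimulus_change: str, behavior_effect: str) -> str:
    """Name an operant consequence from two observations.

    stimulus_change: "added" or "removed" (what followed the behavior)
    behavior_effect: "increases" or "decreases" (what happens to the behavior)
    """
    if stimulus_change not in ("added", "removed"):
        raise ValueError("stimulus_change must be 'added' or 'removed'")
    if behavior_effect not in ("increases", "decreases"):
        raise ValueError("behavior_effect must be 'increases' or 'decreases'")
    # "Positive" means something is added; "negative" means something is removed.
    sign = "positive" if stimulus_change == "added" else "negative"
    # Reinforcement increases a behavior; punishment decreases it.
    kind = "reinforcement" if behavior_effect == "increases" else "punishment"
    return f"{sign} {kind}"


# Examples taken from the notes.
examples = {
    "petting a dog that comes when called": ("added", "increases"),
    "fastening a seatbelt to stop the beeping": ("removed", "increases"),
    "issuing a traffic ticket for speeding": ("added", "decreases"),
    "revoking driving privileges": ("removed", "decreases"),
}
for description, observation in examples.items():
    print(f"{description}: {classify_consequence(*observation)}")
```

Note that "positive" and "negative" describe whether a stimulus is added or removed, not whether the outcome is pleasant.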
Side Effects of Punishment * Behavioral Suppression: Punishment may only suppress the behavior temporarily rather than eliminating it. * Creation of Fear: Punishment can lead to fear, which may indirectly prevent desirable behaviors. * Example: A student who receives a failing grade and is punished for it may associate honesty with negative consequences, leading them to lie to their parents in the future. * Increased Aggression: Punishment can teach or increase cruelty; children who observe violence may come to regard it as appropriate behavior for adults.
Effectiveness: Reinforcement vs. Punishment * Reinforcement is generally more effective because it indicates the correct behavior exactly (it tells the subject what to do). * Example: Praising a child for good grades is more effective than punishing them for poor ones, as punishment does not provide a path for improvement.

Discrimination and Generalization

Discriminative Stimuli * These are contextual cues present during the pairing of a behavior and its consequence. * They indicate the specific context in which a behavior is likely to result in a consequence. * Subjects learn that some responses will be reinforced or punished only in certain environments.
Generalization * This occurs when a learned behavior is repeated across a wider variety of situations beyond the original context.

Shaping

Successive Approximations * Shaping involves creating complex behaviors by reinforcing successive approximations of the desired goal. * Each step or response is rewarded as it comes closer to the final behavior. * Discrete segments of behavior are chained together to comprise the whole.
Example: Teaching a Dog to Roll Over * Step 1: Reward the dog for any behavior resembling the goal, such as lying down. * Step 2: Reinforce behaviors that get closer to the goal, such as rolling onto one side. * Step 3: Finally, reinforce only the full sequence: rolling onto the back and completing the roll.
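The roll-over example can be sketched as a loop that raises the criterion for reward as the animal progresses. This is a simplified illustration, not a training protocol: the `successes_needed` threshold and the sample session are assumptions made for the sketch.

```python
# Successive approximations, ordered from furthest to closest to the goal.
STEPS = ["lying down", "rolling onto one side", "complete roll"]


def shape(observed_behaviors, steps=STEPS, successes_needed=2):
    """Reinforce only behaviors that meet the current criterion.

    After `successes_needed` reinforced responses in a row (an assumed
    threshold), the criterion moves to the next approximation.
    Returns (log, current_criterion), where log pairs each behavior
    with whether it was reinforced.
    """
    criterion = 0  # index into steps
    streak = 0
    log = []
    for behavior in observed_behaviors:
        level = steps.index(behavior) if behavior in steps else -1
        reinforced = level >= criterion
        log.append((behavior, reinforced))
        if reinforced:
            streak += 1
            if streak >= successes_needed and criterion < len(steps) - 1:
                criterion += 1  # demand a closer approximation from now on
                streak = 0
        else:
            streak = 0
    return log, steps[criterion]


session = [
    "sitting", "lying down", "lying down", "lying down",
    "rolling onto one side", "rolling onto one side", "complete roll",
]
log, criterion = shape(session)
for behavior, reinforced in log:
    print(f"{behavior}: {'reward' if reinforced else 'no reward'}")
print("Current criterion:", criterion)
```

In this session the third "lying down" goes unrewarded because the criterion has already moved on to rolling onto one side. Withholding reward for earlier approximations is what pushes behavior toward the final goal.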

Reinforcement Schedules

Introduction * Schedules of reinforcement affect both the speed of learning and the retention (resistance to extinction) of the behavior.
Continuous Reinforcement * The behavior is reinforced every single time it occurs. * Result: Rapid learning, but also rapid extinction (the behavior stops quickly once rewards cease). Continuous reinforcement is less common than partial reinforcement in real-world scenarios.
Partial Reinforcement * The behavior is only reinforced some of the time. * Partial-Reinforcement Extinction Effect: Behaviors persist longer under partial reinforcement than continuous reinforcement. * Strategic Training: To create persistent behavior, one should reinforce continuously during the initial learning phase and then switch to a partial reinforcement schedule. * Result: Slower learning but highly resistant to extinction.
Ratio Schedules (Based on number of responses) * Fixed Ratio (Every so many): Reinforcement is provided after every nth behavior. * Example: Buying 10 coffees to get 1 free; paying workers per product produced. * Pros: Elicits robust, high-frequency responses (workers are more productive when paid for volume rather than time). * Caveat (applies to operant conditioning in general rather than to this schedule): Natural biological dispositions can constrain which stimuli and responses are easily associated. * Variable Ratio (After an unpredictable number): Reinforcement occurs after a random, unpredictable number of behaviors. * Example: Slot machines; fly fishing. * Caveat (likewise general): Organisms most easily learn behaviors similar to their natural ones; unnatural behaviors tend to "instinctively drift" back toward natural patterns.
Interval Schedules (Based on the passage of time) * Fixed Interval (Every so often): Reinforcement is provided for the first behavior after a fixed amount of time has passed. * Example: Tuesday discount prices. * Scalloping Effect: The rate of behavior increases sharply just before the expected time of reinforcement and drops immediately after (e.g., studying intensity increasing right before an exam). * Related point from classical conditioning: Organisms develop an expectation that the Conditioned Stimulus (CS) signals the arrival of the Unconditioned Stimulus (US), yet the expectation itself does not cause the arrival. * Variable Interval (Unpredictably often): Reinforcement is provided for the first behavior after a random, unpredictable amount of time has passed. * Example: Checking a phone for messages (the act of checking does not influence when the message arrives). * Pros: Results in more consistent response rates because the timing of the next reward is unknown. * Related point: Organisms may develop an expectation that a response will be reinforced or punished; learning can also occur without immediate reinforcement (latent learning).
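The four partial schedules differ only in the rule that decides whether a given response is reinforced. The Python sketch below makes those rules concrete. The class names, the `respond(t)` interface, and the uniform distributions used for the variable schedules are assumptions chosen for illustration; real experiments may draw their ratios and intervals differently.

```python
import random


class FixedRatio:
    """Reinforce every n-th response."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses averaging mean_n."""
    def __init__(self, mean_n, rng):
        self.mean_n, self.rng, self.count = mean_n, rng, 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1..2*mean_n-1, so the average is mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response after a fixed amount of time has passed."""
    def __init__(self, interval):
        self.interval, self.available_at = interval, interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """Reinforce the first response after an unpredictable amount of time."""
    def __init__(self, mean_interval, rng):
        self.mean_interval, self.rng = mean_interval, rng
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: uniform on [0, 2*mean_interval], so the average is mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


def reinforced_times(schedule, response_times):
    """Return the response times at which the schedule delivered reinforcement."""
    return [t for t in response_times if schedule.respond(t)]


rng = random.Random(0)  # seeded so repeated runs give the same draws
once_per_unit = range(1, 21)
for schedule in (FixedRatio(5), VariableRatio(5, rng),
                 FixedInterval(5), VariableInterval(5, rng)):
    print(type(schedule).__name__, reinforced_times(schedule, once_per_unit))
```

With one response per time unit, `FixedRatio(5)` and `FixedInterval(5)` both reinforce at times 5, 10, 15, and 20. They diverge when the response rate changes: doubling the rate to 40 responses over the same 20 time units doubles the rewards under the ratio schedule (8) but leaves the interval schedule at 4. This is one way to see why ratio schedules favor high response rates.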

Token Economies

Definition: A behavior modification system where individuals earn tokens (secondary reinforcers) for completing tasks or positive behaviors and lose them for bad behavior.
Mechanism: Tokens are later traded for tangible objects or privileges.
Psychological Impact: Provides participants with a sense of control over their environment.
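A token economy is essentially a small ledger: behaviors add or subtract tokens, and tokens are exchanged for rewards. A minimal sketch follows, in which every behavior, token value, and price is a hypothetical example rather than a recommended scheme.

```python
class TokenEconomy:
    """Track one participant's tokens (secondary reinforcers)."""

    def __init__(self, earn, fines, prices):
        self.earn = earn      # behavior -> tokens gained
        self.fines = fines    # behavior -> tokens lost
        self.prices = prices  # reward -> token cost
        self.balance = 0

    def record(self, behavior):
        """Update the balance for one observed behavior and return it."""
        if behavior in self.earn:
            self.balance += self.earn[behavior]
        elif behavior in self.fines:
            # Assumption: the balance cannot drop below zero.
            self.balance = max(0, self.balance - self.fines[behavior])
        return self.balance

    def redeem(self, reward):
        """Trade tokens for a reward; return True if the trade succeeded."""
        price = self.prices[reward]
        if self.balance < price:
            return False
        self.balance -= price
        return True


# Hypothetical classroom setup.
economy = TokenEconomy(
    earn={"finished homework": 2, "helped a classmate": 1},
    fines={"interrupted the lesson": 1},
    prices={"extra recess": 3},
)
for behavior in ["finished homework", "interrupted the lesson",
                 "helped a classmate", "finished homework"]:
    economy.record(behavior)
print(economy.balance)                 # 2 - 1 + 1 + 2 = 4
print(economy.redeem("extra recess"))  # True, leaving 1 token
```

Earning tokens corresponds to positive reinforcement with a secondary reinforcer, and losing them corresponds to negative punishment.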

Biological and Cognitive Constraints on Learning

Limitations of Behaviorism * While behaviorists believed all behavior could be explained by conditioning, modern research shows reinforcement only explains a portion of human behavior.
Biological Factors * Nature limits a species' capacity for operant conditioning. * Instinctive Drift: Animals tend to revert back to biologically predisposed patterns even after being trained. * The Miserly Raccoons Example: Raccoons were trained to put coins in a piggy bank. However, they began dipping the coins and rubbing them together (an innate food-washing behavior) instead of the trick. This was only avoided when they were trained to dunk a basketball instead, which aligned more with natural movements.
Cognitive Processes and Tolman * Edward Tolman demonstrated that learning can occur without immediate reinforcement. * Latent Learning: Learning that stays hidden until there is an incentive to demonstrate it. * Cognitive Maps: Mental representations of physical space. * Tolman’s Maze Study: * Rats in a "plus" maze: Rats started in Arm A and found food in Arm B. When placed in the opposite Arm C, Stimulus-Response (S-R) theory predicted they would turn right (the same physical movement), but the rats turned left toward the actual location of the food, indicating that they had formed a mental map. * Three-Group Study (17 days): * Group 1: Never rewarded; made many mistakes. * Group 2: Always rewarded; improved steadily. * Group 3: Rewarded for the first time on Day 11. This group made the fewest mistakes once food was introduced, showing they had been learning the maze all along through latent learning. * Insight Learning: The sudden understanding of a solution after a period of inaction or reflection.
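The two competing predictions in the plus-maze study can be worked through with simple geometry. The sketch below assumes a compass layout that the notes do not specify: Arm A is the south arm, Arm B (food) is the east arm, and Arm C is the north arm. All function names are illustrative.

```python
# Arm positions relative to the maze center, as (x, y) unit vectors.
# Assumed layout: A = south (training start), B = east (food), C = north (test start).
NORTH, EAST, SOUTH, WEST = (0, 1), (1, 0), (0, -1), (-1, 0)
ARM_A, ARM_B, ARM_C = SOUTH, EAST, NORTH


def heading_from(arm):
    """A rat placed in an arm walks toward the center, opposite the arm's position."""
    return (-arm[0], -arm[1])


def turn_right(heading):
    dx, dy = heading
    return (dy, -dx)


def turn_left(heading):
    dx, dy = heading
    return (-dy, dx)


def turn_needed(heading, target_arm):
    """Which turn at the center leads from this heading into the target arm?"""
    if turn_right(heading) == target_arm:
        return "right"
    if turn_left(heading) == target_arm:
        return "left"
    raise ValueError("target arm is not reachable with a single turn")


# Training: start in Arm A, food in Arm B.
training_turn = turn_needed(heading_from(ARM_A), ARM_B)

# Test: start in the opposite Arm C.
sr_prediction = training_turn  # S-R theory: repeat the learned movement
place_prediction = turn_needed(heading_from(ARM_C), ARM_B)  # cognitive map: go to the food

print("Turn learned in training:", training_turn)   # right
print("S-R prediction from Arm C:", sr_prediction)  # right (leads to the empty west arm)
print("Cognitive-map prediction:", place_prediction)  # left (leads to the food)
```

Because the rat approaches the center from the opposite direction, repeating the same turn takes it away from the food. Only a representation of where the food is in space produces the left turn the rats actually made.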

Motivation

Intrinsic Motivation: Performing an activity because the act itself is inherently rewarding (e.g., hobbies). Social validation and acceptance are strong human intrinsic motivators.
Extrinsic Motivation: Performing an activity to receive an external reward (e.g., working for money).
Impact of Rewards: External rewards can decrease intrinsic motivation, particularly when rewards are contingent on performance (turning a hobby into a job). This varies with age; tangible rewards tied to performance are more harmful to the motivation of children than to that of college students. Unexpected rewards generally do not undermine intrinsic motivation.
Phrasing of Verbal Rewards: * Controlling: "You should keep up the good work" (decreases intrinsic motivation). * Informational: "You got a score of [X], which is better than average" (maintains or increases motivation).

Social Learning and Modeling

Scope of Social Learning * Includes learning mechanical skills, social etiquette, situational anxiety, and attitudes toward politics and religion. * Observed in social animals such as monkeys, whales, and birds, including crows (which learn to attack predators that harmed their peers).
Observational Learning and Modeling * Observational Learning: Learning a new behavior or altering an old one by watching others. * Modeling: Specifically imitating the behavior of another person. * Characteristics of Modeling: * We are more likely to imitate attractive, high-status models or those similar to ourselves (a tactic used by advertisers). * Modeling happens regardless of consequences (unlike vicarious learning). * Can lead to prosocial behavior (actions that benefit others).
Bandura’s Bobo Doll Study (1961) * Group 1: Watched a film of an adult playing quietly with a Bobo doll. * Group 2: Watched a film of an adult attacking the Bobo doll. * Result: Viewers of the aggressive model were more than twice as likely to behave aggressively with the doll, especially if they were already frustrated.
Vicarious Conditioning * Learning the consequences of an action by observing others being rewarded or punished. * Rewarded Behavior: More likely to be imitated. * Punished Behavior: Less likely to be imitated (e.g., watching a sibling get punished for a behavior).

Mirror Neurons

Definition and Function * Mirror neurons are cells that fire both when an individual performs an action and when they observe someone else performing that same action. * They play a role in action understanding. * Key Detail: Activation typically does not lead to the actual movement of muscles or sensory fibers.
Experimental Evidence and Debate * Discovered originally in monkeys (firing both when a monkey eats a banana and when it sees a human eating one). * Confirmed in humans and other animals. * Scientists debate whether they are unique to social creatures and whether they are linked to empathy. * Caution: Their role is frequently overblown in the popular press; they are important for action understanding, but their complexity is still being studied.