Psychology 101: Week 6 - Operant Conditioning, Reinforcement, and Social Learning

Edward L. Thorndike and the Foundations of Operant Conditioning

  • The Puzzle Box Paradigm
    * Edward L. Thorndike conducted experiments in which cats were placed inside specially designed "puzzle boxes."
    * Food was placed outside the box to serve as a motivator.
    * The cats initially struggled and engaged in random behaviors to escape.
    * Eventually, the cats would accidentally step on a lever, which opened the door and allowed them access to the food.
    * Learning Curve: Thorndike observed that with each repeated trial, the time it took for the cats to reach the food decreased, indicating a gradual learning process rather than a sudden insight.

  • The Law of Effect
    * Thorndike formulated the "Law of Effect," which states that behaviors followed by satisfying consequences are likely to be repeated.
    * Satisfying State of Affairs: Behaviors that lead to a positive or satisfying outcome are more likely to occur again in the future.
    * Annoying State of Affairs: Behaviors that lead to a negative or annoying outcome are less likely to be repeated.
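A toy simulation can make the puzzle-box learning curve and the Law of Effect concrete. The sketch below is a hypothetical model, not Thorndike's procedure: the behavior names, the starting strengths, and the update rule (add 1.0 to the successful behavior, multiply an unsuccessful one by 0.9) are illustrative assumptions. Behaviors are emitted at random in proportion to their strength, so as the lever press keeps being strengthened, the number of behaviors needed to escape tends to fall across trials.

```python
import random

# Hypothetical repertoire for a cat in the puzzle box (an assumption).
# Only "press_lever" opens the door and gives access to the food.
BEHAVIORS = ["scratch_walls", "meow", "push_door", "press_lever"]
SUCCESS = "press_lever"


def run_trials(n_trials: int, seed: int = 0):
    """Toy Law-of-Effect model, for illustration only.

    Each behavior has a strength; the cat emits behaviors at random in
    proportion to their strengths until it escapes. The successful behavior
    is strengthened (satisfying outcome) and unsuccessful ones are weakened
    (annoying outcome). Returns behaviors emitted per trial and final strengths.
    """
    rng = random.Random(seed)
    strength = {b: 1.0 for b in BEHAVIORS}
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            weights = [strength[b] for b in BEHAVIORS]
            choice = rng.choices(BEHAVIORS, weights=weights)[0]
            if choice == SUCCESS:
                strength[choice] += 1.0  # satisfying state of affairs: strengthen
                break
            # Annoying state of affairs: weaken, but keep a small floor.
            strength[choice] = max(0.1, strength[choice] * 0.9)
        attempts_per_trial.append(attempts)
    return attempts_per_trial, strength


attempts, strength = run_trials(50)
print("first 5 trials:", attempts[:5])
print("last 5 trials: ", attempts[-5:])
```

Because the simulation is random, the exact counts depend on the seed; what it illustrates is the trend, in which later trials almost always end with the lever press on the first or second attempt.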

B.F. Skinner and Behaviorism

  • The Pioneer of Behaviorism
    * B.F. Skinner was a major pioneer of the behaviorist school of thought, emphasizing that all responses should be scientifically and objectively measured.

  • The Skinner Box (Operant Chamber)
    * Skinner developed the Operant Chamber, commonly known as the Skinner box.
    * This apparatus was used to shape complex behaviors in animals by providing controlled stimuli and consequences.

Core Principles of Operant Conditioning

  • Definition of Operant Conditioning
    * It is the process of learning an association between a specific behavior and a consequence.
    * A critical requirement is that the consequence must follow the behavior.

  • Key Terminology
    * Operant: An action performed on an environment that produces consequences.
    * Reinforcement: Any event that occurs after a response and increases the likelihood that the behavior will be repeated.

  • Types of Reinforcement
    * Positive Reinforcement: The addition of a desirable stimulus following a behavior to increase that behavior.
        * Examples: Petting a dog that comes when called; paying an employee for work completed; receiving an A grade.
    * Negative Reinforcement: The removal of an unpleasant stimulus following a behavior to increase the likelihood of that behavior.
        * Examples: Taking painkillers to end a headache; fastening a seatbelt to stop a loud beeping sound.

  • Primary vs. Secondary Reinforcers
    * Primary Reinforcers: These are innately satisfying and tied to basic survival needs. Examples include food, physical contact, social interaction, warmth, pride, contentment, and feelings of safety.
    * Secondary (Conditioned) Reinforcers: These gain their power through association with primary reinforcers via classical conditioning; on their own, they have no inherent value. Examples include money, tokens, good grades, and pleasantries.

  • Nuance in Reinforcement (The Parent-Child Example)
    * If a parent gives a child candy to stop a tantrum:
        * The parent is negatively reinforced because their action (giving candy) successfully stopped the unpleasant stimulus (the tantrum).
        * The child is positively reinforced because their action (throwing a tantrum) resulted in a desirable stimulus (candy), making them more likely to throw tantrums in the future.

Punishment and Its Consequences

  • Definition of Punishment
    * Any consequence that decreases the likelihood of a behavior occurring again.

  • Types of Punishment
    * Positive Punishment: The addition of an unpleasant stimulus to decrease a behavior.
        * Examples: Spraying water on a barking dog; issuing a traffic ticket for speeding.
    * Negative Punishment: The removal of a desirable stimulus to decrease a behavior.
        * Examples: Revoking driving privileges; banning a rude person from a chat room.

  • Side Effects of Punishment
    * Behavioral Suppression: Punishment may only suppress a behavior temporarily rather than eliminate it.
    * Creation of Fear: Punishment can lead to fear, which may indirectly discourage desirable behaviors.
        * Example: A student who receives a failing grade and is punished for it may associate honesty with negative consequences, leading them to lie to their parents in the future.
    * Increased Aggression: Punishment can teach or increase cruelty; children who observe violence may come to see it as appropriate adult behavior.

  • Effectiveness: Reinforcement vs. Punishment
    * Reinforcement is generally more effective because it indicates the correct behavior exactly (it tells the subject what to do).
    * Example: Praising a child for good grades is more effective than punishing them for poor ones, as punishment does not provide a path for improvement.
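The four kinds of consequences above form a 2 x 2 grid: whether a stimulus is added or removed, and whether it is desirable or unpleasant. The lookup table below is a minimal sketch of that grid; the names `Change`, `Stimulus`, `classify`, and `effect_on_behavior` are hypothetical, and the examples come from these notes, including both sides of the parent-child candy example.

```python
from enum import Enum


class Change(Enum):
    ADD = "added"
    REMOVE = "removed"


class Stimulus(Enum):
    DESIRABLE = "desirable"
    UNPLEASANT = "unpleasant"


# The 2 x 2 grid: (what happens, to which kind of stimulus) -> consequence type.
CONSEQUENCES = {
    (Change.ADD, Stimulus.DESIRABLE): "positive reinforcement",
    (Change.REMOVE, Stimulus.UNPLEASANT): "negative reinforcement",
    (Change.ADD, Stimulus.UNPLEASANT): "positive punishment",
    (Change.REMOVE, Stimulus.DESIRABLE): "negative punishment",
}


def classify(change: Change, stimulus: Stimulus) -> str:
    """Name the consequence; 'positive'/'negative' mean added/removed, not good/bad."""
    return CONSEQUENCES[(change, stimulus)]


def effect_on_behavior(consequence: str) -> str:
    """Reinforcement increases the behavior; punishment decreases it."""
    return "increases" if consequence.endswith("reinforcement") else "decreases"


# Examples taken from the notes above.
EXAMPLES = {
    "petting a dog that comes when called": (Change.ADD, Stimulus.DESIRABLE),
    "fastening a seatbelt stops the beeping": (Change.REMOVE, Stimulus.UNPLEASANT),
    "traffic ticket for speeding": (Change.ADD, Stimulus.UNPLEASANT),
    "revoking driving privileges": (Change.REMOVE, Stimulus.DESIRABLE),
    "parent's candy ends the tantrum (parent)": (Change.REMOVE, Stimulus.UNPLEASANT),
    "tantrum earns candy (child)": (Change.ADD, Stimulus.DESIRABLE),
}

for situation, (change, stimulus) in EXAMPLES.items():
    kind = classify(change, stimulus)
    print(f"{situation}: {kind} (behavior {effect_on_behavior(kind)})")
```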

Discrimination and Generalization

  • Discriminative Stimuli
    * These are contextual cues present during the pairing of a behavior and its consequence.
    * They indicate the specific context in which a behavior is likely to result in a consequence.
    * Subjects learn that some responses will be reinforced or punished only in certain environments.

  • Generalization
    * This occurs when a learned behavior is repeated across a wider variety of situations beyond the original context.

Shaping

  • Successive Approximations
    * Shaping involves creating complex behaviors by reinforcing successive approximations of the desired goal.
    * Each step or response is rewarded as it comes closer to the final behavior.
    * Discrete segments of behavior are chained together to comprise the whole.

  • Example: Teaching a Dog to Roll Over
    * Step 1: Reward the dog for any behavior resembling the goal, such as lying down.
    * Step 2: Reinforce behaviors that get closer to the goal, such as rolling onto one side.
    * Step 3: Finally, reinforce only the full behavior: lying on the back and completing the roll.
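Shaping can be read as a rule with a rising criterion: reinforce any response that meets the current approximation, then raise the bar. The sketch below assumes each response can be scored from 0.0 (unrelated) to 1.0 (a complete roll) and advances to the next step after two consecutive reinforced responses; the scores, thresholds, and the `advance_after` parameter are illustrative assumptions rather than a training protocol.

```python
# Hypothetical shaping steps for "roll over": (description, minimum score).
# Scores run from 0.0 (unrelated behavior) to 1.0 (complete roll); both the
# scale and the thresholds are illustrative assumptions.
STEPS = [
    ("lies down", 0.3),
    ("rolls onto one side", 0.6),
    ("complete roll over", 1.0),
]


def shape(scores, steps=STEPS, advance_after=2):
    """Reinforce successive approximations with a rising criterion.

    Returns one (score, current_step, reinforced) tuple per response.
    """
    step = 0
    streak = 0  # consecutive reinforced responses at the current step
    log = []
    for score in scores:
        name, threshold = steps[step]
        reinforced = score >= threshold
        log.append((score, name, reinforced))
        streak = streak + 1 if reinforced else 0
        # Raise the bar once the current approximation is reliable.
        if streak >= advance_after and step < len(steps) - 1:
            step += 1
            streak = 0
    return log


for score, step_name, reinforced in shape([0.3, 0.35, 0.3, 0.6, 0.7, 0.5, 1.0]):
    print(f"score={score:.2f} step={step_name!r} reinforced={reinforced}")
```

In this trace, a plain lie-down (score 0.3) is rewarded early on but not once the criterion has moved to rolling onto one side, which is the sense in which only closer approximations keep being reinforced.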

Reinforcement Schedules

  • Introduction
    * Schedules of reinforcement affect both the speed of learning and the retention (resistance to extinction) of the behavior.

  • Continuous Reinforcement
    * The behavior is reinforced every single time it occurs.
    * Result: Rapid learning, but also rapid extinction (the behavior stops quickly once rewards cease). This is less common in real-world scenarios.

  • Partial Reinforcement
    * The behavior is only reinforced some of the time.
    * Partial-Reinforcement Extinction Effect: Behaviors persist longer under partial reinforcement than continuous reinforcement.
    * Strategic Training: To create persistent behavior, one should reinforce continuously during the initial learning phase and then switch to a partial reinforcement schedule.
    * Result: Slower learning but highly resistant to extinction.

  • Ratio Schedules (Based on the number of responses)
    * Fixed Ratio (Every so many): Reinforcement is provided after every nth behavior.
        * Examples: Buying 10 coffees to get 1 free; paying workers per product produced.
        * Pros: Elicits robust, high-frequency responses (workers are more productive when paid for volume rather than time).
    * Variable Ratio (After an unpredictable number): Reinforcement occurs after a random, unpredictable number of behaviors.
        * Examples: Slot machines; fly fishing.
    * Note on biological constraints (these apply to conditioning in general, not specifically to ratio schedules): Natural biological dispositions constrain which stimuli and responses can easily be associated, and organisms learn behaviors similar to their natural ones most easily; unnatural behaviors tend to "instinctively drift" back toward natural patterns.

  • Interval Schedules (Based on the passage of time)
    * Fixed Interval (Every so often): Reinforcement is provided for the first behavior after a fixed amount of time has passed.
        * Example: Tuesday discount prices.
        * Scalloping Effect: The rate of behavior increases sharply just before the expected time of reinforcement and drops immediately after (e.g., studying intensity increasing right before an exam).
    * Variable Interval (Unpredictably often): Reinforcement is provided after a random, unpredictable amount of time.
        * Example: Checking a phone for messages (the act of checking does not influence when the message arrives).
        * Pros: Results in more consistent response rates because the timing of the next reward is unknown.
    * Note on cognitive processes (these apply to conditioning in general, not specifically to interval schedules): In classical conditioning, organisms develop an expectation that the Conditioned Stimulus (CS) signals the arrival of the Unconditioned Stimulus (US), although the expectation itself does not cause the US to arrive. In operant conditioning, organisms develop an expectation that a response will be reinforced or punished, and they can also show latent learning without immediate reinforcement.
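The four schedules differ only in the rule that decides whether a given response is reinforced. The classes below are a minimal sketch of those rules; the class names, the `respond(t)` interface, and the uniform distributions used for the two variable schedules are assumptions chosen for illustration. Continuous reinforcement is the special case `FixedRatio(1)`. Responses are given as time stamps so that ratio and interval schedules can be compared on the same data.

```python
import random


class FixedRatio:
    """Reinforce every n-th response (n=1 is continuous reinforcement)."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses averaging `mean`."""

    def __init__(self, mean: int, rng: random.Random):
        self.mean, self.rng = mean, rng
        self.count = 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: uniform on 1..2*mean-1, whose average is `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response once `interval` time units have passed."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = interval

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.interval  # clock restarts at reinforcement
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but each wait is drawn at random around `mean`."""

    def __init__(self, mean: float, rng: random.Random):
        self.mean, self.rng = mean, rng
        self.available_at = self._wait_from(0.0)

    def _wait_from(self, now: float) -> float:
        # Assumption: waits are uniform on [0, 2*mean], averaging `mean`.
        return now + self.rng.uniform(0, 2 * self.mean)

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = self._wait_from(t)
            return True
        return False


def count_reinforcers(schedule, response_times) -> int:
    """Count how many of the given responses the schedule reinforces."""
    return sum(schedule.respond(t) for t in response_times)


steady = [float(t) for t in range(1, 21)]  # 20 responses, one per time unit
eager = [0.5 * k for k in range(1, 41)]    # 40 responses over the same 20 units

print(count_reinforcers(FixedRatio(5), steady), count_reinforcers(FixedRatio(5), eager))        # → 4 8
print(count_reinforcers(FixedInterval(5), steady), count_reinforcers(FixedInterval(5), eager))  # → 4 4
```

Doubling the response rate doubles the payoff on the fixed-ratio schedule but leaves the fixed-interval payoff unchanged. This matches the contrast in the notes: workers paid per product produced gain by working faster, while checking a phone more often does not make messages arrive sooner.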

Token Economies

  • Definition: A behavior modification system where individuals earn tokens (secondary reinforcers) for completing tasks or positive behaviors and lose them for bad behavior.

  • Mechanism: Tokens are later traded for tangible objects or privileges.

  • Psychological Impact: Provides participants with a sense of control over their environment.
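A token economy is essentially a ledger. The class below is a minimal sketch under stated assumptions: the token amounts and reward prices are hypothetical, and it assumes a balance can never go below zero. Earning tokens works as positive reinforcement with a secondary reinforcer, losing tokens works as negative punishment, and `trade` converts tokens into a tangible object or privilege.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TokenEconomy:
    """Minimal ledger for one participant (a hypothetical design)."""

    balance: int = 0
    history: List[Tuple[str, int]] = field(default_factory=list)

    def earn(self, behavior: str, tokens: int) -> None:
        """Award tokens for a completed task or positive behavior."""
        self.balance += tokens
        self.history.append((behavior, tokens))

    def lose(self, behavior: str, tokens: int) -> None:
        """Remove tokens for bad behavior (never below zero: an assumption)."""
        lost = min(tokens, self.balance)
        self.balance -= lost
        self.history.append((behavior, -lost))

    def trade(self, reward: str, price: int) -> bool:
        """Exchange tokens for an object or privilege if the balance covers it."""
        if price > self.balance:
            return False
        self.balance -= price
        self.history.append((reward, -price))
        return True


ledger = TokenEconomy()
ledger.earn("completed homework", 3)
ledger.earn("helped a classmate", 2)
ledger.lose("interrupted the class", 1)
print(ledger.trade("extra recess", 5))  # → False (balance is only 4)
print(ledger.trade("sticker", 2))       # → True
print(ledger.balance)                   # → 2
```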

Biological and Cognitive Constraints on Learning

  • Limitations of Behaviorism
    * While behaviorists believed all behavior could be explained by conditioning, modern research shows that reinforcement explains only a portion of human behavior.

  • Biological Factors
    * Nature limits a species' capacity for operant conditioning.
    * Instinctive Drift: Animals tend to revert to biologically predisposed patterns even after being trained.
    * The Miserly Raccoons Example: Raccoons were trained to put coins in a piggy bank. However, instead of performing the trick, they began dipping the coins and rubbing them together (an innate food-washing behavior). The problem was avoided only when they were trained to dunk a basketball instead, a trick that aligned more closely with their natural movements.

  • Cognitive Processes and Tolman
    * Edward Tolman demonstrated that learning can occur without immediate reinforcement.
    * Latent Learning: Learning that stays hidden until there is an incentive to demonstrate it.
    * Cognitive Maps: Mental representations of physical space.
    * Tolman’s Maze Study:
        * Rats in a "plus" maze: Rats started in Arm A and found food in Arm B. When placed in the opposite Arm C, Stimulus-Response (S-R) theory predicted they would turn right (the same physical movement), but the rats turned left toward the actual location of the food, indicating that they had formed a mental map.
        * Three-Group Study (17 days):
            * Group 1: Never rewarded; made many mistakes.
            * Group 2: Always rewarded; improved steadily.
            * Group 3: Rewarded for the first time after Day 11. This group made the fewest mistakes once food was introduced, showing that they had been learning the maze all along through latent learning.
    * Insight Learning: The sudden understanding of a solution after a period of inaction or reflection.

Motivation

  • Intrinsic Motivation: Performing an activity because the act itself is inherently rewarding (e.g., hobbies). Social validation and acceptance are strong human intrinsic motivators.

  • Extrinsic Motivation: Performing an activity to receive an external reward (e.g., working for money).

  • Impact of Rewards: External rewards can decrease intrinsic motivation, particularly when rewards are contingent on performance (turning a hobby into a job). This varies with age: performance-contingent tangible rewards undermine the intrinsic motivation of children more than that of college students. Unexpected rewards generally do not reduce intrinsic motivation.

  • Phrasing of Verbal Rewards:
    * Controlling: "You should keep up the good work" (decreases intrinsic motivation).
    * Informational: "You got a score of [X], which is better than average" (maintains or increases motivation).

Social Learning and Modeling

  • Scope of Social Learning
    * Includes learning mechanical skills, social etiquette, situational anxiety, and attitudes toward politics and religion.
    * Observed in social animals such as monkeys, birds, whales, and crows (which learn to attack predators that have harmed their peers).

  • Observational Learning and Modeling
    * Observational Learning: Learning a new behavior or altering an old one by watching others.
    * Modeling: Specifically imitating the behavior of another person.
    * Characteristics of Modeling:
        * We are more likely to imitate attractive, high-status models or those similar to ourselves (a tactic used by advertisers).
        * Modeling happens regardless of consequences (unlike vicarious learning).
        * Can lead to prosocial behavior (actions that benefit others).

  • Bandura’s Bobo Doll Study (1961)
    * Group 1: Watched an adult model playing quietly with a Bobo doll.
    * Group 2: Watched an adult model attacking the Bobo doll.
    * Result: Children who viewed the aggressive model were more than twice as likely to behave aggressively with the doll, especially if they were already frustrated.

  • Vicarious Conditioning
    * Learning the consequences of an action by observing others being rewarded or punished.
    * Rewarded Behavior: More likely to be imitated.
    * Punished Behavior: Less likely to be imitated (e.g., watching a sibling get punished for a behavior).

Mirror Neurons

  • Definition and Function
    * Mirror neurons are cells that fire both when an individual performs an action and when they observe someone else performing that same action.
    * They play a role in action understanding.
    * Key Detail: When triggered by observation, their activation typically does not produce the corresponding muscle movement.

  • Experimental Evidence and Debate
    * Originally discovered in monkeys (firing both when a monkey eats a banana and when it sees a human eating a banana).
    * Evidence has since been reported in humans and other animals.
    * Scientists debate whether they are unique to social creatures and whether they are linked to empathy.
    * Caution: Their role is frequently overblown in the popular press; they are important for action understanding, but their complexity is still being studied.