PSYC 466 Elements of Learning
What is Learning?
“Learning” means different things in different technical contexts.
You can learn algebra; a language; skiing.
There is no consensus on a definition of learning, since it encompasses processes as diverse as behavioral change, knowledge acquisition, and skill development. A common working definition: learning is a relatively permanent change in behavioral potentiality that occurs as a result of experience.
An example of this is when a student practices math problems repeatedly, leading to improved problem-solving skills and a deeper understanding of algebraic concepts.
A non-example of this is when someone sustains a broken bone. You don’t learn the broken bone; rather, you adapt, altering your behavior to cope with the injury without acquiring any new knowledge or skills.
There are at least three types of regularities, corresponding to three types of learning:
Regularities in the presence of one stimulus over time (Habituation)
Ex: A child becomes less responsive to a recurring loud noise, such as a siren, after being exposed to it repeatedly.
Regularities in the presence of two stimuli (Classical Conditioning)
Ex: Pavlov's dogs salivated at the sound of a bell after it was paired with the presentation of food, illustrating how a neutral stimulus can come to elicit a response when associated with another stimulus.
Regularities in the presence of a behavior and a stimulus (Operant Conditioning)
Ex: A dog learns to sit on command to receive a treat, demonstrating how a behavior can be reinforced by a consequence.
Issues in Learning Theory
Determinism vs Free Will.
Nature vs. Nurture
Humans and other animals compared
A Brief History
Study of learning is old
Knowledge was traditionally studied under the scope of epistemology
The first point of contention: Innate or learned?
Plato: Rationalism
Innate knowledge
Reason, Judgement, Logic
Aristotle: Empiricism
Emphasis that we gain knowledge only through our senses
Tabula rasa - The concept that individuals are born as a 'blank slate' and acquire knowledge through experience and perception.
Associationism - The theory that mental processes operate by the association of one idea with another, suggesting that learning occurs through the formation of connections between stimuli and responses.
A Brief History: Modern Science Era
The 19th century was remarkable for science: it brought a more mechanistic understanding of human beings
Physiology
Electrical stimulation of nerves
Notions of Stimulus and response
Studying the physiological basis of human perception opened the way for a scientific (experimental) study of psychology
Psychophysics
Weber-Fechner Law: Quantitative law of the relation between sensation and intensity of the physical stimulus
Weber’s Law: JND = K × S, or equivalently K = JND/S
JND= Just noticeable difference (perception)
K = constant to be determined (environment)
S = Given standard stimulus (environment)
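The relation above can be sketched in a few lines of Python. This is an illustrative sketch only; the Weber fraction `k = 0.02` below is a hypothetical value chosen for the example, not one given in the lecture.

```python
# Weber's Law: the just noticeable difference (JND) grows in
# proportion to the standard stimulus: JND = K * S.

def jnd(standard: float, k: float) -> float:
    """Smallest detectable change for a given standard stimulus."""
    return k * standard

def weber_fraction(jnd_value: float, standard: float) -> float:
    """Recover the constant K from a measured JND and its standard."""
    return jnd_value / standard

k = 0.02  # hypothetical Weber fraction, for illustration only
print(jnd(100.0, k))  # 2.0  -> a 2-unit change is noticeable on a 100-unit standard
print(jnd(500.0, k))  # 10.0 -> on a 500-unit standard, the same 2-unit change goes unnoticed
```

The key point the code makes concrete: the detectable difference is not a fixed amount but a fixed proportion of the standard stimulus.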
At the same time, the rise of Darwinism
Adaptation
In America, the school of Functionalism
Habits we developed that we don’t think of (Looking both ways before crossing the street)
Focused on how mental processes help individuals adapt to their environments, emphasizing the functionality of psychological processes.
Psychology developed largely as an American discipline
Early Research on Learning
Hermann Ebbinghaus and rote (repetition-based) verbal learning
He studied how we forget:
Mere passage of time
Retroactive and proactive inhibition
Edward Thorndike
Puzzle box
Studying trial-and-error learning
Testing memory by placing the animal in the situation again
He used the puzzle box to observe whether the animal recalls the solution more quickly after prior exposure.
Innate Patterns
Understanding Animal Behavior
Question this:
Do only humans have reason?
If so, how do animals take care of themselves?
Instincts?
Certain behaviors are repeated very often, both individually and by animals of the same species. These regularities allow for a scientific understanding.
A stimulus is anything that causes a reaction or change in an organism or system. A response is the reaction or behavior that occurs as a result of a stimulus.
A reflex is an involuntary and nearly instantaneous movement in response to a stimulus. Reflexes are automatic and do not involve conscious thought. They are typically unlearned behaviors, often serving as protective mechanisms for the body.
Instincts
Some theorists suggested even that we might have more instincts
Whatever the case, Darwin’s theory of evolution changed our definition of “instinct”.
Learning Enables Adaptation
Natural selection and sexual selection
Depending on environmental demands, certain of these traits might result in a reproductive advantage.
Behavioral traits are also important
The ability to adapt to one’s environment with experience enhances survival and/or passing on one’s genes
Instincts are innate behaviors typical of a species, resulting from natural and sexual selection.
These terms remain controversial and vaguely defined
Evolutionary psychology.
Elicited Behaviors
Elicited Behaviors are not all learned
Example: spiders don’t learn to weave good webs; web-spinning is largely inherited
For simple animals, underlying action mechanisms are rudimentary yet powerful
Even simple systems are adaptive and functional in initial life stages
Most of these patterns involve changes in behavior as a function of changes in stimulation
Simple Orientation Mechanisms
Simple animals may have little or no nervous system, or a rudimentary one
Primitive senses guide orientation (e.g., avoidance) rather than complex cognition
(So, for simple animals, their most fundamental senses are sufficient to direct their movement and behavior, such as avoiding threats (orientation). They don't rely on or possess complex thinking or elaborate cognitive processes to navigate their environment or react to stimuli.)
Tropisms:
Mostly associated with plants
Growth or turns in particular directions in response to particular stimulation (e.g., thermotropism, phototropism, and gravitropism, which allow plants to adapt to their environment by growing towards light, warmth, or gravity, respectively).
Taxes:
In simple and complex animals, involve locomotion towards or away from a stimulus
Ex: A moth flying towards a light source or bacteria moving away from a harmful chemical
Kineses:
Movements in random directions as a result of the intensity of stimulation
Ex: Woodlouse moving more rapidly and randomly in a dry, open area to increase its chances of finding a damp, sheltered spot where it can slow down and move less randomly.
Elicited Behaviors: Reflexes
Reflex Action
Refined behavior in more complex animals depends on individual history and environment
Reflexes are influenced by multiple interacting mechanisms and can be simple or complex.
Reflex Action: History and Theories
Rene Descartes (1596–1650): proposed animals are mindless and soulless; all activities could be explained by reflexes; the idea of the “Reflex Arc”
By mid-1800s, physiology expanded the study of reflexes. Started to play with electricity.
Neural transmission and the idea that mental processes could be analyzed via reflexes
Ivan Sechenov argued that mental processes are reflexive responses to stimuli and could be understood through biology
Reflex Action: Definition and Key Features
A reflex is a stereotyped pattern (the same muscle moves in the same way) of movement of a body part elicited by a specific stimulus.
Characteristics:
Rapid, predictable, and involuntary
Mostly inborn (innate)
Present in newborns; some reflexes disappear with age
The stimulus–response relationship resembles a cause-and-effect link, often independent of past history, yet highly adaptive
Most reflexes help avoid, escape from, or minimize the impact of noxious stimuli
Properties of Reflexes (I)
Threshold: the stimulus must exceed a minimum level to elicit a response; expressed as S > T where S is stimulus intensity and T is threshold
Latency: the time between stimulus onset and the start of the response.
The short latency is often due to direct sensory–motor connections in the spinal cord
Varies with factors such as receptor sensitivity and the number of neurons and synapses involved
Momentum: the response generally outlasts the stimulus that produced it
Temporal summation: Two subthreshold stimuli may excite a reflex when either alone would be ineffective
Refractory period:
After a response has occurred, the threshold of the reflex may be elevated for a brief time; during this refractory period the probability of a response is zero no matter how strong the stimulus is.
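The temporal-summation property above can be sketched as follows. This is a toy illustration with hypothetical stimulus units; the threshold value is made up for the example.

```python
# Sketch of temporal summation: two subthreshold stimuli arriving
# close together in time can sum to exceed the reflex threshold,
# even though either one alone would be ineffective.

THRESHOLD = 1.0  # hypothetical threshold T, arbitrary units

def elicits_reflex(stimuli: list[float]) -> bool:
    """True if the summed stimulation exceeds the reflex threshold (S > T)."""
    return sum(stimuli) > THRESHOLD

print(elicits_reflex([0.6]))        # False: a single subthreshold stimulus fails
print(elicits_reflex([0.6, 0.6]))   # True: the summed stimulation crosses T
```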
Fixed Action Patterns
Some innate behavior patterns are more complex
Classical ethology: Konrad Lorenz, Niko Tinbergen, and others
Fixed Action Patterns (FAPs) are behavior sequences that are released by a specific environmental signal. They involve sequences of behaviors
They are not learned; they are built into genes,
Are stereotyped (occur the same every time and in every organism)
Continue to completion even if the guiding stimulus is removed.
Ex: innate patterns include reflex actions, such as the knee-jerk response, as well as instinctual behaviors like nesting in birds.
Do humans have Fixed Action Patterns?
Eibl-Eibesfeldt suggested that emotions like smiling and eyebrow flashing can be considered human FAPs
However, even innate patterns can be modified by experience: sensitivity to releasing signs can be tuned through learning, which fine-tunes responses to the environment and enhances survival
This shifting understanding explains why FAPs are rarely treated as completely unmodifiable; many researchers now prefer the term Modal Action Patterns.
Conclusion
Not all behavior is “learned”
The simplest animals can behave adaptively
Orientation reactions, taxes, kineses, tropism, reflexes, fixed and modal action patterns are basic behavioral adaptations to the environment
But with more complex niches, they become more complex
Habituation and Sensitization
Nonassociative learning.
The simplest and most basic form of learning; it occurs when a subject is exposed once or repeatedly to a single type of stimulus.
Ex: Habituation. An organism gradually stops responding to a repeated stimulus, such as a loud sound, indicating that it has learned to ignore it.
Sensitization. An organism exhibits an increased response to a stimulus following exposure to a strong stimulus, illustrating the heightened sensitivity to potential threats.
Elicited behavior can either decrease or increase in frequency and/or intensity through habituation and sensitization.
Habituation
Reflexes can change.
The decline in responding that occurs with repeated presentation of a stimulus is called habituation. Occurs virtually in all animals.
Ex: A small amount of lemon juice or lime juice was placed on the tongues of women on each of 10 trials.
DVs:
Rate how much they liked the taste.
Amount of salivation
Stimulus Specificity
The decrease in responding is specific to the habituated stimulus.
When a new and strong stimulus is presented, the response “recovers” (Increases again). This is called Dishabituation.
Habituation can be easily reversed by changing the stimulus.
Ex: Have you ever eaten at a buffet? If you eat the same food over and over again, you may start to lose interest and eat less. However, if a new dish is introduced, your appetite can be reignited, demonstrating how habituation can be disrupted by the introduction of novel stimuli.
The weaker the stimulus the more rapid and/or more pronounced is habituation
Strong stimuli may yield no significant habituation
E.g., skunk sprays, chronic pain, lawn mower
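The weak-versus-strong pattern can be illustrated with a toy exponential model of habituation. This model is not from the lecture; it simply encodes the stated regularity that a weaker stimulus habituates faster, here as a per-trial decay factor (hypothetical values).

```python
# Illustrative model: each presentation multiplies response strength
# by a retention factor. A factor near 0 means rapid habituation
# (weak stimulus); a factor near 1 means little or no habituation
# (strong stimulus, e.g., skunk spray or chronic pain).

def habituate(initial_response: float, decay: float, trials: int) -> list[float]:
    """Response magnitude on each of a series of repeated presentations."""
    responses = []
    r = initial_response
    for _ in range(trials):
        responses.append(r)
        r *= decay  # decay < 1: responding declines with repetition
    return responses

weak_stimulus = habituate(10.0, decay=0.5, trials=5)    # rapid decline
strong_stimulus = habituate(10.0, decay=0.95, trials=5) # little decline
print(weak_stimulus)   # [10.0, 5.0, 2.5, 1.25, 0.625]
print(strong_stimulus)
```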
Attention and Habituation
When attention is diverted from the stimulus, individuals show much less habituation
Ex: Spicy food: people may become accustomed to the heat over time, but if their attention is repeatedly pulled away from the food (e.g., reaching for water after each bite), habituation to the heat is much weaker.
Spontaneous recovery
If the stimulus is not presented, the response tends to recover over time.
Spontaneous recovery is the reappearance of a CR following a rest period after extinction.
Ex: If you go to your friend’s house, you may habituate over time to their dog’s smell. But if you don’t visit for a while, spontaneous recovery may occur: when you go back, you smell the dog strongly again.
Potentiation of habituation
If a response habituates and is then allowed to recover many times, rehabituation becomes successively more rapid (“savings”)
Ex: Friends’ dogs smell: when you initially visit, you might notice it strongly, but over repeated visits, the intensity of your perception diminishes until you hardly notice it at all. However, after a substantial absence, your sensitivity to the smell can return quickly, showing how your brain retains a sort of memory of the heightened response.
Frequency
The more rapid the frequency of stimulation, the more rapid and/or more pronounced is habituation
Ex: In a similar fashion, if you frequently hear a loud train passing by, at first, the noise might distract you, but over time, you'll become accustomed to it, and it fades into the background of your consciousness.
Overlearning (“Below-zero” habituation)
If the stimulus continues being presented after the response has already disappeared, recovery will be slower.
Habituation of a response to a given stimulus “spreads” to similar stimuli (stimulus generalization)
Ex: After being cheated by a dishonest salesman at a local appliance store, Amy distrusts all salesmen.
Ex: Spicy Italian food can cause a strong reaction for some individuals, but regular exposure may lead to a reduced sensitivity to the heat of similar food, like Thai spicy food.
Why does Habituation Occur?
Prevents animals from wasting time and energy on behaviors that are not necessarily functional
Experience changes even innate behavior
Habituation/Dishabituation in Infant Research
Small babies show habituation
More attention to novel objects or scenes, but if presented repeatedly, they pay less and less attention.
Sensitization
Repeated stimulation can also produce the opposite process: sensitization
Sensitization is an increase in responding due to repeated stimulation
The reason this happens is that the nervous system becomes more responsive to the stimulus over time, enhancing the intensity of the reaction.
Ex: exposure to a loud noise and then it getting louder over time can lead to heightened stress or anxiety in an individual.
Sensitization occurs when the organism becomes aroused for some reason. Especially, when stimulation is intense or noxious (I.e., “arousing”)
Sensitization VS Habituation
Fewer exposures are necessary to produce sensitization
Resulting memories can last much longer- for days or weeks
Unlike habituation, sensitization is not stimulus-specific
Exposure to a sensitizing stimulus can amplify the response to any stimulus that comes later.
Both of them can occur in any situation that involves repeated exposure to a stimulus
Ex: Driving, playing a musical instrument.
Habituation also determines how much we enjoy something.
Opponent Process Theory
Exposure to any emotional stimulus creates (i) an initial emotional response, followed by (ii) an adaptation phase, and then (iii) an opposite after-reaction when the stimulus terminates.
With repeated exposure to the same emotional stimulus, the pattern changes.
A-process (initial emotion): Right before the gun goes off or you step on stage, your heart pounds, palms sweat, you feel anxious and tense.
B-process (opponent after-reaction): As soon as the race starts or you begin speaking, the anxiety drops and is replaced by relief and even excitement or confidence.
With repeated exposure (e.g., running several races or giving many talks):
The initial fear (A-process) gets weaker and shorter.
The relief/confidence afterward (B-process) gets stronger and lasts longer.
Another everyday one: roller coasters. First few rides you feel intense fear going up the first hill (A-process), followed by a strong rush of relief and thrill after the drop (B-process). With repeated rides, the fear shrinks while the thrill dominates.
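The A-process/B-process dynamic above can be sketched numerically. This is an illustrative toy model, not the formal opponent-process equations; the growth rate and strength values are hypothetical and chosen only to reproduce the qualitative pattern (initial reaction shrinks with exposure, after-reaction grows).

```python
# Opponent-process sketch: net feeling = a-process - b-process.
# The a-process (primary emotion) stays constant across exposures,
# while the b-process (opponent) strengthens with repetition.

def net_reaction(a_strength: float, exposures: int, b_growth: float = 0.2) -> tuple[float, float]:
    """Return (feeling during the stimulus, after-reaction when it ends)."""
    b_strength = min(a_strength, b_growth * exposures)  # b grows with practice
    during = a_strength - b_strength  # e.g., fear partially offset by relief
    after = -b_strength               # stimulus gone: opponent state stands alone
    return during, after

print(net_reaction(1.0, exposures=1))  # first ride: strong fear, mild relief after
print(net_reaction(1.0, exposures=5))  # many rides: little fear, strong thrill after
```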
Historical background
Ivan Sechenov attempted to explain the mind completely in terms of reflexes, in contrast to Descartes.
If a wallet drops from someone, how would reflexes explain suppressing the urge to reach for it?
Ivan Pavlov: His work was built upon the ideas of Sechenov. First work on classical conditioning; however, he did not discover classical conditioning
Dogs salivated before food arrived; Pavlov called this a “psychic reflex”
A tool for understanding the functioning of the brain. The brain is an integrative mechanism of excitation and inhibition
Classical Conditioning
This is a procedure, a phenomenon, and a process
As a procedure: Presenting two events in some temporal relationship to each other
As a phenomenon: Behaving differently in response to one event after experiencing it in combination with the other event.
The process underlying this type of learning is also called classical conditioning.
Basic Phenomena
Acquisition: Phase where a conditioned response (CR) is established.
CS (conditioned stimulus) is paired with UCS (unconditioned stimulus).
Stimulus Generalization: CR spreads to similar neutral stimuli.
More similarity → stronger generalization.
Stimulus Discrimination: CR occurs only to the original CS.
Learned by presenting similar stimuli without UCS.
Ex: Dog trained to sit at the sound of a bell will not respond to a whistle, demonstrating the principle of stimulus discrimination.
Extinction & Recovery
Extinction: CR weakens when CS is repeatedly presented without UCS.
Disinhibition: CR reappears if a new external stimulus is introduced during extinction.
Rapid Reacquisition: CR returns quickly when CS–UCS pairing resumes.
Spontaneous Recovery: CR can reappear after a pause in training.
Variations on the Basic Experiment
Second-Order Conditioning: once a CS has been well conditioned, it can itself function like a US to condition a new stimulus
Sensory Preconditioning: Two neutral stimuli paired first. If one is later paired with a UCS, the other also triggers a CR.
Ex: Imagine you consistently experience two things together that, by themselves, don't trigger any particular reaction (like a specific song and a particular scent). Later, if you learn to associate one of those things (e.g., the scent) with something that does trigger a reaction (like a pleasant memory that makes you feel happy), then just experiencing the other original thing (the song) will also start to make you feel happy, even though the song was never directly linked to the pleasant memory.
Factors Affecting Conditioning
Remember:
US = a stimulus that automatically triggers a response, without prior experience.
NS = A neutral stimulus that, before conditioning, does not elicit any response. After being paired with an unconditioned stimulus (US), the neutral stimulus (NS) can become a conditioned stimulus (CS)
CS = a previously neutral stimulus that, after being paired with an unconditioned stimulus, elicits a conditioned response.
CR = a learned response to a conditioned stimulus that occurs after the conditioning process has taken place.
Stimulus Intensity: Stronger US/CS = stronger conditioning (up to a limit).
Timing (order & delay of CS and US):
Delayed: CS starts, then after some time, US starts (best method).
Trace: There is a gap between end of CS and the onset of US. (weaker if gap too long).
Simultaneous: CS and US occur together (weaker).
Backward: CS occurs after US (very weak).
Temporal Conditioning: the passage of time itself can serve as the conditioned stimulus. Ex: waking up on the weekend, without an alarm, at the same time you set your alarm during the week. There is no explicit external CS here.
The more reliably the CS predicts the US, the better the conditioning.
Ex: in fear conditioning: Delayed procedure → “freezing”
Simultaneous procedure → “escaping”
Much relying on the hippocampus (Memory)
Trial Spacing: Spaced trials > Massed trials.
Novelty: Conditioning is faster with unfamiliar CS/US.
Latent inhibition: prior exposure to CS interferes with conditioning.
US preexposure effect: prior exposure to US slows conditioning.
Theories of Conditioning
Stimulus-Substitution Theory Proposed by Pavlov: CS substitutes for US.
Problem 1: The CR is almost never exactly like the UR (the dog’s salivation to the bell resembles its salivation to the food, but the saliva produced differs)
Problem 2: Not all parts of the UR become part of the CR (foot shocks cause rats to leap, but the CR to a tone paired with foot shock is freezing and immobility)
Preparatory-Response Theory: CR prepares organism for US.
Example: Salivation prepares for food, freezing prepares for shock.
Explains drug overdoses in new environments (lack of conditioned compensatory responses).
The CS comes to counteract the drug effect (Opponent-process theory)
Heroin → Decreased blood pressure (a-process) → Increased blood pressure (b-process)
Information Value in Conditioning
Rescorla (1968): Conditioning only occurs if CS provides information (predicts US).
Overshadowing: In compound CS, stronger stimulus dominates (e.g., bright light + weak sound → strong CR to light, weak CR to sound).
Blocking: Prior conditioning with one CS (noise) prevents conditioning with a new CS (light) if both are presented together with US.
Rescorla-Wagner Theory
Learning occurs when the US is surprising.
Conditioning decreases as CS predicts US more accurately.
More salient CSs condition faster.
Equation: ΔV = k(λ – V)
ΔV = change in learning, k = CS salience, λ = max conditioning, V = current prediction.
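The equation can be run directly to see why learning is negatively accelerated. The `k` and `λ` values below are hypothetical parameters chosen for illustration.

```python
# Rescorla-Wagner acquisition: on each CS-US pairing, associative
# strength V moves toward the maximum (lambda) by a fraction k of
# the remaining surprise: delta-V = k * (lambda - V).

def rescorla_wagner(k: float, lam: float, trials: int) -> list[float]:
    """Associative strength V after each conditioning trial."""
    v = 0.0  # no prediction before training
    history = []
    for _ in range(trials):
        v += k * (lam - v)  # big change when the US is surprising
        history.append(v)
    return history

vs = rescorla_wagner(k=0.3, lam=1.0, trials=5)
# Early trials produce large gains (US is surprising); later trials
# produce diminishing gains as the CS predicts the US accurately.
print([round(v, 3) for v in vs])
```

This also hints at why blocking occurs: if an established CS already drives V near λ, the surprise term (λ – V) is close to zero, so a newly added CS gains almost no strength.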
Applications of Classical Conditioning
Prejudice: Negative/positive feelings toward groups can be conditioned.
Emotions/Fears: Explains phobias (e.g., Little Albert).
Exposure Therapy: Uses extinction.
Flooding: Prolonged exposure to CS without US.
Sexual Behavior: Fetishes may form via conditioning (e.g., boot paired with arousal).
Eating Behavior: Environmental cues (cafeteria) trigger hunger; taste aversions (e.g., from chemotherapy).
Immune Function: Conditioning can enhance or suppress immune response (e.g., flavor paired with adrenaline).
Misapplications: Aversion therapy used in “conversion therapy” = ineffective and harmful.
Operant Conditioning – Ch 6 Notes
From Classical → Operant
Classical = Responds to the environment (adaptation) → mostly reflexes (Automatic: salivate, blink)
Operant = acting on environment → mostly voluntary (walk, talk, eat)
Many everyday behaviors are not elicited by a specific stimulus but rather voluntary behaviors
Voluntary means there is a choice; behavior depends on the consequence
Also known as instrumental conditioning.
Like classical conditioning, operant conditioning is both a procedure and a process
History
Hedonism: behavior is guided by seeking pleasure ↑ and avoiding pain ↓
Thorndike – Law of Effect:
Attributed the gradual improvement over trials to the progressive strengthening of an S-R connection
Caught stray cats on the street to test whether they have intelligence.
Response that leads to a positive outcome → more likely to repeat in the same situation
Response that leads to a negative outcome → less likely to be repeated in the same situation
Thorndike put the cats in a puzzle box. He observed that the cats learned to escape more quickly after multiple trials, but the first time they got out only by chance.
Skinner: “free-operant” research → built Operant (Skinner) Box to more effectively record activity.
Skinner box: a controlled environment used to study the behavior of animals through reinforcement; it allows for the observation of operant conditioning, where behaviors are learned through rewards or punishments.
3-Term Contingency Procedure
A → B → C
A = antecedent / cue (SD)
B = behavior
C = consequence
Discriminative Stimulus (SD): cue that signals which behavior will be reinforced/punished
Behavior never in a vacuum → context matters
Core Idea
Behavior frequency changes because of what follows it
↑ future behavior = reinforcement
↓ future behavior = punishment
A response will occur more frequently if a favorable consequence has followed it in the past: Reinforcement.
A response will occur less frequently if an unfavorable consequence has followed it in the past: Punishment.
Reinforcement
Reinforcer: anything that increases future behavior
Positive R: add pleasant stimulus → increase in B
ex: Wi-Fi gets better when you move phone → keep moving phone
Negative R: remove unpleasant stimulus → increase in B
ex: sunglasses remove glare → wear sunglasses more
Punishment
Punisher: anything that decreases future behavior
Positive P: add aversive stimulus → decrease in B
ex: joke → friends angry → stop making that joke
Negative P: remove pleasant stimulus → decrease in B
ex: license suspended for texting → no more driving
Reinforcers – What Matters
Is it Positive or Negative reinforcement
What is the behavior
What happened right after the behavior? Was a stimulus added or removed?
What happened with the behavior after the consequence?
Primary: biologically important (food, warmth, mate)
Secondary: learned signals of primary (money, praise, clicker)
Effectiveness factors:
Deprivation / motivation: hungry → food works better
Quality & magnitude: higher-quality or larger → ↑ effect (but many small > few big)
Behavioral contrast: shift from rich→poor reward ↓ motivation; poor→rich ↑ motivation
Immediacy: quicker consequence → stronger learning
SD: Discriminative stimulus is a signal from the environment that indicates a particular behavior is likely to result in a specific consequence or reinforcement
ex: phones & nicotine → instant payoff = addictive
Behavioral Contingency
Three-term contingency: behavior occurs in response to an antecedent, which produces a consequence.
Traffic lights and signs
Greetings
Appropriate and inappropriate jokes
Lectures
Using Reinforcement to Teach
Shaping and chaining: We can use what we know about reinforcement to instigate behavior and teach.
Shaping:
Shaping & Chaining
Shaping: reinforce successive approximations toward final behavior
used for speech, contact-lens insertion
If we want to teach a dog to roll over, we would first reward it for lying down, then reward it for rolling onto its side, and finally reinforce the complete action of rolling over.
Shaping is an effective teaching method in many cases
Teaching people to speak, insert contact lenses, etc.
In animal training, typically a “clicker” is used as SD
Prompting is using verbal or physical encouragement to get a behavior to occur
Prompts: verbal / gesture / model / physical or putting-through
Chaining: Taught by breaking a task into steps (“task analysis”) and then “chaining”
Forward: steps are trained in order (step 1 → step 2 → …)
Backward: The last step of a sequence is trained first, and each preceding step is added until the entire sequence is performed.
Ex: Violin Music: First mastering the individual measures, then progressing to more measures, and finally integrating all measures.
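The shaping procedure above can be sketched as a simple loop. This is a toy illustration: the numbers are hypothetical "closeness to the target behavior" scores, not data from the lecture.

```python
# Shaping sketch: reinforce responses that meet the current criterion,
# then raise the criterion toward the target (successive approximations).

def shape(responses: list[float], start: float, step: float, target: float) -> list[float]:
    """Return the responses that earned reinforcement as the criterion rose."""
    criterion = start
    reinforced = []
    for r in responses:
        if r >= criterion:                  # close enough to the current goal
            reinforced.append(r)            # deliver the reinforcer
            criterion = min(target, criterion + step)  # now demand a bit more
    return reinforced

# Dog learning "roll over": 0.2 ~ lying down, 0.5 ~ rolling onto its side,
# 1.0 ~ the complete roll. Only successively better attempts get the treat.
print(shape([0.2, 0.1, 0.3, 0.35, 0.5, 0.4, 0.8, 1.0],
            start=0.2, step=0.2, target=1.0))  # [0.2, 0.5, 0.8, 1.0]
```

Note how early attempts that once earned a treat (0.3, 0.35) stop being reinforced once the criterion moves; that is what pushes behavior toward the final form.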
Factors that affect the Strength of Operant Conditions
“Reinforcer” and “reward” are not the same thing
Rewards can be ineffective. If a reward is ineffective, it was not a reinforcer
Effectiveness of Reinforcement:
The effectiveness of a reinforcer depends not only on its current properties but also on what the subject received previously
Negative Contrast: An individual experiences a decrease in motivation to obtain a reward due to a previously encountered more favorable outcome. (Large → Small)
Positive Contrast: An individual experiences an increase in motivation to obtain a reward due to a previously encountered less favorable outcome. (Small → Large)
Immediacy is the single most important factor in a reinforcer’s effectiveness
The sooner a behavior is followed by the consequence, the more effective the consequence is.
- So, why are smartphones so addictive?
- Why are cigarettes more addictive than marijuana?
Intrinsic vs Extrinsic Reinforcement
Intrinsic: reward built-in to behavior (read textbook because you enjoy learning)
Extrinsic: external reward (read textbook to get an A on the exam)
Rewards can undermine intrinsic motivation if:
Reward is expected (Person has been instructed beforehand that she will receive a reward)
Reward is tangible (When it consists of money rather than praise)
Reward is given for just doing, not for how well (Participant trophies)
Superstitious Behaviors
Immediate reinforcers can accidentally strengthen irrelevant behaviors (Skinner 1948)
Big Picture
Operant = learning guided by consequences
2 main processes: Reinforcement ↑ B | Punishment ↓ B
Each can be positive (+ add) or negative (– remove)
Strength depends on timing, quality, context, motivation, cues
Operant Conditioning Schedules
Reinforcement Basics
Reinforcement: any consequence that increases the likelihood of a behavior.
Punishment: any consequence that decreases a behavior.
A reward doesn’t always function as a reinforcer — effectiveness depends on:
Deprivation (how much the person wants it)
Magnitude (amount/size)
Quality
Contrast (compared to prior rewards)
Immediacy (how fast it follows the behavior)
Most real-world behaviors are not reinforced every time — they occur under intermittent or partial reinforcement.
Definition of a Reinforcement Schedule
A schedule of reinforcement describes when and how often reinforcement occurs.
It specifies:
How many times the behavior must occur,
When reinforcement is available,
The pattern or rule connecting the behavior to its consequence.
Different schedules lead to different rates and patterns of behavior and different resistance to extinction.
Continuous vs. Partial Reinforcement
Continuous Reinforcement (CRF)
Behavior is reinforced every single time it happens.
Produces rapid learning (fast acquisition).
But behavior extinguishes quickly when reinforcement stops — low persistence.
Partial (Intermittent) Reinforcement
Behavior is reinforced sometimes, not every time.
Leads to slower learning but greater resistance to extinction.
This is what happens in most natural behaviors.
Major Schedules of Reinforcement
A. Ratio Schedules (Based on Response Count)
Fixed Ratio (FR)
Reinforcement after a set number of responses.
Example: A worker gets paid after making 10 products → FR-10.
Produces:
High, steady response rate.
Post-reinforcement pause after each reward (the “break and run” pattern).
Ratio strain occurs when the requirement becomes too high, causing the behavior to stop (burnout).
Common in production work, video game tasks (“extra life after 20 kills”).
Variable Ratio (VR)
Reinforcement after an unpredictable number of responses (average number is known).
Example: Slot machines, lottery, or “likes” on social media.
Produces:
Highest, steadiest response rate.
Little or no pause after reinforcement.
Most resistant to extinction.
The unpredictability keeps behavior persistent — this is why gambling and social media are so addictive.
B. Interval Schedules (Based on Time)
Fixed Interval (FI)
Reinforcement for the first response after a fixed amount of time has passed.
Example: Checking the mail when you know it’s delivered at 3 PM.
Produces:
A scalloped pattern — slow responding after reinforcement, then increases as the interval ends.
Behavior peaks just before the expected reinforcement time.
Predictable reward timing = slower start, faster finish.
Variable Interval (VI)
Reinforcement for the first response after an unpredictable amount of time has passed.
Example: Checking email or fishing — you never know when a reward will occur.
Produces:
Moderate, steady responding.
No post-reinforcement pause.
Very common in real-world settings (e.g., attention from others, waiting for texts, unpredictable feedback).
Other Important Schedules
Differential Reinforcement of High Rates (DRH)
Reinforcer is given only if the individual performs a certain number of responses within a specific time.
Encourages faster responding.
Example: Bonus only if you finish 20 tasks in an hour.
Differential Reinforcement of Low Rates (DRL)
Reinforcer given only if responses are spaced apart by a minimum time.
Encourages slower responding.
Example: Teacher rewards a child for raising their hand no more than once every 5 minutes.
Progressive Ratio (PR)
The required number of responses increases after each reinforcement.
Example: A teacher might start with a reinforcement after every 2 responses, then increase to 4 responses, and continue to adjust the ratio based on the child's performance.
Used to measure motivation — the point at which the person stops responding is called the breakpoint.
Theories of Reinforcement
Drive Reduction Theory (Hull)
Behavior is reinforced when it reduces a biological drive (hunger, thirst, etc.).
Works for primary reinforcers (food, water), but fails to explain behaviors that don’t satisfy biological needs (e.g., reading, playing games).
Optimal Arousal Theory
People and animals are motivated to maintain an optimal level of stimulation.
Some reinforcers increase arousal rather than reduce it: fun, novelty, challenge.
The Premack Principle (1965)
Also known as the “Grandma’s Rule”: “You can do the fun thing after the less fun thing.”
Definition: A high-probability behavior can reinforce a low-probability behavior.
Example: “You can play video games after finishing homework.”
Reinforcement isn’t about objects or stimuli — it’s about access to a more preferred activity.
Works both ways:
Reinforcement: low → high (homework → video games)
Punishment: high → low (child frequently watches TV but fails to clean their room → removing access to TV until the room is cleaned)
A reinforcer is any activity that has a greater probability of occurring than that of the reinforced behavior.
The Premack principle states that reinforcement occurs when a behavior allows you access to a more preferred behavior.
Money is not a reinforcer. USING THAT MONEY is a reinforcer.
How to use this in practice:
Observe to identify which behaviors are more likely to occur without any restraints
Identify high-frequency behaviors
Use these as reinforcers
Use this for a behavioral modification project
Problem with the Premack principle: even a low-probability behavior can become reinforcing when one is prevented from engaging in it at its normal frequency.
Behavioral Regulation / Response-Deprivation Theory
Developed by Timberlake & Allison.
Focuses on how behaviors are distributed and restricted.
Any behavior has a baseline level (how often it naturally occurs).
If access to a behavior is below baseline, the chance to engage in it will be reinforcing
Example: Talking on the phone may become rewarding if you’ve been deprived of it for days.
Example: If a couple typically expresses affection through daily hugs but has been separated for a week, the first hug upon reuniting becomes highly reinforcing
Deprivation, not preference, determines reinforcement strength
Access even to a less-preferred behavior can be reinforcing if its baseline level has been denied or deprived
Even something as annoying as talking on the phone for hours with your mom might be really motivating if you’re not allowed to do it.
Modern Real-World Applications
Video Games: Often use VR schedules (loot boxes, random drops).
Social Media: Likes, matches, or notifications use variable reinforcement — keeps users checking.
Work & Sports: Fixed and variable schedules motivate output (commissions, milestones, timed feedback).
Behavior Therapy: Differential schedules (DRH/DRL) help shape response speed or frequency.
Quick Comparison Chart
| Schedule Type | Rule | Pattern of Response | Example | Resistance to Extinction |
|---|---|---|---|---|
| FR | After a fixed number of responses | High rate + pause | Factory work, “Buy 10 get 1 free” | Moderate |
| VR | After a variable number of responses | Very high, steady | Gambling, social media | Very high |
| FI | First response after fixed time | “Scalloped” (slow → fast) | Checking for hourly bus | Moderate |
| VI | First response after variable time | Steady, moderate | Fishing, waiting for text | High |
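The schedule rules can be written as tiny reinforcement functions (an illustrative sketch; the specific numbers are arbitrary, and times are in seconds):

```python
import random

def fixed_ratio(n):
    """FR-n: reinforce every nth response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False
    return respond

def variable_ratio(n, seed=0):
    """VR-n: reinforce after an unpredictable number of responses, averaging n."""
    rng = random.Random(seed)
    def respond():
        return rng.random() < 1.0 / n   # each response has a 1-in-n chance
    return respond

def fixed_interval(t):
    """FI-t: reinforce the first response after t seconds have elapsed."""
    last = 0.0
    def respond(now):
        nonlocal last
        if now - last >= t:
            last = now
            return True
        return False
    return respond

# FR-10: only the 10th response earns the reinforcer
fr = fixed_ratio(10)
results = [fr() for _ in range(10)]

# FI-60: responding before the interval ends earns nothing
fi = fixed_interval(60)
early, on_time = fi(30), fi(61)
```

The FR closure shows why responding comes in "runs" toward each payoff, while the FI closure shows why early responses are wasted until the interval elapses.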
Common Test Questions & Answers
Which schedule produces the highest steady rate? → Variable Ratio (VR).
Which schedule produces a scalloped response pattern? → Fixed Interval (FI).
Which is most resistant to extinction? → Variable Ratio (VR).
What happens if FR requirement is too high? → Ratio strain (behavior stops).
Explain Premack vs. Response Deprivation:
Premack: high-probability behavior reinforces low-probability behavior.
Response Deprivation: restriction below baseline creates reinforcement.
Why doesn’t money always reinforce? → Its power depends on what it lets you do (Premack principle).
BMOD Project
Select participant
Identify the target behavior and assess it
Conduct the intervention
Determine if it worked
Assessment is the most important component of any intervention program
In a BMOD (Behavior Modification) project, which aims to change a specific behavior, a school psychologist would often use an ABC assessment to understand and address a student’s behavior.
The ABC assessment stands for:
A - Antecedent: What happened before the behavior? (The cue or trigger, analogous to a discriminative stimulus (SD) in operant conditioning.)
B - Behavior: The specific action the student displayed (the target behavior).
C - Consequence: What happened after the behavior? (What follows a behavior either increases or decreases its future occurrence: reinforcement or punishment.)
Main objective: To alter or develop a particular behavior
Target behavior
Socially significant behaviors that have immediate and long-lasting effects for the person and for those who interact with that person
Increase positive, prosocial behavior
Decrease undesirable or inappropriate behavior
Direct Observations
Direct and repeated observations of the target behavior in the natural environment
Assess and select target behaviors
For example, the ABC analysis or:
Topography: Specific movements
Frequency or rate: # of instances per period of time
Duration: Length of time from beginning to end of a behavior
Intensity (magnitude)
Stimulus control: Correlation between behavior and environmental events
Latency (reaction time): Time between occurrence of an event or cue and the start of a behavior
Quality of behavior
Assessing Behavior
Different methods of systematic observation can be used depending on how often and how fast behaviors are occurring, the extent to which the observer can record the occurrences, and what questions need to be answered
Two general methods: Event-based or time-based
Event-based observation:
- Frequency
- Rate
- Duration
- Latency
Make a table and graph!
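Event-based measures like frequency, rate, duration, and latency can be computed directly from timestamped observations (a minimal sketch; the session data are made up):

```python
# Each event is (start_time, end_time) in seconds within a 600-s session.
events = [(10, 15), (120, 130), (300, 302), (480, 490)]
session_length = 600          # seconds
cue_time = 5                  # when the cue was presented

frequency = len(events)                             # number of instances
rate = frequency / (session_length / 60)            # instances per minute
durations = [end - start for start, end in events]  # length of each bout
total_duration = sum(durations)                     # total seconds of behavior
latency = events[0][0] - cue_time                   # cue to first response
```

These are exactly the quantities that go into the table and graph: here, 4 instances at 0.4 per minute, 27 s of total behavior, and a 5-s latency.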
Time-based techniques:
Used when event-based systems are difficult to conduct
Data are recorded during prespecified intervals of time within a specified observation session and then are summarized into a percentage of intervals.
Sampling behavior
Momentary time sampling
Whole-interval recording
Partial-interval recording
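Time-based sampling summarizes behavior as the percentage of intervals in which it occurred. Under partial-interval recording, an interval is scored if the behavior occurred at any point within it (a minimal sketch with made-up data):

```python
def partial_interval(events, session_length, interval):
    """Score each interval True if any event overlaps it, then
    summarize as the percentage of scored intervals."""
    n = session_length // interval
    scored = []
    for i in range(n):
        lo, hi = i * interval, (i + 1) * interval
        # partial-interval rule: any overlap at all counts
        hit = any(start < hi and end > lo for start, end in events)
        scored.append(hit)
    return 100.0 * sum(scored) / n

# Four bouts of behavior in a 600-s session, scored in 60-s intervals
pct = partial_interval([(10, 15), (120, 130), (300, 302), (480, 490)], 600, 60)
```

Whole-interval recording would instead require the behavior to fill the entire interval, and momentary time sampling would check only the instant each interval ends; both are small changes to the `hit` rule.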
Escape, Avoidance and Punishment Unit 3
Aversive Stimulation
We can only guess ahead of time what will be aversive.
What appears painful to one person may be pleasurable to the next
There might be some universally aversive stimuli
Ex: loud noises, extreme temperatures, and certain unpleasant odors are often cited as examples that elicit negative reactions across diverse populations.
With negative reinforcement, a behavior increases if some stimulus is removed after the behavior occurs
Another type of negative reinforcement is avoidance: a response prevents an unpleasant stimulus from occurring in the first place (Auto-pay for bills)
This can be very powerful and very difficult to extinguish (Anxiety Conservation Hypothesis)
Aversive stimulation is rarely used now because of ethical concerns
The Shuttle box avoidance procedure
Subjects learn to prevent an aversive stimulus, such as an electric shock, by moving to a safe area. (Ethically problematic, because you have to hurt the subject in order to condition it to avert the stimulus)
However, after a few trials, something happens: the rat starts to ANTICIPATE the electric shock, so it preemptively moves to the safe area before the shock is administered.
This avoidance is actually more persistent than ordinary reinforcement. A behavior maintained by reinforcement (like giving a dog a treat after a good behavior) will gradually extinguish once the reinforcement stops. With avoidance, the rat already anticipates the shock and moves to the safe area beforehand, so it never discovers whether the shock has been discontinued; the behavior continues unless you forcibly keep the rat in place with no shock delivered.
Two cases of negative reinforcement
Escape: When behavior terminates aversive stimulus
Ex: When a person takes medication to relieve pain, taking the medication allows them to escape the discomfort they are feeling.
Avoidance: When behavior prevents the aversive stimulus
Ex: When a person avoids a situation, such as social gatherings, to prevent the anxiety that may arise from them.
The avoidance paradox: How can the absence of something provide reinforcement?
Explaining Avoidance
The two-factor theory of avoidance: A combination of Classical and Operant conditioning
Classical conditioning:
Warning Stimulus (CS) is paired with the aversive event (US) (Light → Shock)
So the CS comes to elicit fear
Operant conditioning:
Escaping is reinforced by removing the fear-evoking CS (Light)
So there is no true avoidance: the dog first escapes the shock, then it escapes the fear-evoking light
Problems with two-factor theory avoidance
The theory predicts that fear should increase whenever the warning signal is presented, but well-trained subjects often show little fear
Avoidance responses are often very slow to extinguish, which the theory does not predict
The one-factor theory of avoidance
The classical conditioning component is not necessary: Avoidance can occur without the aid of warning stimuli
Avoidance of a shock can in itself serve as a reinforcer
The procedure is called nondiscriminated or free-operant avoidance, developed by Sidman
The aversive stimulus (e.g., shock) is scheduled to occur periodically without warning (e.g., every 10 seconds)
Every avoidance response produces a period of safety during which shocks do not occur
Early avoidance responses make phobias difficult to extinguish
Individuals fail to make contact with the extinction procedure: because they avoid, they never learn that the feared outcome no longer occurs
This is the rationale for “flooding,” which blocks avoidance so that contact with extinction can occur
Punishment:
Weakening of a behavior through an aversive stimulus or the removal of an appetitive stimulus
Punishment is often equated with physical punishment of other people
What defines punishment is the effects on behavior
Punishment can be highly effective, but if misapplied it may fail to suppress behavior, the behavior may recover, and there may be unintended collateral effects
Types of Punishers:
Many types of aversive stimuli are used:
Positive punishment examples are electric shock, bursts of air, loud noise, verbal reprimands, and a physical slap
These are positive punishments because they introduce an aversive stimulus to decrease the future likelihood of a behavior.
Overcorrection: Requiring a person to rectify what was done badly (restitution) AND to overcorrect the mistake by performing other behaviors (e.g., repeatedly practicing the appropriate behavior)
Negative punishment includes response cost and timeout, among others
Response cost: Removal of a specified amount of reinforcer immediately following a behavior
Time out: removal of the opportunity to obtain positive reinforcement
Factors Influencing the Effectiveness of Punishment
Intensity
Must be intense from the beginning; longer shocks are more effective at punishing responding.
If it is not intense, then it may lead to habituation.
It’s tricky to determine what is more intense with some stimuli (e.g., louder reprimands)
Longer time outs are not more effective
Immediacy
Just like with reinforcement, punishment needs to be immediate to be effective
For most criminals, rewards are received immediately, but punishment is delayed
Schedule of Punishment
The most powerful way to reduce behavior is to punish every occurrence
Response patterns are often the opposite of those obtained with positive reinforcement
E.g., in an FI schedule of punishment, a deceleration in responding occurs as the next punisher approaches
FR schedules of punishment produce a response-then-pause pattern
Consistency
Punishment can fail if delivered inconsistently and if we only give one type of punishment.
Threat of punishment is not effective at all if the behavior is highly motivated (e.g., starving children stealing food)
Reinforcement of Alternative Behaviors
Punishment is much more effective when there are alternative ways to obtain a reinforcer.
Problems with the Use of Punishment
Punishment of maladaptive behavior doesn’t strengthen the occurrence of adaptive behavior. It doesn’t teach you what to do instead.
Punishing one behavior can result in generalized suppression of other behaviors
The person delivering the punishment could become SD to punishment; unwanted behavior is selectively suppressed when that person is present
Punishment may teach an individual to avoid the person who delivered the punishment (or the situation where it happened)
Punishment is likely to elicit strong emotional responses
Punishment can elicit aggressive reaction
Punishment, through modeling, could teach a person that punishment is an acceptable means of controlling behavior
Punishment often has an immediate effect in stopping unwanted behavior; thus the use of punishment is often strongly reinforced (i.e., the punisher is more likely to punish in the future)
Benefits and Effective Use of Punishment:
Mild punishment can be useful (e.g., reprimands, response cost, timeout)
Can produce rapid change
Can lead to increase in social behavior
Sometimes results in mood improvement
Can increase attention to environment
Generally used only as a last resort
Should only be used when behavior is very maladaptive and rapid change is in the best interest of the person.
Never as retribution or “justice,” and never as a deterrent to others.
Should be carefully planned, reviewed, and supervised, and always administered in calm.
Should be immediate
Should be consistent
Should be intense
Negative punishment generally preferable to positive punishment
Should be accompanied by explanation
Should be combined with positive reinforcement for appropriate behavior
Learned Helplessness
Decrement in learning ability resulting from repeated exposure to uncontrollable aversive events
Related to depression after series of uncontrollable aversive events (e.g., women who have been the victims of domestic violence)
Cure? People will eventually recover ability to escape if repeatedly forced to escape the aversive stimulus
Prevention? History of successfully overcoming minor adversities protects against depression when later confronted by more serious difficulties.
Masserman’s Experimental Neurosis
Cats trained to open a food container when a cue was presented.
Some were punished the moment they opened the lid to feed
Neurotic symptoms persisted over many months, even after punishment was discontinued
If cats were initially trained to press a switch to activate the signal of food, symptoms were milder and returned faster to normal
Self Control
If self-control comes from within, what can be done to improve it in those who have none?
Baumeister: Ego depletion:
Self-control draws on a limited resource that gets depleted with use (glucose). So, after exerting self-control, people have greater difficulty resisting subsequent demands; it’s kind of like a muscle!
However, this theory is facing a lot of backlash.
Skinnerian Self Control:
Many behavioral methods to improve self-control
Controlling responses (like restraint, changing motivation, and distraction) decrease the likelihood of controlled responses.
Mischel (1966) found children varied considerably in their ability to resist temptation
Years later, children who had more self-control were “better adults”
The Ainslie-Rachlin alternative:
The value of a reinforcer tends to decrease with the delay between choice and delivery of the reinforcer
So, in a two-option situation, choice is determined by the relative value of:
Magnitude (small/large) of the reinforcer
Delay (sooner/later) of the reinforcer
Larger reinforcers have higher value, but so do more immediate reinforcers.
So, the value of a reinforcer is “discounted” as a function of its delay, this is called Delay Discounting
Delay Discounting:
The decrease in the present value of a reward based on the time it takes to receive it; individuals tend to prefer smaller, immediate rewards over larger, delayed rewards.
Self-control issues arise when we have a conflict: choosing between “smaller-sooner” (SS) and “larger-later” (LL) outcomes
As reward becomes imminent, its value increases more and more sharply, yielding a “delay curve” or a delay function
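The Ainslie-Rachlin account is usually modeled with hyperbolic discounting, V = A / (1 + kD), where A is the reward amount, D its delay, and k an impulsivity parameter. A quick sketch (the amounts, delays, and k value are made up) shows the preference reversal:

```python
def value(amount, delay, k=0.5):
    """Hyperbolic discounting: present value falls with delay."""
    return amount / (1 + k * delay)

# Smaller-sooner (SS): $50 at delay d; larger-later (LL): $100 at delay d + 5
def prefers_ll(d, k=0.5):
    return value(100, d + 5, k) > value(50, d, k)

far = prefers_ll(d=20)   # both rewards distant: LL is worth more
near = prefers_ll(d=0)   # SS immediately available: preference reverses to SS
```

Because the hyperbolic curve rises sharply as a reward becomes imminent, the SS curve overtakes the LL curve near delivery, producing the Time 1 / Time 2 reversal described below. Steeper k means steeper discounting, i.e., more impulsive choice.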
Preference Reversals:
Preference for SS rewards vs LL rewards changes over time
When smaller rewards precede larger alternatives, people prefer the larger reward when BOTH are distant (Time 2)
But they switch to the smaller reward as it becomes immediately available (Time 1)
Steeper delay discounting has been linked to risky sexual behavior, poor financial decisions, opiate addiction, and gambling addiction.
Factors of delay discounting:
Age is a big factor
Innate differences in impulsivity across species.
Individual differences in impulsivity within species.
You can arrange to make the choice long before the reinforcers are actually due (precommitment)
Pre-commitment: Making decisions in advance to limit future choices, thereby reducing the temptation associated with immediate rewards that may compromise long-term goals.
Increasing self-control:
Setting up explicit sub goals
Increasing the value of the delayed alternative or decreasing the value of the immediate alternative
Social Learning/Observational Learning
Learning that is facilitated by observation of, or interaction with, another animal (typically conspecific) or its products
Humans are one of the only species that can learn through observation
This raises the question: Can other animals learn by observation?
Probably.
However, they generally learn things that they could learn on their own
But do other species “teach”?
Examples of individuals who seem to be learning from others in their social group are numerous across species.
Social Facilitation: Individuals are more likely to perform a behavior when in the company of others performing it (Contagious)
Most contagious behaviors are related to survival and/or group dynamics
Stimulus Enhancement: The probability of a behavior changes because the individual’s attention is drawn to a particular item or location by the behavior of another individual
Emulation: Copying only elements of a complex action
True Imitation: Close duplication of a behavior/sequence of behaviors by virtue of having seen the action performed. Children have a tendency to imitate the behaviors of those around them.
Generalized Imitation: Tendency to imitate new modeled behavior with no specific reinforcement for doing so
Socially Mediated Learning: Requires an observer or actor and a demonstrator, who performs the behavior later reproduced in whole or part by the observer
Vicarious Emotional Responses: Classically conditioned emotional responses from seeing those emotional responses exhibited by others. Seeing other people afraid of things can make you afraid of them.
Remember though: We copy models who are similar to us through observation
Bandura’s Social Learning Theory
People learn new behavior primarily through copying others
Remember the Bobo doll experiments, where children exposed to aggressive models were more likely to act aggressively toward the doll.
Critical distinction between acquisition and performance
Acquisition depends on
Whether attention is paid
Consequences of the model’s behavior
Whether observer received reinforcement for paying attention
Rule-Governed Behavior
Behavior controlled by verbal instructions instead of direct experience.
Effects of Instructions
Humans learn faster with instructions.
Humans may persist in incorrect behavior if given inaccurate rules (even when contingencies change).
Contingency-Shaped vs Rule-Governed
Contingency-shaped = Learned through direct experience (like animals).
Rule-governed = Learned through verbal rules (instructions, self-rules).
Problems with Rules:
Can interfere with adapting to real feedback (people stick to rules even when they’re wrong).
People with no instructions generate their own rules, which might still be wrong.
Verbal Effects in Pavlovian Conditioning
Verbal instructions can change conditioned responses unless the stimulus is biologically important (e.g., snakes/spiders overcome instructions).
Effect on Operant Schedules
Verbal learners are less sensitive to reinforcement schedules.
Only preverbal infants behave like non-human animals on schedules.
Rule-Governed Behavior + Observational Learning
Both save time and allow complex cultural learning.
Adults teach intentionally; children trust teaching more than personal experience.
Leads to self-regulation and “thinking like adults.”
Ratchet Effect
Cultural progress builds cumulatively because each generation imitates + improves on previous knowledge.
Core Takeaways
| Concept | Key Idea |
|---|---|
| Social Learning | Learning by watching others |
| Contagious Behavior | Reflexively triggered actions |
| Stimulus Enhancement | Attention drawn to objects |
| True Imitation | Exact copying of novel behavior |
| Generalized Imitation | Imitation becomes habitual |
| Vicarious Conditioning | Learn emotions/reinforcement by watching |
| Teaching | Intentional instruction, rare in animals |
| Rule-Governed Behavior | Verbal rules can override experience |
| Ratchet Effect | Cultural advancement through imitation + rules |