Learning: Classical and Operant Conditioning, and Modeling

What is Learning?

  • Learning is the process that leads to a relatively permanent change in behavior or potential behavior.
  • It alters how we perceive our environment and interpret incoming stimuli, thus changing our interactions and behaviors.

Models of Learning

  • Associative Learning: Learning that certain events occur together.
    • Classical Conditioning: Two stimuli are associated.
    • Operant Conditioning: A response and its consequences are associated.
  • Stimulus: Any event or situation that evokes a response.
  • Cognitive Learning: Acquiring mental information through:
    • Observing events.
    • Watching others.
    • Language.

Classical Conditioning

  • Definition: A type of conditioning where a stimulus gains the capacity to evoke a response initially evoked by another stimulus.
  • Pioneered by Ivan Pavlov, who conditioned dogs to salivate at the sound of a tone.
  • Mainly regulates involuntary, reflexive responses.
  • Examples:
    • Emotional responses (e.g., fears).
    • Physiological responses.

Ivan Pavlov (1849 – 1936)

  • Russian physiologist.
  • Won the Nobel Prize in 1904 for his research on the physiology of digestion.
  • Discovered classical conditioning while studying salivary secretions in dogs.
  • Observed that dogs started to salivate before food was placed in their mouths, such as when an assistant approached.
  • Pavlov realized that the brain was forming a new connection where the sight of the assistant became associated with the arrival of food.
  • Pavlov began experimenting with neutral stimuli like metronomes or bells, pairing them with food to study the formation of new connections.

Components of Classical Conditioning

  • Elicited Behavior: Behavior that occurs in response to an environmental event.
  • Unconditioned Stimulus (UCS): A stimulus that naturally elicits a response.
  • Unconditioned Response (UCR): The natural response to the UCS.
  • Neutral Stimulus (NS): A stimulus that initially elicits no response.
  • Conditioned Stimulus (CS): A previously neutral stimulus that, after being paired with the UCS, comes to elicit a response.
  • Conditioned Response (CR): The learned response to the CS.
  • Classical conditioning is learning to link two or more stimuli and anticipate events.
Example:
  1. Before Conditioning:
    • Food (UCS) leads to Salivation (UCR).
    • Bell (Neutral Stimulus) leads to No Conditioned Response.
  2. During Conditioning:
    • Bell + Food leads to Salivation (UCR).
  3. After Conditioning:
    • Bell (CS) leads to Salivation (CR).
Example: Phobias
  1. Before Conditioning:
    • Loud Noise (UCS) leads to Fear (UCR).
    • Rat (Neutral Stimulus) leads to No Fear.
  2. After Conditioning:
    • Rat (CS) leads to Fear (CR).
Equations:
  • UCS → UCR
  • NS + UCS → UCR
  • CS → CR

Higher-Order Conditioning

  • Definition: A procedure where the conditioned stimulus in one conditioning experience is paired with a new neutral stimulus, creating a second, often weaker, conditioned stimulus.
  • Example: An animal learns that a tone predicts food, then learns that a light predicts the tone and begins responding to the light alone (also called second-order conditioning).
  • Higher-order conditioning occurs when a second neutral stimulus is paired with an existing CS rather than with the original UCS.
Process:
  1. First-Order Conditioning: NS + UCS → UCR; after pairing, CS → CR.
  2. Higher-Order Conditioning: NS₂ + CS → CR; after pairing, CS₂ → CR.

Extinction and Spontaneous Recovery

  • Extinction: The diminishing of a conditioned response.
    • In classical conditioning, it occurs when an unconditioned stimulus (US) does not follow a conditioned stimulus (CS).
    • In operant conditioning, it occurs when a response is no longer reinforced.
  • Spontaneous Recovery: The reappearance, after a pause (a period without exposure to the CS), of an extinguished conditioned response.
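Acquisition and extinction can be given a simple quantitative form. The sketch below uses the Rescorla-Wagner rule, a standard model of classical conditioning that is not covered in these notes; the learning-rate value and function name are illustrative assumptions.

```python
# A minimal sketch of acquisition and extinction using the Rescorla-Wagner
# rule, a standard quantitative model of classical conditioning (the model
# and its parameter values are illustrative additions, not from these notes).

def rescorla_wagner(trials, alpha=0.3):
    """Track the associative strength V of the CS across conditioning trials.

    trials: list of booleans -- True if the UCS follows the CS on that trial.
    alpha:  learning rate (salience of the CS/UCS pairing).
    """
    v, history = 0.0, []
    for ucs_present in trials:
        lam = 1.0 if ucs_present else 0.0  # asymptote: 1 with the UCS, 0 without
        v += alpha * (lam - v)             # prediction-error update
        history.append(v)
    return history

# Acquisition (10 CS-UCS pairings) followed by extinction (10 CS-alone trials):
curve = rescorla_wagner([True] * 10 + [False] * 10)
print(f"after acquisition: {curve[9]:.2f}")   # high V -> strong CR
print(f"after extinction:  {curve[-1]:.2f}")  # V near zero -> CR diminished
```

Note that this model treats extinction as simple unlearning; spontaneous recovery shows that real extinction suppresses rather than erases the learned association, which this sketch does not capture.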

Generalization and Discrimination

  • Generalization: The tendency, once a response has been conditioned, for stimuli similar to the conditioned stimulus to elicit similar responses.
    • It occurs when a CR is elicited by a new stimulus that resembles the original CS.
  • Discrimination: In classical conditioning, the learned ability to distinguish between a conditioned stimulus and stimuli that do not signal an unconditioned stimulus.
Examples:
  • Generalization: Ivan Pavlov conditioned dogs to drool when rubbed; they then also drooled when scratched.
  • Discrimination: Ivan Pavlov conditioned dogs to drool at bells of a certain pitch; slightly different pitches did not trigger drooling.
  • Stimulus Generalization: The tendency to respond to a stimulus that is only similar to the original conditioned stimulus with the conditioned response.
    • Example: Feeling anxiety at the sound of a dentist drill and then feeling anxiety at the sound of a similar-sounding machine.
  • Stimulus Discrimination: The tendency to stop making a generalized response to a stimulus that is similar to the original CS because the similar stimulus is never paired with the UCS.
    • Example: A coffee grinder causes anxiety because it sounds like a dentist drill but stops causing anxiety after a few uses.

Pavlov's Argument

  • Based on objective and observable data.
  • Stimulus-response is observable by everyone.
  • Overt behavior with high replicability.

Little Albert Experiment

  • In 1920, John B. Watson and Rosalie Rayner conditioned an infant known as "Little Albert" to fear a white rat (NS) by pairing it with a loud noise (UCS).
  • The conditioned fear (CR) generalized to other furry objects, such as a rabbit and a fur coat.

Examples of Classical Conditioning

  • Acquisition of fear:
    • Fear of dogs.
    • Fear of presentations.
  • Prejudice: Towards people of a specific group.
  • Advertisements:
    • Brand endorsements.
    • Positive emotions associated with products.

Operant Conditioning

  • Definition: A type of learning in which responses come to be controlled by their consequences.
  • E. L. Thorndike’s work on Instrumental learning and the law of effect provided the foundation for the study of operant conditioning.
  • Pioneered by B. F. Skinner, who showed that rats and pigeons tend to repeat responses that are followed by favorable outcomes.
  • Mainly regulates voluntary, spontaneous responses such as studying, going to work, telling jokes, and asking someone out.

Edward Thorndike - Law of Effect

  • Any behavior that is followed by pleasant consequences is likely to be repeated.
  • Any behavior followed by unpleasant consequences is likely to be stopped.
  • According to Skinner, psychology is about behavior, not about the mind or the nervous system.
  • It deals only with variables that can be directly observed.
  • He, like Thorndike, quickly became convinced of the great power that reward and reinforcement can exert on behavior.

B. F. Skinner - Skinner Box

  • Skinner invented the Skinner box (operant chamber).
  • It included a response mechanism (lever for rats, disk for pigeons) and a means of delivering reinforcement (food, water).
  • The animal was free to respond again after the reward was delivered.
  • Skinner found that the events following a response greatly influenced its subsequent rate of occurrence.
  • If food was presented to a hungry rat after it had pressed a lever, the rate of lever pressing would increase. This is called Operant Conditioning.
Key Concepts
  • In Operant Conditioning, if a response (the operant) is followed by a reinforcing stimulus, the response strength is increased.
    • Lever pressing is an operant.
    • Food for a hungry rat is a reinforcing stimulus.
  • If an animal is reinforced for lever pressing only if a light is on and is never reinforced if it is off, then the animal will come to press at a much higher rate when the light is on than when it is off. This is discrimination.
Definitions:
  • Stimulus Discrimination: Learned response to a specific stimulus but not to other stimuli.
  • Stimulus Generalization: Learned response to similar stimuli.
  • Reinforcers: Responses from the environment that increase the probability of a behavior being repeated. Reinforcers can be either positive or negative.
  • Punishers: Responses from the environment that decrease the likelihood of a behavior being repeated. Punishment weakens behavior.
Types of Reinforcers:
  • Primary Reinforcers: Stimuli that directly satisfy our biological needs (e.g., food, water).
  • Secondary Reinforcers: Stimuli that are reinforcing through their association with a primary reinforcer (e.g., money).
Acquisition and Extinction
  • Acquisition occurs when a response gradually increases due to negative or positive reinforcement.
    • Acquisition may involve shaping, which is the procedure that involves reinforcing behaviors closer to the target behavior.
  • Extinction occurs when responding gradually slows and stops after reinforcement is terminated.
  • Resistance to extinction occurs when an organism continues to make a response after reinforcement is terminated.
  • Aversive Stimulus: Painful or unpleasant stimulus that decreases our behavior.
Reinforcement and Punishment Types
  • Positive Reinforcement: Adding something desirable to increase behavior.
  • Negative Reinforcement: Removing something undesirable to increase behavior.
  • Positive Punishment: Adding something undesirable to decrease behavior.
  • Negative Punishment: Removing something desirable to decrease behavior.
Examples
  • Positive Reinforcement: Student gets praise for a correct answer.
  • Negative Reinforcement: Student can forgo wearing a school uniform for good behavior.
  • Positive Punishment: User is presented with a loud buzz and the feedback: "You lose - game over."
  • Negative Punishment: Students excluded from an activity for breaking a rule.
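The four reinforcement/punishment types above reduce to two yes/no questions: is something added or removed, and is that thing desirable? A minimal sketch of this 2×2 logic (the function name and example calls are illustrative, not from the notes):

```python
# Classify a consequence into one of the four operant-conditioning quadrants.
# "Positive" = something is added; "negative" = something is removed.
# Reinforcement increases behavior; punishment decreases it.

def classify(consequence_added: bool, desirable: bool) -> str:
    if consequence_added:
        return "positive reinforcement" if desirable else "positive punishment"
    return "negative punishment" if desirable else "negative reinforcement"

print(classify(True, True))    # praise for a correct answer -> positive reinforcement
print(classify(False, False))  # uniform requirement waived -> negative reinforcement
print(classify(True, False))   # loud buzz on failure -> positive punishment
print(classify(False, True))   # exclusion from an activity -> negative punishment
```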
Worked Examples (Behaviour → Consequence → Impact):
  • Positive Reinforcement: Studied for a test → got an A+ (consequence adds something) → will study for the next test (behaviour more likely).
  • Negative Reinforcement: Took out the trash → smelly trash removed (consequence removes something) → will continue to take out the trash (behaviour more likely).
  • Positive Punishment: Attempted the half-pipe → realized you're too old for this! (consequence adds something unpleasant) → won't try again (behaviour less likely).
  • Negative Punishment: Drove recklessly → license taken away (consequence removes something desirable) → less likely to drive recklessly in future.
Key Definitions:
  • Positive Reinforcement: Presenting something the organism likes; behavior is strengthened.
  • Negative Reinforcement: Removing something the organism doesn't like; behavior is strengthened.
  • Punishment: Presenting something the organism doesn't like, or removing something it likes; behavior is weakened.

Schedules of Reinforcement

  • The schedule of reinforcement for a particular behavior specifies whether every response is followed by reinforcement or whether only some responses are followed by reinforcement.
Types of Reinforcement Schedules:
  1. Continuous Schedule of Reinforcement.
  2. Intermittent Schedule of Reinforcement.
Definitions:
  • Continuous Schedule of Reinforcement (CRF): Reinforcement is delivered after every single target behavior.
  • Intermittent Schedule of Reinforcement (INT): Reinforcement is delivered after some behaviors or responses but never after each one.
Types of Intermittent Schedules of Reinforcement:
  1. Fixed-Ratio (FR) Schedule.
  2. Fixed Interval (FI) Schedule.
  3. Variable-Ratio (VR) Schedule.
  4. Variable-Interval (VI) Schedule.
Definitions of Intermittent Schedules:
  • Fixed-Ratio (FR) Schedule: Reinforcement is delivered after a fixed number of responses.
  • Fixed Interval (FI) Schedule: Reinforcement is delivered after a specific time period.
  • Variable-Ratio (VR) Schedule: Reinforcement is delivered after a varying number of responses.
  • Variable-Interval (VI) Schedule: Reinforcement is delivered after a varying period of time.
Schedule Descriptions and Examples:
  • Continuous: Reinforcer follows every response. Organizational example: praise after every new sale and order.
  • Fixed interval: A response after a specific time period is reinforced. Example: weekly, bimonthly, or monthly paycheck.
  • Variable interval: A response after a varying period of time (an average) is reinforced. Example: transfers, unexpected bonuses, promotions, recognition.
  • Fixed ratio: A fixed number of responses must occur before reinforcement. Example: piece rate, commission on units sold.
  • Variable ratio: A varying number (an average) of responses must occur before reinforcement. Example: random checks for quality work, praise for doing good work.
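The intermittent schedules are, at bottom, simple delivery rules: count responses (ratio schedules) or elapsed time (interval schedules), with the threshold either fixed or varying. A minimal sketch of three of them (function names, parameter values, and the example response streams are illustrative assumptions; a variable-interval schedule would follow the same pattern with a varying time threshold):

```python
import random

# Decide which responses earn reinforcement under different schedules.

def fixed_ratio(n_responses, ratio=5):
    """FR schedule: reinforce every `ratio`-th response."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]

def variable_ratio(n_responses, mean_ratio=5, rng=None):
    """VR schedule: reinforce after a varying number of responses
    (averaging roughly `mean_ratio`)."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    reinforced = []
    next_at = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        if i >= next_at:
            reinforced.append(i)
            next_at = i + rng.randint(1, 2 * mean_ratio - 1)
    return reinforced

def fixed_interval(response_times, interval=60):
    """FI schedule: reinforce the first response after each `interval`
    seconds has elapsed since the last reinforcement."""
    reinforced, available_at = [], interval
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval
    return reinforced

print(fixed_ratio(20))                                      # [5, 10, 15, 20]
print(fixed_interval([30, 65, 70, 130, 200], interval=60))  # [65, 130, 200]
```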

Differences between Classical and Operant Conditioning:

  • Responses: Classical — elicited (reactive); Operant — emitted (proactive).
  • Stimulus: Classical — fixed to the stimulus (no choice); Operant — variable in types and degrees (choice).
  • Conditioned Stimulus (CS): Classical — a sound, object, or person; Operant — a situation such as an office, social setting, or specific circumstances.
  • Conditioning: Classical — implemented before the response; Operant — implemented after the response.
  • Process: Classical — first a stimulus is presented, then the desired behavior is expected; Operant — first a behavior pattern is emitted, then it is reinforced by reward or avoidance of punishment.

Modeling / Observational Learning

  • Definition: Observational learning, also called social learning theory, occurs when an observer’s behavior changes after viewing the behavior of a model.
  • An observer’s behavior can be affected by the positive or negative consequences—called vicarious reinforcement or vicarious punishment—of a model’s behavior.
  • Albert Bandura described observational learning (also known as vicarious learning or social learning) as learning that occurs through watching others, retaining what was observed, and later replicating the observed behavior.

Bobo Doll Experiment

  • 72 children.
    • 24 in Aggressive role model group.
    • 24 in Non-aggressive role model group.
    • 24 in Control group (no model).
Stage 1: Modelling
  • A lab experiment was used, in which the independent variable (type of model) was manipulated in three conditions:
    • Aggressive model shown to 24 children.
    • Non-aggressive model shown to 24 children.
    • No model shown (control condition) - 24 children.
Details
  • 24 children (12 boys and 12 girls) watched a male or female model behaving aggressively towards a toy called a 'Bobo doll'. The adults attacked the Bobo doll in a distinctive manner - they used a hammer in some cases, and in others threw the doll in the air and shouted "Pow, Boom".
  • Another 24 children (12 boys and 12 girls) were exposed to a non-aggressive model who played in a quiet and subdued manner for 10 minutes (playing with a tinker toy set and ignoring the Bobo doll).
  • The final 24 children (12 boys and 12 girls) were used as a control group and not exposed to any model at all.
Stage 2: Aggression Arousal
  • Each child was (separately) taken to a room with relatively attractive toys.
  • As soon as the child started to play with the toys, the experimenter told the child that these were the experimenter's very best toys and she had decided to reserve them for the other children.
  • This was done to build up frustration in the child. The experimenter said that the child could instead play with the toys in the experimental room (this included both aggressive and non-aggressive toys).
Stage 3: Test for Delayed Imitation
  • The next room contained some aggressive toys and some non-aggressive toys. The non-aggressive toys included a tea set, crayons, three bears, and plastic farm animals. The aggressive toys included a mallet and peg board, dart guns, and a 3-foot Bobo doll.
  • The child was in the room for 20 minutes, and their behavior was observed and rated through a one-way mirror. Observations were made at 5-second intervals.
Conclusion of Bobo Doll Experiment
  • Children who observed the aggressive model made far more imitative aggressive responses than those who were in the non-aggressive or control groups.
  • Boys were more likely than girls to imitate a same-sex model.
  • Boys imitated more physically aggressive acts than girls.
Implications
  • The findings support Bandura's (1977) Social Learning Theory: children learn social behaviors such as aggression through observational learning, that is, by watching the behavior of another person.
  • This study has important implications for the effects of media violence on children.

Processes of Observational Learning

  1. Attention: The extent to which we are exposed/notice the behavior. For a behavior to be imitated, it has to grab our attention.
  2. Retention: How well the behavior is remembered.
  3. Reproduction: This is the ability to perform the behavior that the model has just demonstrated.
  4. Motivation: The will to perform the behavior.
Steps of Observational Learning:
  1. Attention: Focuses on a model's behavior.
  2. Retention: Retains this behavior in memory.
  3. Production Processes: Ability to perform the behavior.
  4. Motivation: A situation arises wherein the behavior is useful.

Vicarious Learning

  • Bandura argued that learning does not require direct reinforcement of the learner; the consequences of behaviors enacted by others can themselves reinforce or punish the observer.
  • These outcomes of modeled behavior are called vicarious because the observer experiences them indirectly, through the emotional reactions they arouse.
  • For example, when a teacher acknowledges a child who shares her crayons with others at a table, a child who observes the exchange experiences positive feelings.
Definitions
  • Vicarious reinforcement: The tendency to repeat or duplicate behaviors for which others are being rewarded.
  • Vicarious punishment occurs when the tendency to engage in a behavior is weakened after having observed the negative consequences for another engaging in that behavior. This is a form of observational learning as described by social learning theory.