Instrumental Conditioning: Learning about outcomes contingent upon the organism's behavior.
A response is emitted to produce or avoid an outcome.
Example with Pigeons:
Initially, keylight illumination produces no response.
Eventually, spontaneous behavior leads the pigeon to peck at the illuminated keylight, resulting in food delivery.
The food reinforces and increases the pigeon's pecking (the instrumental response).
Keylight illumination alone does not result in food, contrasting with classical conditioning.
Pecking without the CS (illuminated keylight) is not rewarded.
Instrumental conditioning rewards behaviors in the presence of a CS, leading to repetition.
These become conditioned instrumental responses.
Behaviors resulting in punishment occur less frequently.
A CS must be present for the instrumental response to be reinforced; responding in its presence provides evidence of learning.
Instrumental vs. Operant Conditioning:
Instrumental Conditioning: The organism's behavior is instrumental in obtaining a rewarding outcome.
Operant Conditioning: The organism must operate on its environment to obtain a rewarding outcome.
Instrumental conditioning = operant conditioning
Thorndike's Law of Effect:
First systematic investigations into operant learning.
Studied animal intelligence using cats, measuring their ability to escape from a puzzle box.
Learning occurs by trial and error.
Accepted as evidence of S-R learning between the puzzle box and the escape response.
S-R Association:
Stimulus-Response associations.
Responses followed by a pleasant reward strengthen the S-R association, leading to more responding.
Responses followed by a noxious outcome (punishment) weaken the S-R association, leading to less responding.
S-R associations can account for many ongoing habits, even despite a desire to quit.
Discriminative Stimulus (S+) → Instrumental Response (IR) → Outcome (O)
Rewarding outcomes strengthen the S-R association, while punishing outcomes weaken it.
Reinforcement is rewarding and increases operant responding (e.g., food).
Punishment is aversive and decreases operant responding (e.g., shock).
Both reinforcement and punishment can involve giving or removing an outcome.
Outcomes:
Positive Reinforcement: Producing the operant response in the presence of the S+ results in the presentation of an appetitive outcome.
Negative Reinforcement: Producing the operant response in the presence of the S+ results in the removal of an aversive (punishing) outcome.
Positive Punishment: Producing the operant response in the presence of the S+ results in the presentation of the aversive outcome.
Negative Punishment: Producing the operant response in the presence of the S+ results in the removal of an appetitive outcome.
Instrumental Conditioning Procedures:
Positive: Instrumental response produces outcome.
Negative: Instrumental response removes outcome.
Reinforcement: Rewarding outcome.
Punishment: Aversive outcome.
Procedure | Positive (response produces outcome) | Negative (response removes outcome) |
---|---|---|
Reinforcement | Positive reinforcement: increases IR | Negative reinforcement: escape or avoidance (increases IR) |
Punishment | Positive punishment: decreases IR | Negative punishment: omission training (DRO) (decreases IR) |
Reinforcement increases instrumental responding. Example: Giving a child a lollipop for behaving well at the doctor's office or using an umbrella to shield from the rain.
Punishment decreases instrumental responding. Example: Receiving a failing grade for not turning in an assignment or sending a child to time-out for misbehaving.
Attention to unwanted behaviors can inadvertently reinforce them (e.g., attending to a crying child).
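The four procedures above reduce to two questions: is the outcome appetitive or aversive, and does the response produce or remove it? A minimal sketch (Python used purely as illustration; the function name and boolean encoding are my own, not from these notes):

```python
def classify_procedure(outcome_is_appetitive: bool,
                       response_produces_outcome: bool) -> str:
    """Classify an instrumental procedure by outcome valence and contingency sign.

    Appetitive + produced -> positive reinforcement (responding increases)
    Aversive  + removed   -> negative reinforcement (responding increases)
    Aversive  + produced  -> positive punishment    (responding decreases)
    Appetitive + removed  -> negative punishment    (responding decreases)
    """
    if outcome_is_appetitive and response_produces_outcome:
        return "positive reinforcement"
    if not outcome_is_appetitive and not response_produces_outcome:
        return "negative reinforcement"
    if not outcome_is_appetitive and response_produces_outcome:
        return "positive punishment"
    return "negative punishment (omission training / DRO)"

# Lollipop for good behavior: appetitive outcome produced by the response.
print(classify_procedure(True, True))    # positive reinforcement
# Umbrella blocks the rain: aversive outcome removed by the response.
print(classify_procedure(False, False))  # negative reinforcement
```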
Discrete-Trial Procedures:
Clear beginning and end of trial.
Instrumental response performed once per trial.
Commonly used with rats in mazes.
Measures running speed and response latency.
Free-Operant Procedures:
Invented by B. F. Skinner.
Organism makes operant responses repeatedly without constraint before removal from apparatus.
Allows study of behavior in a continuous manner, observing chains of behaviors.
Operant Conditioning Elements: Organisms learn to emit a specific behavior in the presence of a specific stimulus to receive a reward.
Discriminative Stimulus: Signals the contingency between a response and the outcome.
S+: Signals a positive contingency between the response and the outcome.
S-: Indicates a negative contingency between the response and the outcome.
Behavior is emitted if the organism is motivated to receive the outcome. Motivation will be covered next week.
Learning is measured by changes in the strength or frequency of the conditioned instrumental response when the discriminative stimulus is presented.
Operant Conditioning Experiments Often Include a Two-Step Pre-Training Procedure:
Classical conditioning to associate the magazine (food delivery device) with food reinforcement.
Shaping to train the animal to perform the targeted response (lever press, chain pull, etc.) for food.
Shaping Behaviours:
Response shaping involves establishing behavior through reinforced approximations and nonreinforcement of earlier forms.
Requires a careful balance of reinforcement and withholding.
Creation of novel responses takes advantage of inherent variability of behavior.
Used extensively in animal training.
Shaping is used to generate "new" behavior.
Most responses are preexisting components performed in response to new situations/goals.
The organism must have the physical ability to perform the desired behavior and perceive the discriminative stimulus.
After learning and reliable performance of the targeted response, motivation is introduced.
Deprive the organism of food or water to motivate the response for reinforcement.
The outcome must be reinforcing, not punishing, to motivate responding.
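The shaping procedure above can be sketched as a toy simulation. This is my own illustrative model, not one from these notes; it assumes responses vary randomly around a current baseline, and that reinforcing closer approximations shifts the baseline toward the target while tightening the criterion:

```python
import random

def shape(target: float, baseline: float = 0.0, trials: int = 200,
          seed: int = 0) -> float:
    """Toy model of response shaping by successive approximations.

    Each trial the organism emits a response that varies randomly around its
    current baseline (inherent variability of behavior). Responses closer to
    the target than the current criterion are 'reinforced', nudging the
    baseline toward the emitted response; other responses go unreinforced
    and leave the baseline unchanged.
    """
    rng = random.Random(seed)
    criterion = abs(target - baseline)  # start by reinforcing any improvement
    for _ in range(trials):
        response = baseline + rng.gauss(0, 1.0)   # variable emitted behavior
        if abs(target - response) < criterion:    # a closer approximation
            baseline += 0.5 * (response - baseline)  # reinforcement strengthens it
            criterion = abs(target - baseline)       # tighten the criterion
    return baseline

final = shape(target=10.0)
print(round(final, 2))  # ends near the target value
```

The key balance from the notes is visible here: reinforce too leniently (a fixed, loose criterion) and responding never improves; withhold too strictly (a criterion no variable response can meet) and behavior stops changing at all.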
Operant Conditioning Experiments Involve:
Classically conditioning a discriminative stimulus to an outcome (S-S or S-O association).
Instrumentally conditioning a stimulus to a response (S-R association).
Learning is demonstrated by changes in instrumental responding in the presence of a discriminative stimulus.
Fundamentals of Instrumental Conditioning:
Instrumental response.
Instrumental reinforcer.
Response-reinforcer relation.
Instrumental Response:
Response production: The operant response is the behavior emitted to have an effect on the environment.
All individual acts achieving the same environmental change goal are instances of the same operant response (e.g., using either hand to insert a coin).
Operant response is judged by outcome, not specific production.
Response Stereotypy:
Without explicit reinforcement of variability, responding becomes more stereotyped with continued instrumental conditioning.
Relevance or Belongingness of Instrumental Response:
Relates to Garcia & Koelling's bright and noisy water experiment.
Certain responses naturally belong with certain reinforcers due to evolutionary history; conversely, Thorndike found it nearly impossible to teach cats to yawn to escape a puzzle box, because that response does not belong with escape.
Male stickleback fish: the biting response is readily reinforced when the reinforcer is a rival male, but poorly reinforced when it is access to a female.
Instinctive Drift:
Breland & Breland (1961) had difficulty training a raccoon and a pig to drop coins into a coin bank: the animals drifted into treating the coins as food objects (rubbing, rooting) instead of depositing them.
Relates to behavior systems theory and natural foraging responses overriding trained responses.
The effectiveness of a procedure in increasing an instrumental response depends on the compatibility of that response with the preexisting organization of the behavior system.
Instrumental Reinforcer:
Quality and quantity: Better rewards result in better performance, but are also influenced by the response requirement.
Perceived quality and quantity are influenced by the organism's previous experience with that reinforcer.
Similar to the idea of expectation driving excitation and inhibition.
Successive Contrast Effect Example (sucrose solutions): train with 32% sucrose → experience 4% sucrose → return to 32% sucrose → positive contrast (responding briefly exceeds that of animals kept on 32% throughout).
Response-Reinforcer Relation:
Temporal Relation: Time between response and reinforcer.
Immediate reinforcement > delayed reinforcement, partly due to ambiguity as to cause of outcome.
Techniques to reduce ambiguity or misattribution of cause:
Provide secondary, or conditioned, reinforcer (e.g., clicker training, verbal reinforcement).
Marking procedure.
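The immediacy effect is often modeled quantitatively as a decay of reinforcer value with delay; one standard form is Mazur's hyperbolic discounting, V = A / (1 + kD). A minimal sketch (the value of k is a fitted constant, chosen here arbitrarily for illustration):

```python
def discounted_value(amount: float, delay: float, k: float = 0.1) -> float:
    """Hyperbolic discounting of reinforcer value with delay
    (Mazur's V = A / (1 + k*D)); k is an arbitrary illustrative constant."""
    return amount / (1 + k * delay)

# An immediate reinforcer keeps its full value...
print(discounted_value(10.0, delay=0.0))   # 10.0
# ...while the same reinforcer delayed 30 s is worth far less, which is
# one reason immediate reinforcement conditions responses more effectively.
print(discounted_value(10.0, delay=30.0))  # 2.5
```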
Response-Reinforcer Contingency: Extent to which the instrumental response is necessary and sufficient to produce the reinforcer.
Perfect contingency not necessary.
Perceived contingency is important, not actual contingency.
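Contingency is commonly quantified as ΔP = P(outcome | response) − P(outcome | no response): a ΔP of zero means the outcome is independent of behavior. A minimal sketch (my own encoding of trials as boolean pairs; assumes both response and no-response trials occur):

```python
def delta_p(trials):
    """Compute contingency dP = P(O|R) - P(O|no R) from a list of
    (responded, outcome_occurred) boolean trial records."""
    with_r = [outcome for responded, outcome in trials if responded]
    without_r = [outcome for responded, outcome in trials if not responded]
    return sum(with_r) / len(with_r) - sum(without_r) / len(without_r)

# Perfect positive contingency: outcome occurs if and only if the response does.
print(delta_p([(True, True), (True, True), (False, False), (False, False)]))   # 1.0
# Zero contingency: outcome equally likely either way (as for yoked controls).
print(delta_p([(True, True), (True, False), (False, True), (False, False)]))  # 0.0
```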
Superstition:
Skinner's Superstition Experiment: food was delivered at fixed intervals regardless of the pigeons' behavior, yet idiosyncratic "superstitious" responses developed; Skinner's explanation rests on accidental, or adventitious, reinforcement.
Easy to form associations between occasionally paired events.
Argued that temporal contiguity is primarily responsible for learning, and that a positive response-reinforcer contingency is not necessary for instrumental conditioning.
Staddon's Reinterpretation (Staddon & Simmelhag, 1971): replicated Skinner's study with more extensive and systematic observations.
Terminal responses made toward end of interval when probability of the outcome was high (e.g., orienting toward the food magazine and pecking).
Interim responses were all other behaviours (e.g., tracking the wall or turning) that increased after presentation of the food and then decreased during the ISI.
Fits in with foraging behavior system. No evidence for accidental reinforcement effects.
Controllability of Reinforcers:
The ability to control the occurrence of an outcome allows prediction.
A zero contingency relationship results in high levels of stress and anxiety that can persist in the long term.
Learned Helplessness Effect: Seligman, Overmier, & Maier. Application to clinical depression.
Triadic design using shuttle apparatus:
Group | Exposure | Conditioning | Result |
---|---|---|---|
E | Escapable shock | Escape-avoidance training | Rapid avoidance learning |
Y | Yoked inescapable shock | Escape-avoidance training | Slow avoidance learning |
R | Restricted to apparatus (no shock) | Escape-avoidance training | Rapid avoidance learning |
Learned Helplessness Hypothesis: Animal perceives 0 contingency relationship and assumes that future reinforcers will also be independent of their behavior.
This undermines the ability to learn new instrumental responses.
Learning deficit due to reduced motivation to perform an instrumental response and deficient ability to learn that behavior is now effective.
Alternative Hypotheses:
Activity Deficit Hypothesis: Specific to producing movement. Uncontrollable shocks disrupt escape learning in shuttle box due to freezing response but facilitate eyeblink conditioning.
Attention Deficit Hypothesis: Exposure to inescapable shock reduces the extent to which animals pay attention to their own behaviour, causing a learning deficit.
Stimulus Relations in Escape Conditioning: Asks why escapable shock is not so bad. Escape behavior results in termination of an aversive stimulus.
Making an escape response results in internal feedback cues.
Shock-cessation feedback cues: response-produced stimuli experienced at the start of the escape response, just before the shock is turned off.
Safety-signal feedback cues: response-produced stimuli experienced at the end of escape response when shock is turned off. Become conditioned inhibitors. Signaling end of inescapable shock eliminates learning deficit.
Operant conditioning requires learning to make an operant response in the presence of a discriminative stimulus to be reinforced.
Studied in discrete-trial or free-operant procedures.
Typically requires shaping the targeted operant behavior beforehand.
Shaping produces new behaviors through systematic reinforcement and nonreinforcement.
Responding is influenced by the outcome:
Reinforcement increases operant responding.
Punishment decreases operant responding.
Conditioning effectiveness depends on factors like contiguity, belongingness, salience, and contingency.
The relative or perceived value of reinforcement influences behavior.
Perceived effectiveness of the operant response is also important.
Element | Classical Conditioning | Instrumental Conditioning |
---|---|---|
Conditioned Stimulus | Repeated pairings with a US elicits a CR, signaling an outcome regardless of behavior | Discriminative Stimulus: indicates the availability of reinforcement contingent upon the organism making a conditioned operant response |
Conditioned Response | Response elicited by the CS through repeated pairing with the US | Operant Behaviour: behaviour which increases the probability that a reinforcer will be obtained |
Response function | Preparatory behaviour | Goal-directed behaviour |
Unconditioned Stimulus | Reliably elicits a response prior to the conditioning trials | Reinforcer: a contingent event which increases the probability that a behaviour will be performed again |
Unconditioned Response | Response to a particular stimulus prior to the conditioning trials | Natural Variation in Behaviour: frequency of behaviour influenced by shaping and chaining of behaviours |