Week 5: Operant Conditioning Notes

Operant Conditioning I

Part 1: Operant Conditioning (aka Instrumental Conditioning)

  • Instrumental Conditioning: Learning about outcomes contingent upon the organism's behavior.

    • A response is emitted to produce or avoid an outcome.

  • Example with Pigeons:

    • Initially, keylight illumination produces no response.

    • Eventually, spontaneous behavior leads the pigeon to peck at the illuminated keylight, resulting in food delivery.

    • The food increases the pigeon's pecking response (CR).

    • Keylight illumination alone does not result in food, contrasting with classical conditioning.

    • Pecking without the CS (illuminated keylight) is not rewarded.

  • Instrumental conditioning rewards behaviors in the presence of a CS, leading to repetition.

    • These become conditioned instrumental responses.

    • Behaviors resulting in punishment occur less frequently.

    • A CS must be present to elicit the instrumental response, showing evidence of learning.

  • Instrumental vs. Operant Conditioning:

    • Instrumental Conditioning: The organism's behavior is instrumental in obtaining a rewarding outcome.

    • Operant Conditioning: The organism must operate on its environment to obtain a rewarding outcome.

    • Instrumental conditioning = operant conditioning

Part 2: Thorndike's Law of Effect

  • Thorndike's Law of Effect:

    • First systematic investigations into operant learning.

    • Studied animal intelligence using cats, measuring their ability to escape from a puzzle box.

  • Learning occurs by trial and error.

    • Thorndike took this as evidence of S-R learning between the puzzle box and the escape response.

  • S-R Association:

    • Stimulus-Response associations.

  • Responses followed by a pleasant reward strengthen the S-R association, leading to more responding.

  • Responses followed by a noxious outcome (punishment) weaken the S-R association, leading to less responding.

  • S-R associations can account for many ongoing habits, even despite a desire to quit.

  • Discriminative Stimulus (S+) → Instrumental Response (IR) → Outcome (O)

Part 3: Reinforcement and Punishment

  • Rewarding outcomes strengthen the S-R association, while punishing outcomes weaken it.

  • Reinforcement is rewarding and increases operant responding (e.g., food).

  • Punishment is aversive and decreases operant responding (e.g., shock).

  • Both reinforcement and punishment can involve giving or removing an outcome.

  • Outcomes:

    • Positive Reinforcement: Producing the operant response in the presence of the S+ results in the presentation of an appetitive outcome.

    • Negative Reinforcement: Producing the operant response in the presence of the S+ results in the removal of an aversive (punishing) outcome.

    • Positive Punishment: Producing the operant response in the presence of the S+ results in the presentation of the aversive outcome.

    • Negative Punishment: Producing the operant response in the presence of the S+ results in the removal of an appetitive outcome.

  • Instrumental Conditioning Procedures:

    • Positive: Instrumental response produces outcome.

    • Negative: Instrumental response removes outcome.

    • Reinforcement: Rewarding outcome.

    • Punishment: Aversive outcome.

                | Positive                               | Negative
  Reinforcement | Increases IR (positive reinforcement)  | Escape or avoidance (negative reinforcement)
  Punishment    | Decreases IR (positive punishment)     | Omission training (DRO) (negative punishment)

  • Reinforcement increases instrumental responding. Example: Giving a child a lollipop for behaving well at the doctor's office or using an umbrella to shield from the rain.

  • Punishment decreases instrumental responding. Example: Receiving a failing grade for not turning in an assignment or sending a child to time-out for misbehaving.
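The four procedures reduce to two questions: does the response produce or remove the outcome, and is the outcome appetitive or aversive? A minimal sketch of the classification as a lookup (a hypothetical helper, not part of the notes):

```python
# Toy lookup (illustrative only) for the 2x2 classification of
# instrumental conditioning procedures.

def classify_procedure(response_effect: str, outcome_valence: str) -> str:
    """response_effect: 'produces' or 'removes' the outcome.
    outcome_valence: 'appetitive' or 'aversive'.
    Returns the procedure name and its effect on the instrumental response (IR)."""
    table = {
        ("produces", "appetitive"): "positive reinforcement (IR increases)",
        ("removes", "aversive"): "negative reinforcement: escape/avoidance (IR increases)",
        ("produces", "aversive"): "positive punishment (IR decreases)",
        ("removes", "appetitive"): "negative punishment: omission training, DRO (IR decreases)",
    }
    return table[(response_effect, outcome_valence)]

# A lollipop for behaving well at the doctor's office:
print(classify_procedure("produces", "appetitive"))  # positive reinforcement (IR increases)
# An umbrella shielding you from the rain:
print(classify_procedure("removes", "aversive"))     # negative reinforcement: escape/avoidance (IR increases)
```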

  • Attention to unwanted behaviors can inadvertently reinforce them (e.g., attending to a crying child).

Part 4: Procedures to Study Operant Conditioning

  • Discrete-Trial Procedures:

    • Clear beginning and end of trial.

    • Instrumental response performed once per trial.

    • Commonly used with rats in mazes.

    • Measures running speed and response latency.

  • Free-Operant Procedures:

    • Invented by B. F. Skinner.

    • The organism can make the operant response repeatedly, without constraint, until removed from the apparatus.

    • Allows study of behavior in a continuous manner, observing chains of behaviors.

  • Operant Conditioning Elements: Organisms learn to emit a specific behavior in the presence of a specific stimulus to receive a reward.

    • Discriminative Stimulus: Signals the contingency between a response and the outcome.

      • S+: Signals a positive contingency between the response and the outcome.

      • S-: Indicates a negative contingency between the response and the outcome.

  • Behavior is emitted if the organism is motivated to receive the outcome. Motivation will be covered next week.

    • Learning is measured by changes in the strength or frequency of the conditioned instrumental response when the discriminative stimulus is presented.

  • Operant Conditioning Experiments Often Include a Two-Step Pre-Training Procedure:

    • Classical conditioning to associate the magazine (food delivery device) with food reinforcement.

    • Shaping to train the animal to perform the targeted response (lever press, chain pull, etc.) for food.

  • Shaping Behaviours:

    • Response shaping involves establishing behavior through reinforced approximations and nonreinforcement of earlier forms.

    • Requires a careful balance of reinforcement and withholding.

  • Creation of novel responses takes advantage of inherent variability of behavior.

    • Used extensively in animal training.

  • Shaping is used to generate "new" behavior.

    • Most responses are preexisting components performed in response to new situations/goals.

    • The organism must have the physical ability to perform the desired behavior and perceive the discriminative stimulus.
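The shaping procedure above — reinforcing successive approximations while withholding reinforcement from earlier forms — can be illustrated with a toy simulation (all parameters and numbers are hypothetical, not from the notes):

```python
import random

def shape_response(target=10.0, criterion=1.0, step=1.0, trials=2000, seed=0):
    """Toy shaping simulation: responses vary around the last reinforced
    magnitude; a response meeting the current criterion is reinforced
    (becoming the new typical form) and the criterion is raised, so
    earlier, weaker forms go unreinforced."""
    rng = random.Random(seed)
    baseline = 0.0  # current typical response magnitude
    for _ in range(trials):
        response = baseline + rng.uniform(-1.5, 1.5)   # inherent variability of behavior
        if response >= criterion:                      # successive approximation met
            baseline = response                        # reinforced form becomes the new typical form
            criterion = min(target, criterion + step)  # demand a closer approximation next
        if baseline >= target:
            break
    return baseline

print(shape_response())  # response magnitude shaped up toward the target
```

Raising the criterion too quickly relative to the natural variability stalls the process entirely, which is the "careful balance of reinforcement and withholding" noted above.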

  • After learning and reliable performance of the targeted response, motivation is introduced.

    • Deprive the organism of food or water to motivate the response for reinforcement.

    • The outcome must be reinforcing, not punishing, to motivate responding.

  • Operant Conditioning Experiments Involve:

    • Classically conditioning a discriminative stimulus to an outcome (S-S or S-O association).

    • Instrumentally conditioning a stimulus to a response (S-R association).

    • Learning is demonstrated by changes in instrumental responding in the presence of a discriminative stimulus.

Part 5: Factors That Influence Operant Conditioning

  • Fundamentals of Instrumental Conditioning:

    • Instrumental response.

    • Instrumental reinforcer.

    • Response-reinforcer relation.

  • Instrumental Response:

    • Response production: The operant response is the behavior emitted to have an effect on the environment.

    • All individual acts achieving the same environmental change goal are instances of the same operant response (e.g., using either hand to insert a coin).

    • Operant response is judged by outcome, not specific production.

  • Response Stereotypy:

    • Without explicit reinforcement of variability, responding becomes more stereotyped with continued instrumental conditioning.

  • Relevance or Belongingness of Instrumental Response:

    • Relates to Garcia & Koelling's bright and noisy water experiment.

    • Certain responses naturally belong with certain reinforcers due to evolutionary history; responses that do not belong are difficult to train (e.g., cats cannot easily be taught to yawn to escape a puzzle box).

    • Male stickleback fish readily learn a biting response when the reinforcer is the presentation of another male, but not when the reinforcer is a female.

  • Instinctive Drift:

    • Breland & Breland (1961) experienced difficulty training a raccoon and pig to drop coins into a coin bank.

    • Relates to behavior systems theory and natural foraging responses overriding trained responses.

    • The effectiveness of a procedure in increasing an instrumental response depends on the compatibility of that response with the preexisting organization of the behavior system.

  • Instrumental Reinforcer:

    • Quality and quantity: Better rewards result in better performance, but are also influenced by the response requirement.

    • Perceived quality and quantity are influenced by the organism's previous experience with that reinforcer.

    • Similar to the idea of expectation driving excitation and inhibition.

  • Sucrose contrast effect example: initial exposure to 32% sucrose → experience 4% sucrose → return to 32% sucrose → positive contrast.

  • Response-Reinforcer Relation:

    • Temporal Relation: Time between response and reinforcer.

      • Immediate reinforcement > delayed reinforcement, partly due to ambiguity as to cause of outcome.

      • Techniques to reduce ambiguity or misattribution of cause:

        • Provide secondary, or conditioned, reinforcer (e.g., clicker training, verbal reinforcement).

        • Marking procedure.

  • Response-Reinforcer Contingency: Extent to which the instrumental response is necessary and sufficient to produce the reinforcer.

    • Perfect contingency not necessary.

    • Perceived contingency is important, not actual contingency.
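A standard way to quantify response-reinforcer contingency (the ΔP measure — an assumption here, not named in the notes) compares the probability of the outcome given the response with its probability when no response is made:

```python
def delta_p(resp_out, resp_noout, noresp_out, noresp_noout):
    """Delta-P = P(outcome | response) - P(outcome | no response),
    computed from counts of the four trial types."""
    p_out_given_resp = resp_out / (resp_out + resp_noout)
    p_out_given_noresp = noresp_out / (noresp_out + noresp_noout)
    return p_out_given_resp - p_out_given_noresp

# Perfect positive contingency: the outcome occurs if and only if the response is made.
print(delta_p(20, 0, 0, 20))    # 1.0
# Zero contingency: the outcome is equally likely with or without responding.
print(delta_p(10, 10, 10, 10))  # 0.0
```

A ΔP of zero corresponds to the uncontrollable-outcome situation discussed under learned helplessness below.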

  • Superstition:

    • Skinner's Superstition Experiment: Skinner's explanation rests on accidental, or adventitious, reinforcement.

    • Easy to form associations between occasionally paired events.

    • Argued that temporal contiguity is primarily responsible for learning, and that a positive response-reinforcer contingency is not necessary for instrumental conditioning.

    • Staddon’s Reinterpretation: Replicated Skinner’s study with more extensive/systematic observations.

      • Terminal responses made toward end of interval when probability of the outcome was high (e.g., orienting toward the food magazine and pecking).

      • Interim responses were all other behaviours (e.g., tracking the wall or turning) that increased after presentation of the food and then decreased during the ISI.

      • Fits in with foraging behavior system. No evidence for accidental reinforcement effects.

  • Controllability of Reinforcers:

    • The ability to control the occurrence of an outcome allows prediction.

    • A zero contingency relationship results in high levels of stress and anxiety that can persist in the long term.

  • Learned Helplessness Effect: Seligman, Overmier, & Maier. Application to clinical depression.

    • Triadic design using shuttle apparatus:

  Group | Exposure                 | Conditioning              | Result
  E     | Escapable shock          | Escape-avoidance training | Rapid avoidance learning
  Y     | Yoked inescapable shock  | Escape-avoidance training | Slow avoidance learning
  R     | Restricted to apparatus  | Escape-avoidance training | Rapid avoidance learning

  • Learned Helplessness Hypothesis: The animal perceives a zero-contingency relationship and assumes that future reinforcers will also be independent of its behavior.

    • This undermines the ability to learn new instrumental responses.

    • Learning deficit due to reduced motivation to perform an instrumental response and deficient ability to learn that behavior is now effective.

  • Alternative Hypotheses:

    • Activity Deficit Hypothesis: Specific to producing movement. Uncontrollable shocks disrupt escape learning in shuttle box due to freezing response but facilitate eyeblink conditioning.

    • Attention Deficit Hypothesis: Exposure to inescapable shock reduces the extent to which animals pay attention to their own behaviour, causing a learning deficit.

    • Stimulus Relations in Escape Conditioning: Asks why escapable shock is not so bad. Escape behavior results in termination of an aversive stimulus.

      • Making an escape response results in internal feedback cues.

      • Shock-cessation feedback cues: response-produced stimuli experienced at the start of the escape response, as the shock is being turned off.

      • Safety-signal feedback cues: response-produced stimuli experienced at the end of escape response when shock is turned off. Become conditioned inhibitors. Signaling end of inescapable shock eliminates learning deficit.

Summary

  • Operant conditioning requires learning to make an operant response in the presence of a discriminative stimulus to be reinforced.

  • Studied in discrete-trial or free-operant procedures.

  • Typically requires shaping the targeted operant behavior beforehand.

  • Shaping produces new behaviors through systematic reinforcement and nonreinforcement.

  • Responding is influenced by the outcome:

    • Reinforcement increases operant responding.

    • Punishment decreases operant responding.

  • Conditioning effectiveness depends on factors like contiguity, belongingness, salience, and contingency.

  • The relative or perceived value of reinforcement influences behavior.

  • Perceived effectiveness of the operant response is also important.

Instrumental vs. Classical Conditioning

Classical Conditioning | Instrumental Conditioning
Conditioned Stimulus: repeated pairings with a US elicit a CR, signaling an outcome regardless of behavior | Discriminative Stimulus: indicates the availability of reinforcement contingent upon the organism making a conditioned operant response
Conditioned Response: response elicited by the CS through repeated pairing with the US | Operant Behaviour: behaviour which increases the probability that a reinforcer will be obtained
Response function: preparatory behaviour | Response function: goal-directed
Unconditioned Stimulus: reliably elicits a response prior to the conditioning trials | Reinforcer: a contingent event which increases the probability that a behaviour will be performed again
Unconditioned Response: response to a particular stimulus prior to the conditioning trials | Natural Variation in Behaviour: frequency of behaviour influenced by shaping and chaining of behaviours