Operant Conditioning

Operant conditioning is a type of learning in which a behavior becomes more likely to recur if followed by a reinforcer or less likely to recur if followed by a punisher.

This is based on Edward Thorndike’s law of effect - the principle that behaviors followed by favorable consequences become more likely, and behaviors followed by unfavorable consequences become less likely.

Burrhus Frederic (B. F.) Skinner, often called the “Father of Operant Conditioning”, elaborated on the law of effect by studying how pigeons and rats learn through reinforcement.

He designed the operant chamber, or Skinner box, which contains a bar or key that an animal can manipulate to obtain food or water.

In operant conditioning, reinforcement is any event that strengthens (or increases the likelihood of) the behavior it follows.

  • positive reinforcement: increasing behaviors by presenting rewarding stimuli

    • Getting good grades encourages you to study; getting complimented on your looks encourages you to dress a certain way

  • negative reinforcement: increasing behaviors by stopping or reducing aversive stimuli

    • Putting on a coat to stop feeling cold, cleaning your room to get rid of the mess/smell

    • NOT THE SAME AS PUNISHMENT

  • Primary Reinforcers are innately rewarding by satisfying a biological need (food, water, shelter, etc.).

  • Conditioned (Secondary) Reinforcers are those that gain power through association with a primary reinforcer (money to buy food, water, shelter, etc.).

  • Reinforcement Schedules are patterns that define how often a desired response will be reinforced.

    • Continuous reinforcement - the desired behavior is reinforced every time

      • Used in the acquisition stage

      • Learning occurs faster, but the behavior extinguishes quickly once reinforcement stops

    • Partial or intermittent reinforcement - the desired behavior is reinforced only some of the time

      • Used once behavior is mastered

      • Learning occurs more slowly, but the behavior is more resistant to extinction

  • Token economy: a system in which the learner earns tokens by engaging in a targeted behavior and those tokens can be exchanged for a reward.
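
The token economy described above can be sketched as a small bookkeeping class. This is only an illustration: the class name `TokenEconomy`, its methods, and the example behaviors, token values, and reward price are hypothetical and do not come from any standard program.

```python
class TokenEconomy:
    """Toy token economy: targeted behaviors earn tokens, tokens buy rewards."""

    def __init__(self, target_behaviors, reward_costs):
        # Hypothetical mappings: behavior -> tokens earned, reward -> token price.
        self.target_behaviors = dict(target_behaviors)
        self.reward_costs = dict(reward_costs)
        self.balance = 0

    def record(self, behavior):
        """Award tokens if the behavior is targeted; return the tokens earned."""
        earned = self.target_behaviors.get(behavior, 0)
        self.balance += earned
        return earned

    def exchange(self, reward):
        """Trade tokens for a reward; return False if the balance is too low."""
        cost = self.reward_costs[reward]
        if cost > self.balance:
            return False
        self.balance -= cost
        return True


chart = TokenEconomy(
    {"finish homework": 2, "make bed": 1},
    {"extra screen time": 5},
)
# "watch TV" is not a targeted behavior, so it earns nothing.
for behavior in ["make bed", "finish homework", "watch TV", "finish homework"]:
    chart.record(behavior)

print(chart.balance)                        # 5
print(chart.exchange("extra screen time"))  # True
print(chart.balance)                        # 0
```

The tokens here act as conditioned (secondary) reinforcers: they have no value on their own and gain power only because they can be exchanged for the reward.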

Shaping (a.k.a. training) is the process in which reinforcers guide behavior through successive approximations, rewarding responses that come closer and closer to the desired behavior.

Classical conditioning can explain the development of superstitious beliefs. Let’s say you are a keeper in soccer, and you left your normal gloves in the rain and have to wear your back-ups. You have the worst game of your life, letting the easiest saves go right past you. You associate the back-up gloves with the poor performance and refuse to ever wear them again.

  • US: poor performance

  • UR: shame/anger

  • CS: back-up gloves

  • CR: refusing to wear that pair of gloves ever again

Partial reinforcement schedules are named with two terms. Fixed means a set number; variable means an unpredictable or changing number. Interval schedules require time to pass; ratio schedules require actions to be taken.

  • Fixed ratio - a reinforcement schedule that reinforces a desired behavior only after a specific number of actions have been completed (ex. Getting a bonus for every three cars sold)

  • Fixed interval - a reinforcement schedule that reinforces a desired behavior only after a specific amount of time has passed (ex. Getting a paycheck every week)

  • Variable ratio - a reinforcement schedule that reinforces a desired behavior only after an unpredictable number of actions have been completed (ex. Slot machines)

  • Variable interval - a reinforcement schedule that reinforces a desired behavior only after an unpredictable amount of time has passed (ex. Cooking times)
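
The four schedules above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not an established model: the class names `RatioSchedule` and `IntervalSchedule`, their parameters, and the choice to draw variable requirements uniformly around a mean are all made up for illustration.

```python
import random


class RatioSchedule:
    """Reinforce a response once a required number of responses is reached."""

    def __init__(self, mean_ratio, variable=False, rng=None):
        self.mean_ratio = mean_ratio
        self.variable = variable
        self.rng = rng or random.Random(0)
        self.count = 0
        self.required = self._next_requirement()

    def _next_requirement(self):
        if self.variable:
            # Assumption: unpredictable requirement that averages mean_ratio.
            return self.rng.randint(1, 2 * self.mean_ratio - 1)
        return self.mean_ratio

    def respond(self, now=None):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response made after a required amount of time."""

    def __init__(self, mean_interval, variable=False, rng=None):
        self.mean_interval = mean_interval
        self.variable = variable
        self.rng = rng or random.Random(0)
        self.last_reward = 0.0
        self.required = self._next_requirement()

    def _next_requirement(self):
        if self.variable:
            # Assumption: unpredictable wait that averages mean_interval.
            return self.rng.uniform(0, 2 * self.mean_interval)
        return self.mean_interval

    def respond(self, now):
        """Register a response at time `now`; return True if it is reinforced."""
        if now - self.last_reward >= self.required:
            self.last_reward = now
            self.required = self._next_requirement()
            return True
        return False


# Fixed ratio: a bonus for every third car sold.
fr3 = RatioSchedule(3)
print([fr3.respond() for _ in range(9)])
# [False, False, True, False, False, True, False, False, True]

# Fixed interval: a paycheck every 7 days, checking once per day.
fi7 = IntervalSchedule(7)
print([day for day in range(1, 15) if fi7.respond(now=day)])
# [7, 14]
```

Passing `variable=True` to either class gives the variable-ratio or variable-interval version. In this sketch, extra responses speed up reinforcement on ratio schedules but not on interval schedules, where only elapsed time matters.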

Reinforcement increases the likelihood that a response will happen, while punishment decreases it. Punishment is any event that tends to decrease the behavior that it follows.

  • Positive punishment - administration of an aversive stimulus

    • Getting a traffic ticket, being given extra chores

  • Negative punishment - removal of a pleasant/rewarding stimulus

    • Fines (losing money), losing car/phone privileges, getting grounded (losing freedom)
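
The four terms follow a two-by-two pattern: "positive" and "negative" say whether a stimulus is added or removed, while "reinforcement" and "punishment" say whether the behavior increases or decreases. A minimal lookup sketch of that pattern follows; the function name `classify_consequence` and its string labels are hypothetical, chosen only for this example.

```python
def classify_consequence(stimulus_change, behavior_change):
    """Name the operant procedure for a (stimulus, behavior) combination.

    stimulus_change: "added" or "removed"
    behavior_change: "increases" or "decreases"
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    key = (stimulus_change, behavior_change)
    if key not in table:
        raise ValueError(f"unrecognized combination: {key}")
    return table[key]


# Putting on a coat removes the cold, so coat-wearing increases.
print(classify_consequence("removed", "increases"))  # negative reinforcement

# Losing phone privileges removes something pleasant, so the behavior decreases.
print(classify_consequence("removed", "decreases"))  # negative punishment
```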


A discriminative stimulus (a stimulus that elicits a response after association with reinforcement) signals that a response will be reinforced and is often used in the shaping process. In later versions of Skinner’s operant chambers, an electrified grid was added to deliver a light shock to the animals as an aversive stimulus. A tone or light would come on to warn the rats of the impending shock, and pressing a button would stop the shock. The rats soon learned to press the button as soon as the light came on, avoiding the shock.

  • Escape learning: a type of negative reinforcement in which a behavior that removes an unpleasant stimulus is increased

    • Faking sick to leave a social gathering, sneaking out the back of the restaurant to get away from a bad date

  • Avoidance learning: a type of negative reinforcement in which a behavior that prevents an unpleasant stimulus is increased

    • Claiming your parents won’t allow you to attend the social gathering, ghosting people you are not interested in

Learned helplessness is the feeling of hopelessness and passive resignation an animal or person learns when they are unable to avoid repeated aversive events.

 

 

| | Classical Conditioning | Operant Conditioning |
|---|---|---|
| Basic idea | Learning associations between events we do not control. | Learning associations between our behavior and its consequences. |
| Response | Involuntary, automatic. | Voluntary, operates on environment. |
| Acquisition | Associating events; NS is paired with US and becomes CS. | Associating a response with a consequence (reinforcer or punisher). |
| Extinction | CR decreases when CS is repeatedly presented alone. | Responding decreases when reinforcement stops. |
| Spontaneous recovery | The reappearance, after a rest period, of an extinguished CR. | The reappearance, after a rest period, of an extinguished response. |
| Generalization | The tendency to respond to stimuli similar to the CS. | Responses learned in one situation occurring in other, similar situations. |
| Discrimination | Learning to distinguish between a CS and other stimuli that do not signal a US. | Learning that some responses, but not others, will be reinforced. |