Chapter 6 - Instrumental/Operant Conditioning
Instrumental Conditioning Background
Classical Conditioning vs. Instrumental Conditioning
Here are the basic structures of classical conditioning (CC) and instrumental conditioning (IC):
CC: Stimulus + Stimulus = Conditioned Reflexive Response
An example is footsteps + food = salivation to footsteps
IC: Voluntary Response/Behaviour + Consequence = Change in Frequency of Voluntary Behaviour
An example is biting one’s nails + punishment = no more biting of nails
An important difference between CC and IC is that the resulting response in CC is reflexive, whereas the resulting response in IC is voluntary. For instance, Pavolv’s dog does not consciously choose to drool as a response to the bell; it is a reflexive behaviour. Conversely, if I bite my nails, someone makes a mean comment about them, then I might voluntarily choose not to bite my nails anymore — this would be an instance of instrumental conditioning.
The Basic Procedure of Instrumental Conditioning
In IC, voluntary responses are modified:
STEP I: The organism ‘reacts or behaves’
Example: A dog sits
Experimental Example: A starved rat gets a basketball in a mini basket
STEP II: A behaviour modification technique is applied
Example: A treat is given to the dog
Experimental Example: The rat is given food pellets
CONSEQUENCE: The reaction or behaviour either occurs more frequently or is reduced/stopped
Example: The dog sits to receive a treat
Experimental Example: The rat dunks the basketball to get food
Note that IC can also be used to produce complex behaviours.
Instrumental Conditioning Definitions
Basically, IC is a type of learning in which the consequences of behaviour tend to modify that behaviour in the future. Essentially, behaviour that is rewarded or reinforced tends to be repeated, whereas behaviour that is ignored or punished is less likely to be repeated.
Instrumental behaviour — Behaviour that occurs because it was previously needed for producing certain consequences.
Examples include…
Lever-pressing to receive a reward
Turning the key to start the car
Pulling the handle on a slot machine to win
Driving slowly as to not get a speeding ticket
Avoiding an electric fence to avoid getting shocked
Instrumental conditioning — Procedures developed to study instrumental behaviour through reinforcement and punishment.
Essentially, it’s looking at the whole operant conditioning process
Early Instrumental Conditioning Studies
Thorndike’s Early Studies
Edward L. Thorndike (1874-1949) was the first serious theoretical analyst of instrumental conditioning. His earlier experiments involved cat puzzle boxes, in which the cat had to learn to perform a particular behaviour to exit the puzzle box.

Thorndike’s puzzle box experiment:
The basic procedure of the experiment involved putting a hungry cat into a puzzle box — the cat had to pull a lever and a weighted string (in that order) to open the door of the puzzle box and get food (a positive reinforcer).
The observations were as followed: When initially placed in the box, Thorndike described the cats’ behaviour as being chaotic and erratic (e.g., random behaviours such as meowing or scratching various things). Eventually, the cat would incidentally pull the string and exit the puzzle box. However, after several trials, the cat would reduce the time it took to exit the puzzle box because it learned what behaviour it had to do (e.g., pull the string) in order to get out of the box and get to the food.
Main takeaway from the puzzle box experiment:
The cat tracked the outcome of its behaviour every trial, and eventually learnt that producing a particular behaviour in the box led to a specific outcome…
The structure of this is S → R → O
In context (S), response (R) produces outcome (O)
How this knowledge guides future behaviours:
Given S → R → O…
Behaviours with positive outcomes increase
Behaviours with negative outcomes decrease
Methodological issues with the puzzle box experiment:
How long do you wait until you say the cat didn’t learn (e.g., cutoff)?
You repeat the trials over and over again, resetting the animal and device
The results are hard to compare across animals
How do you generate a prediction from latencies?
Instrumental Conditioning Procedures
Discrete-Trial Procedures
Discrete-trial procedures include puzzle boxes and maze learning. The nature of discrete trial procedures is that a trial ends when the instrumental behaviour is displayed (e.g., when the cat in Thorndike’s puzzle box experiment, the trial ends).
Runway Maze (Straight-Alley Maze)

The idea of a runway maze (or straight-alley maze) is to put an organism in the start box (S), and measure its running speed latency to get to the goal box (G). For instance, I might put a rat in the start box, put fruit loops at the goal box, and measure the rat’s running speed latency over several trials.
T-Maze

The T-maze is typically used for memory studies and other aspects of behaviour. The main idea behind this type of maze is that the organism is place at the start box (S), and usually at one of the two goal boxes (G) has a reward. After the organism finds the reward on the first trial, you measure the running speed latency, but the main way they demonstrate learning is by making the correct turn towards the reward. For example, I might put a rat in a T-maze, put fruit loops in the left goal box, and measure the rat’s running speed latency in addition to observing whether it makes the correct turn over several trials.
8-Arm Radial Maze

The eight-arm radial maze is also used for memory studies and other aspects of behaviour. The main idea for this type of maze is to place rewards on different arms at different times — this makes the organism learn the apparatus, and learn when and where to go for the reward. Typically, the arms are about three to four feet off the ground; the reason for this is that organisms like rats don’t like heights, so they aren’t as likely to go out on the arms unless there is a good reason to do so (e.g., a reward). Additionally, there can be discriminative stimuli for each arm (such as different flooring, lighting, or odours) for the organism to more easily distinguish each arm or to impact the likelihood for it to go on particular arms.
Free Operant Procedures
Free operant procedures differ compared to discrete trial procedures in one main way:
In discrete trial procedures, only one instrumental behaviour can be displayed per trial (e.g., when the organism display the target instrumental behaviour, the trial ends)
In free operant procedures, more than one instrumental behaviour can be displayed per trial (e.g., the organism can display any number of instrumental behaviours over a the duration of a trial)
An operant response is defined in terms of its effect on the environment. In other words, operant responses are so because they impact the environment in some way. For example, a rat that presses the lever in a Skinner Box is producing food because of its lever-pressing.
Different types of operant responses:
Lever-pressing
Rats learning lever-pressing
Chain-pulling
Rats or birds learning chain-pulling
Nose-poking
Rats learning to poke something with their nose
Pecking
Birds learning to peck something
What is the dependent variable?
Response-rate
Total number of responses
Latency to respond
B.F. Skinner and the Skinner Box

Skinner was considered the leading authority of IC, and was influenced by Thorndike. Skinner invented the Skinner Box to test IC through shaping. One advantage of the Skinner Box is that you can allow for more than one instrumental behaviour per trial compared to discrete trial procedures (many levers, many chains, etc.).
An example of a Skinner Box is the chamber that trains rats to bar-press for rewards
The Initial Learning Procedure in a Skinner Box
IC involves learning familiar responses in new situations or in new ways. In other words, it involves taking what is already known by organism and modifying it in different ways — it’s not about teaching brand new behaviours, but modifying existing behaviours.
For instance, rats in a maze study may need to learn where and what to run for — rats don’t need to learn how to run, but rather where to run, where to turn, and what they will find at the goal box.
Basically, the organism is constructing new responses from familiar components.
For example, to press a lever, rats have to combine various familiar behaviours (e.g., raising their paws, standing on their hind legs, and so on)
Shaping and Chaining
Shaping reinforces any movement in the direction of the desired response.
In other words, shaping is done by rewarding successive approximations, which is quicker than waiting for the response to occur and then reinforcing it.
It is used effectively to condition humans and many types of animals, such as parents and their children, teachers and their students, or coaches and their athletes.
It incrementally builds a complex response through successive approximations
For example: In a lever-pressing rat experiment, you could wait for dumb luck for the rat to press the lever for the first time. However, rewarding successive approximations will speed up the learning process. For example, shaping could take on the form of rewarding the rat for looking in the direction of the lever — which increases the probability of facing the lever. After doing this, you could withhold the reward until the rat approaches the lever — which increases the probability of the rat getting closer to the lever. The next step might be withholding the reward until the rat touches the lever; you would repeat this step-by-step process until you eventually get to the desired operant response — pressing the lever.
Another example is getting a young child to say a word — you reward the child for saying a letter of the word, which increases the likelihood of getting the child to actually say the word.
Chaining is the process of building complex operant response sequences by linking together S → R → O conditions.
An example is initially training an animal to pick up an object. Then, rewarding the animal for both picking up the reward and throwing it. It allows for a series of behaviours (as opposed to shaping, which simply elaborates on a single response).
Another example is the rat basketball experiment seen in class.
Shaping and Chaining Combined
Here’s the summary of what shaping and chaining are:
Shaping:
Shaping through successive approximations builds a complex response incrementally
Initially, the contingency (e.g., reward given if behaviour is produced) is introduced for a simple behaviour (rudimentary version of R, which is the desired operant response); as the rate of the behaviour increases, the contingency is provided for increasingly complex forms of the behaviour (e.g., from touching to pressing a lever)
Gradually, it builds a complex R that an animals would never spontaneously produce
Chaining:
Chaining builds complex R sequences by linking together S→R→O (if S, then R, leads to O) conditions
An example is training an animal to pick up an object and then rewarding it for both picking it up and then throwing it (e.g., chaining behaviours together)
It allows for a series of behaviours (as opposed to shaping, which simply elaborates on a simple response)
Shaping and chaining can be used together to train animals to complete incredibly complex behaviours. Both techniques require skill and patience from the trainer.
Can keep an animal motivated and interested
Must select proper training sequence
Cannot move too fast
How to Get a Rat to Lever Press
25:28 of the audio lecture
IC in the Skinner Box
Outcomes (O):
± food delivery
± shock through wires in the floor (punishment)
Behaviour (R): rate of lever pressing
Context (S): light that signals box is “on”
Note than animal is “free” in the chamber, no experimenter intervention
Free-operant learning
Also, many possible contingencies can be introduced
Positive Reinforcement: Press lever (R) → GET FOOD | Positive Punishment Press lever (R) → GET SHOCK |
Negative Reinforcement Press lever (R) → STOP SHOCK | Negative Punishment Press lever (R) → STOPS FOOD |
Structure of the IC Skinner Box Experiment
Initially, tries many things; eventually, accidentally presses the lever, produces a positive effect
Now starts hanging around the lever, accidentally presses it again
Rat has learned a contingency: if light on (S), pressing lever (R) → food (O); spends much of tis day pressing and eating
Basic Pattern of IC
Generalizing & Discrimination
Influencing Factors
A summary of response-outcome procedures with consequences:
Procedure
Type
Outcome
Result
Positive Reinforcement
Positive
Response produces appetitive stimulus
Increase in response rate
Positive Punishment
Positive
Response leads to aversive stimulus
Decreased response rate
Negative Reinforcement
Negative
Response removes/avoids aversive stimulus
Increase in response rate
Omission Training
Negative
Response removes/avoids appetitive stimulus
Decrease in response rate
Distinguishing Between Reinforcement and Punishment
Positive Reinforcement:
Add something to increase behavior.
Negative Reinforcement:
Remove something to increase behavior.
Positive Punishment:
Add something to decrease behavior.
Negative Punishment:
Remove something to decrease behavior.
Instrumental vs. Classical Conditioning
IC: The animal operates on the environment.
CC: The environment operates on the animal (learning involves predictive CS-US relationship).
Characteristics of IC vs. CC
Characteristics
Classical Conditioning
Instrumental Conditioning
Type of association
Between two stimuli
Between a response and its consequence
State of subject
Passive
Active
Focus of attention
On what precedes response
On what follows response
Type of response
Involuntary
Voluntary
Typical bodily response involved
Internal (emotional)
External (physical movement)
Complexity
Relatively simple
Simple to highly complex