PY2111 CH5 Instrumental Conditioning: Foundations

Instrumental behaviour

Behaviour that occurs because it was previously effective in producing certain consequences

Thorndike’s law of effect

A mechanism of instrumental behaviour, proposed by Thorndike, which states that if a response R is followed by a satisfying event in the presence of a stimulus S, the association between the stimulus and response (S-R) will be strengthened

If the response is followed by an annoying event, the S-R association will be weakened

Important: Involves S-R learning, i.e. what is learned is an association between the response and the stimuli present at the time of the response.

The consequence of the response is not one of the elements in the association
The satisfying/annoying consequence simply serves to strengthen or weaken the association between the preceding stimulus and response

Attractive mechanism to explain compulsive habits that are hard to break, e.g.

S: Sight and smell of popcorn
R: Grabbing more popcorn and eating it
A habitual smoker who knows that smoking is harmful will continue to smoke because S-R mechanisms compel lighting a cigarette independent of the consequences of the response
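The S-R mechanism above can be sketched as a table of association strengths. This is a minimal Python illustration, not Thorndike's own formulation: the step size `STEP`, the helpers `law_of_effect` and `habitual_response`, and the popcorn labels are hypothetical. The point it encodes is that the table is keyed only by stimulus and response; the consequence changes the strength but is never stored.

```python
from collections import defaultdict

# Hypothetical step size: the law of effect gives no numeric value.
STEP = 0.1

# S-R association strengths keyed by (stimulus, response).
# The consequence is deliberately NOT part of the key: it is never stored.
strength = defaultdict(float)


def law_of_effect(stimulus, response, satisfying):
    """Strengthen the S-R link after a satisfying event, weaken it after an annoying one."""
    if satisfying:
        strength[(stimulus, response)] += STEP
    else:
        strength[(stimulus, response)] -= STEP


def habitual_response(stimulus):
    """Pick the response most strongly associated with the stimulus.

    The current value of the consequence is not consulted, which is why
    S-R habits can persist even when the outcome is known to be harmful.
    """
    candidates = {r: s for (st, r), s in strength.items() if st == stimulus}
    return max(candidates, key=candidates.get) if candidates else None


# Eating popcorn in the presence of its sight and smell is repeatedly satisfying.
for _ in range(5):
    law_of_effect("sight and smell of popcorn", "grab popcorn", satisfying=True)
law_of_effect("sight and smell of popcorn", "ignore popcorn", satisfying=False)

print(habitual_response("sight and smell of popcorn"))  # grab popcorn
```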

Approaches to the study of instrumental conditioning

Discrete-trial procedures

Free-operant procedures

Discrete-trial procedures

W.S. Small

A method of instrumental conditioning in which the pp can perform the instrumental response only during specified periods, usually determined either by placement of the pp in an experimental chamber or by the presentation of a stimulus

Usually involve the use of some type of maze

Runway/straight-alley maze
T maze

Animal has limited opportunities to respond, and these opportunities are scheduled by the experimenter

Runway/straight-alley maze

Contains a start box at one end and a goal box at the other

Rat is placed in the start box
Barrier separating the start box from the main section of the runway is raised
Rat is allowed to make its way down the runway until it reaches the goal box, which usually contains a reinforcer

Behaviour quantified by:

Running speed
Latency

Running speed

Used to quantify behaviour of rats in a runway maze

How fast the animal gets from the start box to the goal box

Typically increases with repeated trials

Latency

The time between the start of a trial/stimulus and the instrumental response

Typically decreases with repeated trials

T maze

Consists of a start box and alleys arranged in the shape of a T

A goal box is located at the end of each arm of the T

Because it has two choice arms, it can be used to study more complex questions

Free-operant procedures

B. F. Skinner (1938): to study behaviour in a more continuous manner

A method of instrumental conditioning that permits repeated performance of the instrumental response without intervention by the experimenter

Concept of the operant as a way of dividing behaviour into meaningful measurable units

Skinner box
Magazine training & shaping

Skinner box

A small chamber that contains a lever that the rat can push down repeatedly

Has a mechanism that can deliver a reinforcer (e.g. food/water) into a cup

The lever is electronically connected to the food-delivery system so that when the rat presses the lever, a pellet of food automatically falls into the food cup

Operant response

A response that is defined by the effect it produces in the environment

E.g. Any sequence of movements that depresses the lever or opens the door constitutes an instance of that particular operant

Operational outcome is the critical measure of success

Instrumental response

Any response that is required to produce a desired consequence

Magazine training and shaping

The shaping of a new operant response requires approximations to the final behaviour

Successful shaping involves three components:

Clearly define the final response you want the trainee to perform
Clearly assess the starting level of performance, no matter how far it is from the final response you are interested in
Divide the progression from the starting point to the final target behaviour into appropriate training steps

The successive approximations make up your training plan

Execution of training plan involves two complementary tactics:

Reinforcement of successive approximations to the final behaviour
Withholding reinforcement for earlier response forms

Magazine training

A preliminary stage of instrumental conditioning in which a stimulus is repeatedly paired with the reinforcer to enable the pp to learn to go and get the reinforcer when it is presented

The sound of the food-delivery device, for example, may be repeatedly paired with food so that the animal will learn to go to the food cup when food is delivered

Involves classical conditioning

Sound of food-delivery device (food magazine) repeatedly paired with release of food pellet into cup
Sound elicits a classically conditioned approach response: animal goes to food cup and picks up pellet

After this phase, the rat is ready to learn the required operant response

Response shaping

Reinforcement of successive approximations to a desired instrumental response, e.g.

Food is given if the rat does anything remotely related to pressing the lever, e.g. a rearing response (getting up on its hind legs anywhere in the experimental chamber)
Food pellet may then be given only if the rat makes the rearing response over the response lever (rearing in other parts of the chamber is no longer reinforced)
Food pellet may finally be given only if the rat touches and depresses the lever
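The shaping sequence above can be sketched as a loop that reinforces any response at or beyond the current approximation and withholds reinforcement for earlier forms. A minimal Python sketch: the three steps follow the rat example, but the advancement rule (`ADVANCE_AFTER` consecutive reinforcers) and the helper `shape` are hypothetical choices, since real trainers adjust the criterion by judgement.

```python
# Successive approximations, ordered from the starting response to the target.
STEPS = ["rear anywhere", "rear over lever", "press lever"]

# Hypothetical rule: move to the next step after this many consecutive
# reinforced responses at the current step.
ADVANCE_AFTER = 3


def shape(responses):
    """Return (response, current criterion, reinforced?) for each emitted response."""
    step, streak, log = 0, 0, []
    for response in responses:
        criterion = STEPS[step]
        # Responses at or beyond the current approximation are reinforced;
        # earlier response forms are no longer reinforced.
        reinforced = STEPS.index(response) >= step
        log.append((response, criterion, reinforced))
        if reinforced:
            streak += 1
            if streak >= ADVANCE_AFTER and step < len(STEPS) - 1:
                step, streak = step + 1, 0
        else:
            streak = 0
    return log


session = ["rear anywhere"] * 4 + ["rear over lever"] * 4 + ["press lever"]
for entry in shape(session):
    print(entry)
```

In this run the fourth "rear anywhere" and the fourth "rear over lever" go unreinforced, because by then the criterion has already moved on.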

Shaping new behaviour

Not teaching new response components but how to combine familiar responses into a new activity

Construction/synthesis of a new behavioural unit from preexisting response components that already occur in the organism’s repertoire

Possible to produce responses unlike anything the trainee ever did before

E.g. Expert performances involve novel response forms
Shaping process takes advantage of the variability of behaviour to gradually move the distribution of responses away from the trainee’s starting point and toward responses that are entirely new in the trainee’s repertoire

Rate of occurrence

Proposed by Skinner as a measure of response probability; now the primary measure in free-operant studies

With continuous opportunity to respond, the organism determines the frequency of its instrumental response → opportunity to observe changes in the likelihood of behaviour over time

Highly likely responses occur often and have a high rate

Unlikely responses occur seldom and have a low rate
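Rate of occurrence is simply the number of responses divided by the time over which they were counted. A minimal Python sketch, assuming a hypothetical list of lever-press timestamps in seconds and 60 s bins; comparing bins shows how the likelihood of the response changes over the session.

```python
import math


def response_rates(press_times_s, session_s, bin_s=60.0):
    """Responses per minute in consecutive time bins of a free-operant session.

    Assumes session_s is a multiple of bin_s; a shorter final bin
    would have its rate underestimated.
    """
    n_bins = math.ceil(session_s / bin_s)
    counts = [0] * n_bins
    for t in press_times_s:
        if 0 <= t < session_s:
            counts[int(t // bin_s)] += 1
    per_minute = 60.0 / bin_s
    return [c * per_minute for c in counts]


# Hypothetical lever-press times (s) in a 2-minute session.
presses = [5, 20, 50, 65, 70, 80, 95, 100, 110, 115]
print(response_rates(presses, session_s=120))  # [3.0, 7.0]
```

The rise from 3 to 7 responses per minute is the kind of change in response likelihood that free-operant procedures make visible.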

Instrumental conditioning procedures

Positive reinforcement

Punishment

Negative reinforcement

Omission Training/Negative Punishment

Appetitive stimulus

A pleasant or satisfying stimulus that can be used to positively reinforce an instrumental response

Aversive stimulus

An unpleasant or annoying stimulus that can be used to punish an instrumental response

Positive reinforcement

Instrumental response produces an appetitive stimulus

Positive contingency between the instrumental response and appetitive stimulus

Produces an increase in the rate of responding

Punishment/Positive punishment

Instrumental response produces an aversive stimulus

Positive contingency between the instrumental response and aversive stimulus

Produces a decrease in the rate of responding

Negative reinforcement

Instrumental response turns off an aversive stimulus

Negative contingency between the instrumental response and aversive stimulus

Produces an increase in the rate of responding

Omission training/negative punishment

Instrumental response results in the removal of an appetitive stimulus

Negative contingency between the instrumental response and the appetitive stimulus

Produces a decrease in the rate of responding

Often preferred as a method of discouraging human behaviour because it does not involve delivering an aversive stimulus
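The four procedures form a 2 x 2 grid: the contingency (the response produces vs removes the stimulus) crossed with the stimulus type (appetitive vs aversive). A minimal Python lookup of that grid; the names `Contingency`, `Stimulus` and `classify` are hypothetical labels, and the entries restate the cards above.

```python
from enum import Enum


class Stimulus(Enum):
    APPETITIVE = "appetitive"
    AVERSIVE = "aversive"


class Contingency(Enum):
    POSITIVE = "response produces the stimulus"
    NEGATIVE = "response removes the stimulus"


# (contingency, stimulus) -> (procedure name, effect on rate of responding)
PROCEDURES = {
    (Contingency.POSITIVE, Stimulus.APPETITIVE): ("positive reinforcement", "increase"),
    (Contingency.POSITIVE, Stimulus.AVERSIVE): ("positive punishment", "decrease"),
    (Contingency.NEGATIVE, Stimulus.AVERSIVE): ("negative reinforcement", "increase"),
    (Contingency.NEGATIVE, Stimulus.APPETITIVE): ("omission training / negative punishment", "decrease"),
}


def classify(contingency, stimulus):
    """Name the procedure and its expected effect on the rate of responding."""
    return PROCEDURES[(contingency, stimulus)]


print(classify(Contingency.NEGATIVE, Stimulus.AVERSIVE))  # ('negative reinforcement', 'increase')
```

"Positive" and "negative" describe the contingency, not whether the stimulus is pleasant: responding increases whenever the response produces an appetitive stimulus or removes an aversive one.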

Differential reinforcement of other behaviour (DRO)

An instrumental conditioning procedure in which a positive reinforcer is periodically delivered only if the pp does something other than the target response

Another term for omission-training procedures

Involves the reinforcement of other behaviour
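A DRO schedule can be sketched as a timer that pays out only after a stretch free of the target response. A minimal Python sketch of one fixed-window variant, assuming hypothetical target-response times in seconds; another common arrangement instead restarts the interval timer after every target response, which this sketch does not model.

```python
def dro_deliveries(target_times_s, session_s, interval_s):
    """Times at which a fixed-window DRO schedule delivers the reinforcer.

    The session is split into consecutive windows of interval_s seconds; the
    reinforcer is delivered at the end of a window only if the target response
    did not occur anywhere in that window.
    """
    deliveries = []
    start = 0.0
    while start + interval_s <= session_s:
        end = start + interval_s
        if not any(start <= t < end for t in target_times_s):
            deliveries.append(end)
        start = end
    return deliveries


# Hypothetical target responses at 12 s and 47 s in a 60 s session.
print(dro_deliveries([12.0, 47.0], session_s=60.0, interval_s=10.0))
# [10.0, 30.0, 40.0, 60.0]
```

The windows containing 12 s and 47 s earn nothing, so the target response costs the participant reinforcers that "other behaviour" would have earned.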

Fundamental elements of instrumental conditioning

Instrumental response

Behavioural variability vs stereotypy
Relevance/belongingness
Behaviour systems and constraints

Instrumental reinforcer

Response-reinforcer relation (contingency)

Avoidance

An instrumental conditioning procedure in which the instrumental response prevents the delivery of an aversive stimulus

Behavioural variability/stereotypy

Novel response forms can be readily produced by instrumental conditioning if response variation is a requirement for reinforcement

In the absence of explicit reinforcement of variability, responding becomes more stereotyped with continued instrumental conditioning

Thorndike and Skinner were partially correct in saying that responding becomes more stereotyped with continued instrumental conditioning, but wrong to suggest that this outcome is inevitable

Belongingness

The idea, originally proposed by Thorndike, that an organism’s evolutionary history makes certain responses fit or belong with certain reinforcers

Facilitates learning

Sevenster (1973)

Used the presentation of another male or female as a reinforcer in instrumental conditioning of male sticklebacks

One group of fish was required to bite a rod to obtain access to the reinforcer

When the reinforcer was another male, biting behaviour increased

Presentation of a female was an effective reinforcer for other responses, such as swimming through a ring

Biting ‘belongs with’ territorial defence and can be reinforced by the presentation of a potentially rival male

Biting does not belong with the presentation of a female, which typically elicits courtship rather than aggression

Breland and Breland (1961)

Set up a business to train animals to perform entertaining response chains for displays in amusement parks and zoos

Observed dramatic behaviour changes inconsistent with the reinforcement procedures they were using

The extra responses that developed in these food reinforcement situations were activities the animals instinctively perform when obtaining food (instinctive drift)

Instinctive drift

A gradual drift of instrumental behaviour away from the responses required for reinforcement to species-typical responses related to the reinforcer and to other stimuli in the experimental situation

Behaviour systems theory in instrumental conditioning

When an animal is food deprived and is in a situation where it might encounter food, its feeding system becomes activated, and it begins to engage in foraging and other food-related activities

The effectiveness of the procedure in increasing an instrumental response will depend on the compatibility of that response with the preexisting organisation of the feeding system

The nature of other responses that emerge during training (or instinctive drift) will depend on the behavioural components of the feeding system that become activated by the instrumental conditioning procedure

Diagnosis of whether a response is part of a behaviour system: classical conditioning experiment

A CS elicits components of the behaviour system activated by the US
If instinctive drift reflects responses of the behaviour system, responses akin to instinctive drift should be evident in the classical conditioning experiment

Periodic deliveries of food activate the feeding system and its preorganised species-typical foraging and feeding responses

Shettleworth (1975)

Study of the effects of food deprivation in hamsters

Found that:

Responses that become more likely when the animal is hungry are readily reinforced with food
Responses that become less likely when the animal is hungry are difficult to train as instrumental responses

Quantity and quality of the reinforcer

If a reinforcer is very small and of poor quality, it will not increase instrumental responding (positive reinforcement)

A longer-lasting (larger-magnitude) reinforcer is much more effective in maintaining instrumental responding

Shifts in reinforcer quality/quantity

The effectiveness of a reinforcer depends not only on its own properties but also on how that reinforcer compares with others the individual received in the recent past

Positive behavioural contrast effect

A large reward is treated as especially good after reinforcement with a small reward

Negative behavioural contrast effect

A small reward is treated as especially poor after reinforcement with a large reward

Behavioural contrast

Crespi (1942)

Change in the value of a reinforcer produced by prior experience with a reinforcer of a higher or lower value

Prior experience with a lower valued reinforcer increases reinforcer value

Prior experience with a higher valued reinforcer reduces reinforcer value

Can occur either because of a shift from a prior reward magnitude or because of an anticipated reward (anticipatory contrast effect)

Ortega and colleagues (2011)

Negative behavioural contrast

Lab rats were given a sucrose solution to drink for 5 min each day

G1: sucrose solution always 4%

G2: sucrose solution was much more palatable (32%) on the first 10 trials and was then decreased to 4% for the remaining four trials

During the first 10 trials, rats spent a bit more time licking the 32% solution than the 4% solution

When the 32% solution was changed to 4%, licking time decreased dramatically

Shifted group licked significantly less of the 4% on trials 11 and 12 than the nonshifted group

Response-reinforcer relation

Efficient instrumental behaviour requires that you know when you have to do something to obtain a reinforcer and when the reinforcer is likely to be delivered independent of your actions (sensitivity)

Two types (independent of each other):

Temporal relation
- Temporal contiguity
Causal relation/response-reinforcer contingency

Temporal relation

The time interval between an instrumental response and the reinforcer

Responding decreases fairly rapidly with increases in delay of reinforcement

Temporal contiguity

The delivery of the reinforcer immediately after the response

Credit-assignment problem

Responding decreases fairly rapidly with increases in delay of reinforcement

With delayed reinforcement, it is difficult to figure out which response deserves the credit for the delivery of the reinforcer

To associate the response with the reinforcer, the pp has to have some way to distinguish it from the other responses it performs during the delay interval

Methods to overcome:

Provide a secondary/conditioned reinforcer immediately after the instrumental response, even if the primary reinforcer cannot occur until sometime later
Marking procedure

Secondary/conditioned reinforcer

A stimulus that becomes an effective reinforcer because of its association with a primary or unconditioned reinforcer

A conditioned stimulus that was previously associated with the reinforcer

E.g. Verbal prompts “good”, “keep going”, “that’s the way”

Marking procedure

A procedure in which the instrumental response is immediately followed by a distinctive event (pp is picked up or a flash of light is presented) that makes the instrumental response more memorable and helps overcome the deleterious effects of delayed reinforcement

Mark the target instrumental response in some way to make it distinguishable from the other activities of the organism

E.g. Introducing a brief light/noise after the target response; picking up the animal and moving it to a holding box for the delay interval

Williams (1999)

Compared the learning of a lever-press response in 3 groups of rats

For each group, the food reinforcer was delayed 30s after a press of the response lever

No-signal group received procedure without marking stimulus

Showed little responding during the first 3 blocks of 2 trials
Only achieved modest levels of lever pressing after that

Marking group: light presented for 5s right after each lever press

Showed much more robust learning

Blocking group: 5s light presented at end of delay interval, just before food delivery

Never learned the lever-press response
The light became associated with the food, and this blocked the conditioning of the instrumental response

Response-reinforcer contingency

The relation of a response to a reinforcer defined in terms of the probability of getting reinforced for making the response as compared to the probability of getting reinforced in the absence of the response

The extent to which the instrumental response is necessary and sufficient to produce the reinforcer

A perfect causal relation between the response and the reinforcer is not sufficient to produce vigorous instrumental responding (vigorous responding does not occur if the reinforcer is delayed too long)
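One common way to quantify this comparison is the difference between the two conditional probabilities, often written ΔP. A minimal Python sketch, assuming the session is divided into hypothetical observation periods, each recording whether the response occurred and whether the reinforcer was delivered; `delta_p` is a hypothetical helper name.

```python
def delta_p(periods):
    """Contingency as P(reinforcer | response) - P(reinforcer | no response).

    periods: (responded, reinforced) boolean pairs, one per observation period.
    """
    with_response = [reinf for responded, reinf in periods if responded]
    without_response = [reinf for responded, reinf in periods if not responded]
    if not with_response or not without_response:
        raise ValueError("need periods both with and without the response")
    p_given_response = sum(with_response) / len(with_response)
    p_given_no_response = sum(without_response) / len(without_response)
    return p_given_response - p_given_no_response


# Reinforcer only ever follows the response: perfect positive contingency.
contingent = [(True, True)] * 10 + [(False, False)] * 10
# Reinforcer equally likely with or without the response (e.g. free food).
noncontingent = ([(True, True)] * 5 + [(True, False)] * 5
                 + [(False, True)] * 5 + [(False, False)] * 5)

print(delta_p(contingent))     # 1.0
print(delta_p(noncontingent))  # 0.0
```

Because ΔP ignores timing, a value of 1.0 can coexist with a long response-reinforcer delay, and therefore with weak responding.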

Contiguity

The occurrence of two events, such as a response and a reinforcer, at the same time or very close together in time

Skinner’s superstition experiment

Major issue: role of contiguity vs contingency in instrumental learning

Placed pigeons in separate experimental chambers and set the equipment to deliver a bit of food every 15s irrespective of what the pigeons were doing (not required to perform any action)

The pigeons appeared to be responding as if their behaviour controlled the delivery of the reinforcer

Superstitious behaviour

Behaviour that increases in frequency because of accidental pairings of the delivery of a reinforcer with occurrences of the behaviour

Accidental/adventitious reinforcement

An instance in which the delivery of a reinforcer happens to coincide with a particular response, even though that response was not responsible for the reinforcer presentation

The accidental pairing of a response with delivery of the reinforcer

Considered to be responsible for superstitious behaviour

E.g. Whatever response a pigeon happened to make just before it got free food became strengthened and subsequently increased in frequency
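Skinner's account can be sketched as a simulation in which food arrives every 15 s regardless of behaviour and whichever response happened to precede it is strengthened. A minimal Python sketch: the response names, `BOOST` and the one-response-per-second assumption are hypothetical, and the model illustrates Skinner's interpretation only, not what pigeons were later observed to do.

```python
import random

# Hypothetical response repertoire for one pigeon.
RESPONSES = ["turn counterclockwise", "head thrust", "peck floor", "hop"]
FOOD_EVERY_S = 15  # food every 15 s regardless of behaviour
BOOST = 1.0        # hypothetical strengthening per accidental pairing


def superstition(session_s=600, seed=0):
    """Simulate adventitious reinforcement on a fixed-time food schedule."""
    rng = random.Random(seed)
    weights = {r: 1.0 for r in RESPONSES}
    for t in range(1, session_s + 1):
        # One response per second, chosen in proportion to its current strength.
        response = rng.choices(RESPONSES, weights=[weights[r] for r in RESPONSES])[0]
        if t % FOOD_EVERY_S == 0:
            # Food arrives on schedule; the response that happened to precede it
            # is strengthened even though it did not produce the food.
            weights[response] += BOOST
    return weights


print(superstition())
```

Because a strengthened response is more likely to be the one preceding the next delivery, strength tends to pile up on one or two responses even though the reinforcer is entirely response-independent.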

Staddon and Simmelhag (1971)

Repeated Skinner’s superstition experiment

Made more extensive and systematic observations: defined a variety of responses, then recorded the frequency of each response according to when it occurred during the interval between successive free deliveries of food

Terminal & interim responses

Failed to find evidence for accidental reinforcement effects: responses did not always increase in frequency merely because they occurred coincidentally with food delivery

Food delivery appeared to influence only the strength of terminal responses, even in the initial phases of training

Suggested that terminal responses are species-typical responses that reflected the anticipation of food as time draws closer to the next food presentation

Viewed interim responses as reflecting other sources of motivation that are prominent early in the interfood interval, when food presentation is unlikely

Terminal response

A response that is most likely at the end of the interval between successive reinforcements that are presented at fixed intervals

Interim response

A response that has its highest probability in the middle of the interval between successive presentations of a reinforcer, when the reinforcer is not likely to occur

Overmier & Seligman (1967)
Seligman & Maier (1967)

Pioneering studies on the effects of control over aversive stimulation

Investigated the effects of exposure to uncontrollable shock on subsequent escape-avoidance learning in dogs

Exposure to uncontrollable shock disrupted subsequent learning (learned-helplessness effect)

Triadic design

Design used to conduct learned-helplessness experiments

1) Exposure phase

G1 exposed to periodic shocks that can be terminated by performing an escape response
G2 (yoked group): assigned a partner in G1 and receives the same duration and distribution of shocks (cannot turn off the shocks)
G3 receives no shocks during exposure phase but is restricted to the apparatus for as long as G1 and G2

2) Conditioning phase

All groups receive escape-avoidance training
Conducted in a shuttle apparatus that has two adjacent compartments
Have to go back and forth between the two compartments to avoid shock (or escape any shocks they failed to avoid)

Exposure to uncontrollable shock (G2) produces a severe disruption in subsequent escape-avoidance learning

Little or no deleterious effects are observed after exposure to escapable shock (G1 learn as rapidly as G3)

Primary difference between G1 and G2 is the presence of a response-reinforcer contingency for G1 but not for G2 (animals are sensitive to the response-reinforcer contingency)
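The crucial control in the triadic design is that G1 and G2 receive physically identical shocks, so only the response-reinforcer contingency differs. A minimal Python sketch, assuming hypothetical shock onsets, escape latencies and a `MAX_SHOCK_S` cut-off; the helper names are hypothetical, and G3 simply gets an empty schedule.

```python
from dataclasses import dataclass

MAX_SHOCK_S = 60.0  # hypothetical cut-off if the escape response is never made


@dataclass(frozen=True)
class Shock:
    onset_s: float
    duration_s: float


def escape_schedule(onsets_s, escape_latencies_s):
    """G1: each shock ends when the animal performs the escape response."""
    return [Shock(on, min(lat, MAX_SHOCK_S))
            for on, lat in zip(onsets_s, escape_latencies_s)]


def yoked_schedule(partner_schedule):
    """G2: an exact copy of the G1 partner's shocks.

    The function takes no input from the yoked animal's own behaviour,
    so nothing it does can change when a shock starts or ends.
    """
    return list(partner_schedule)


g1 = escape_schedule([30.0, 90.0, 150.0], [4.2, 75.0, 2.5])
g2 = yoked_schedule(g1)
g3 = []  # G3: same time in the apparatus, no shocks

print(sum(s.duration_s for s in g1) == sum(s.duration_s for s in g2))  # True
```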

Learned helplessness effect

Interference with the learning of new instrumental responses as a result of exposure to inescapable and unavoidable aversive stimulation

Learned helplessness hypothesis

The proposal that exposure to inescapable and unavoidable aversive stimulation reduces motivation to respond and disrupts subsequent instrumental conditioning because pps learn that their behaviour does not control outcomes

Learning deficit occurs for two reasons:

The expectation of lack of control reduces the motivation to perform an instrumental response
Even if they make the response and get reinforced in the conditioning phase, the previously learned expectation of lack of control makes it more difficult to learn that their behaviour is now effective in producing reinforcement

Activity deficit hypothesis

Proposal that animals in G2 show a learning deficit following exposure to inescapable shock because inescapable shocks encourage animals to become inactive/freeze

Cannot explain instances in which exposure to inescapable shock disrupts choice learning

Attention deficit hypothesis

Proposes that animals in G2 show a learning deficit since exposure to inescapable shock reduces the extent to which animals pay attention to their own behaviour

More successful as an alternative to the learned helplessness hypothesis

Stimulus relations in escape conditioning

Another line of research challenging helplessness hypothesis

Instead of focusing on why inescapable shock disrupts subsequent learning, asked why exposure to escapable shock is not nearly as bad

Making an escape response results in internal sensations/response feedback cues

Shock-cessation feedback cues: some of the response-produced stimuli are experienced at the start of the escape response, just before the shock is turned off

Safety-signal feedback cues: other response-produced stimuli are experienced as the animal completes the response, just after the shock has been turned off at the start of the intertrial interval

Can become conditioned inhibitors of fear and limit or inhibit fear elicited by contextual cues of the experimental chamber
For G2, contextual cues of the chamber in which shocks are delivered are more likely to become conditioned to elicit fear with inescapable shock

Jackson and Minor (1988)

One group of rats received the usual inescapable shocks in the exposure phase of the triadic design

At the end of each shock presentation, the houselights were turned off for 5 seconds as a safety signal

The introduction of the safety signal entirely eliminated the disruptive effects of shock exposure on subsequent shuttle-escape learning

Indicates that significant differences in how animals cope with aversive stimulation can result from differences in the ability to predict when shocks will end and when a safe intertrial interval without shocks will begin