Instrumental behaviour
Behaviour that occurs because it was previously effective in producing certain consequences
Thorndike’s law of effect
A mechanism of instrumental behaviour, proposed by Thorndike, which states that if a response R is followed by a satisfying event in the presence of a stimulus S, the association between the stimulus and response (S-R) will be strengthened
If the response is followed by an annoying event, the S-R association will be weakened
Important: this is S-R learning. What is learned is an association between the response and the stimuli present at the time of the response.
The consequence of the response is not one of the elements in the association
The satisfying/annoying consequence simply serves to strengthen or weaken the association between the preceding stimulus and response
Attractive mechanism to explain compulsive habits that are hard to break, e.g.
S: Sight and smell of popcorn
R: Compels you to grab more popcorn and eat it
A habitual smoker who knows that smoking is harmful will continue to smoke because S-R mechanisms compel lighting a cigarette independent of the consequences of the response
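The S-R account above can be sketched in a few lines of Python. This is only an illustration: the function name `update_sr_strength`, the fixed additive step, and the 0 to 1 bounds are assumptions made for the example, not part of Thorndike's formulation. Note that the consequence only adjusts the strength; it is not stored as part of the association.

```python
def update_sr_strength(strength, consequence, step=0.1):
    """Return the new S-R association strength after one response.

    A "satisfying" consequence strengthens the association and an
    "annoying" one weakens it. The additive step and the 0..1 range
    are illustrative assumptions.
    """
    if consequence == "satisfying":
        strength += step
    elif consequence == "annoying":
        strength -= step
    else:
        raise ValueError("consequence must be 'satisfying' or 'annoying'")
    # Keep the strength inside the assumed 0..1 range.
    return max(0.0, min(1.0, strength))


# Only the S-R strength is carried forward from trial to trial;
# the consequence itself is never stored.
strength = 0.5
for consequence in ["satisfying", "satisfying", "annoying"]:
    strength = update_sr_strength(strength, consequence)
print(round(strength, 2))
```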
Approaches to the study of instrumental conditioning
Discrete-trial procedures
Free-operant procedures
Discrete-trial procedures
W.S. Small
A method of instrumental conditioning in which the pp can perform the instrumental response only during specified periods, usually determined either by placement of the pp in an experimental chamber or by the presentation of a stimulus
Usually involve the use of some type of maze
Runway/straight-alley maze
T maze
Animal has limited opportunities to respond, and these opportunities are scheduled by the experimenter
Runway/straight-alley maze
Contains a start box at one end and a goal box at the other
Rat is placed in the start box
Barrier separating the start box from the main section of the runway is raised
Rat is allowed to make its way down the runway until it reaches the goal box, which usually contains a reinforcer
Behaviour quantified by:
Running speed
Latency
Running speed
Used to quantify behaviour of rats in a runway maze
How fast the animal gets from the start box to the goal box
Typically increases with repeated trials
Latency
The time between the start of a trial/stimulus and the instrumental response
Typically decreases with repeated trials
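A minimal sketch of how the two runway measures could be computed from trial timestamps. The function names, parameter names, and units (centimetres, seconds) are assumptions chosen for illustration.

```python
def running_speed(runway_length_cm, start_box_exit_s, goal_box_entry_s):
    """Speed from start box to goal box, in cm per second."""
    travel_time = goal_box_entry_s - start_box_exit_s
    if travel_time <= 0:
        raise ValueError("goal box entry must come after start box exit")
    return runway_length_cm / travel_time


def latency(trial_start_s, response_s):
    """Time from the start of the trial to the instrumental response."""
    if response_s < trial_start_s:
        raise ValueError("response cannot precede the start of the trial")
    return response_s - trial_start_s


# Hypothetical trial: barrier raised at 10.0 s, rat leaves the start box
# at 12.5 s and enters the goal box at 17.5 s on a 150 cm runway.
print(latency(10.0, 12.5))
print(running_speed(150, 12.5, 17.5))
```

With training, the first number is expected to shrink and the second to grow.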
T maze
Consists of a start box and alleys arranged in the shape of a T
A goal box is located at the end of each arm of the T
Because it has two choice arms, it can be used to study more complex questions
Free-operant procedures
B. F. Skinner (1938): to study behaviour in a more continuous manner
A method of instrumental conditioning that permits repeated performance of the instrumental response without intervention by the experimenter
Concept of the operant as a way of dividing behaviour into meaningful measurable units
Skinner box
Magazine training & shaping
Skinner box
A small chamber that contains a lever that the rat can push down repeatedly
Has a mechanism that can deliver a reinforcer (e.g. food/water) into a cup
The lever is electronically connected to the food-delivery system so that when the rat presses the lever, a pellet of food automatically falls into the food cup
Operant response
A response that is defined by the effect it produces in the environment
E.g. Any sequence of movements that depresses the lever or opens the door constitutes an instance of that particular operant
Operational outcome is the critical measure of success
Instrumental response
Any response that is required to produce a desired consequence
Magazine training and shaping
The shaping of a new operant response requires approximations to the final behaviour
Successful shaping involves three components:
Clearly define the final response you want the trainee to perform
Clearly assess the starting level of performance, no matter how far it is from the final response you are interested in
Divide the progression from the starting point to the final target behaviour into appropriate training steps
The successive approximations make up your training plan
Execution of training plan involves two complementary tactics:
Reinforcement of successive approximations to the final behaviour
Withholding reinforcement for earlier response forms
Magazine training
A preliminary stage of instrumental conditioning in which a stimulus is repeatedly paired with the reinforcer to enable the pp to learn to go and get the reinforcer when it is presented
The sound of the food-delivery device, for example, may be repeatedly paired with food so that the animal will learn to go to the food cup when food is delivered
Involves classical conditioning
Sound of food-delivery device (food magazine) repeatedly paired with release of food pellet into cup
Sound elicits a classically conditioned approach response: animal goes to food cup and picks up pellet
After this phase, the rat is ready to learn the required operant response
Response shaping
Reinforcement of successive approximations to a desired instrumental response, e.g.
Food is given if the rat does anything remotely related to pressing the lever
E.g. Rearing response: gets up on its hind legs anywhere in the experimental chamber
Food pellet may be given only if the rat makes the rearing response over the response lever (rearing in other parts no longer reinforced)
Food pellet may be given only if the rat touches and depresses the lever
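The lever-press example above can be sketched as a staged reinforcement criterion. The stage labels and the function `is_reinforced` are hypothetical names for this illustration; the rule (reinforce the current approximation or anything closer to the target, withhold food for earlier forms) follows the steps listed above.

```python
# Successive approximations, ordered from the starting point to the target.
SHAPING_STAGES = [
    "rear anywhere in chamber",
    "rear over lever",
    "depress lever",
]


def is_reinforced(response, current_stage):
    """Return True if the response earns food at the current training stage.

    Responses at or beyond the current approximation are reinforced;
    earlier response forms are no longer reinforced.
    """
    return SHAPING_STAGES.index(response) >= current_stage


print(is_reinforced("rear anywhere in chamber", 0))  # stage 0 criterion met
print(is_reinforced("rear anywhere in chamber", 1))  # earlier form, withheld
print(is_reinforced("depress lever", 1))             # closer to the target
```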
Shaping new behaviour
Not teaching new response components but how to combine familiar responses into a new activity
Construction/synthesis of a new behavioural unit from preexisting response components that already occur in the organism’s repertoire
Possible to produce responses unlike anything the trainee ever did before
E.g. Expert performances involve novel response forms
Shaping process takes advantage of the variability of behaviour to gradually move the distribution of responses away from the trainee’s starting point and toward responses that are entirely new in the trainee’s repertoire
Rate of occurrence
Skinner proposed using rate of occurrence as a measure of response probability; it is now the primary measure in free-operant studies
With continuous opportunity to respond, the organism determines the frequency of its instrumental response → opportunity to observe changes in the likelihood of behaviour over time
Highly likely responses occur often and have a high rate
Unlikely responses occur seldom and have a low rate
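As a rough sketch, response rate can be tracked over a session by counting responses in consecutive one-minute bins. The function name, the one-minute bin size, and the sample timestamps are assumptions for illustration, and the session length is assumed to be a whole number of minutes.

```python
def responses_per_minute(response_times_s, session_length_min):
    """Count responses in each consecutive one-minute bin of the session."""
    counts = [0] * session_length_min
    for t in response_times_s:
        minute = int(t // 60)
        if 0 <= minute < session_length_min:
            counts[minute] += 1
    return counts


# Hypothetical lever-press timestamps (seconds) from a 3-minute session.
lever_presses = [5, 50, 65, 70, 90, 110, 125, 130, 140, 150, 170, 175]
print(responses_per_minute(lever_presses, 3))  # [2, 4, 6]
```

A rising count across bins indicates that the response is becoming more likely over time.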
Instrumental conditioning procedures
Positive reinforcement
Punishment
Negative reinforcement
Omission training/negative punishment
Appetitive stimulus
A pleasant or satisfying stimulus that can be used to positively reinforce an instrumental response
Aversive stimulus
An unpleasant or annoying stimulus that can be used to punish an instrumental response
Positive reinforcement
Instrumental response produces an appetitive stimulus
Positive contingency between the instrumental response and appetitive stimulus
Produces an increase in the rate of responding
Punishment/Positive punishment
Instrumental response produces an aversive stimulus
Positive contingency between the instrumental response and aversive stimulus
Produces a decrease in the rate of responding
Negative reinforcement
Instrumental response turns off an aversive stimulus
Negative contingency between the instrumental response and aversive stimulus
Produces an increase in the rate of responding
Omission training/negative punishment
Instrumental response results in the removal of an appetitive stimulus
Negative contingency between the response and an environmental event
Produces a decrease in the rate of responding
Often preferred as a method of discouraging human behaviour because it does not involve delivering an aversive stimulus
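The four procedures differ only in the type of contingency and the type of stimulus, so they can be summarised as a lookup table. The function name `classify_procedure` and the string labels are illustrative assumptions; the mapping itself restates the definitions above.

```python
# (response-outcome contingency, stimulus type) -> (procedure, effect on response rate)
PROCEDURES = {
    ("positive", "appetitive"): ("positive reinforcement", "increase"),
    ("positive", "aversive"): ("punishment (positive punishment)", "decrease"),
    ("negative", "aversive"): ("negative reinforcement", "increase"),
    ("negative", "appetitive"): ("omission training (negative punishment)", "decrease"),
}


def classify_procedure(contingency, stimulus):
    """Return the procedure name and its effect on the rate of responding."""
    return PROCEDURES[(contingency, stimulus)]


# The response turns off an aversive stimulus:
print(classify_procedure("negative", "aversive"))
```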
Differential reinforcement of other behaviour (DRO)
An instrumental conditioning procedure in which a positive reinforcer is periodically delivered only if the pp does something other than the target response
Another name for omission training
Involves the reinforcement of other behaviour
Fundamental elements of instrumental conditioning
Instrumental response
Behavioural variability vs stereotypy
Relevance/belongingness
Behaviour systems and constraints
Instrumental reinforcer
Response-reinforcer relation (contingency)
Avoidance
An instrumental conditioning procedure in which the instrumental response prevents the delivery of an aversive stimulus
Behavioural variability/stereotypy
Novel response forms can be readily produced by instrumental conditioning if response variation is a requirement for reinforcement
In the absence of explicit reinforcement of variability, responding becomes more stereotyped with continued instrumental conditioning
Thorndike and Skinner were partially correct in saying that responding becomes more stereotyped with continued instrumental conditioning, but wrong to suggest inevitability
Belongingness
The idea, originally proposed by Thorndike, that an organism’s evolutionary history makes certain responses fit or belong with certain reinforcers
Facilitates learning
Sevenster (1973)
Used the presentation of another male or female as a reinforcer in instrumental conditioning of male sticklebacks
One group of fish was required to bite a rod to obtain access to the reinforcer
When the reinforcer was another male, biting behaviour increased
Presentation of a female was an effective reinforcer for other responses, such as swimming through a ring
Biting ‘belongs with’ territorial defense and can be reinforced by the presentation of a potentially rival male
Biting does not belong with the presentation of a female, which typically elicits courtship rather than aggression
Breland and Breland (1961)
Set up a business to train animals to perform entertaining response chains for displays in amusement parks and zoos
Observed dramatic behaviour changes inconsistent with the reinforcement procedures they were using
The extra responses that developed in these food reinforcement situations were activities the animals instinctively perform when obtaining food (instinctive drift)
Instinctive drift
A gradual drift of instrumental behaviour away from the responses required for reinforcement to species-typical responses related to the reinforcer and to other stimuli in the experimental situation
Behaviour systems theory in instrumental conditioning
When an animal is food deprived and is in a situation where it might encounter food, its feeding system becomes activated, and it begins to engage in foraging and other food-related activities
The effectiveness of the procedure in increasing an instrumental response will depend on the compatibility of that response with the preexisting organisation of the feeding system
The nature of other responses that emerge during training (or instinctive drift) will depend on the behavioural components of the feeding system that become activated by the instrumental conditioning procedure
Diagnosis of whether a response is part of a behaviour system: classical conditioning experiment
A CS elicits components of the behaviour system activated by the US
If instinctive drift reflects responses of the behaviour system, responses akin to instinctive drift should be evident in the CC experiment
Periodic deliveries of food activate the feeding system and its preorganised species-typical foraging and feeding responses
Shettleworth (1975)
Study of the effects of food deprivation in hamsters
Found that:
Responses that become more likely when the animal is hungry are readily reinforced with food
Responses that become less likely when the animal is hungry are difficult to train as instrumental responses
Quantity and quality of the reinforcer
If a reinforcer is very small and of poor quality, it will not increase instrumental responding (positive reinforcement)
A longer (larger-magnitude) reinforcer is much more effective in maintaining instrumental responding
Shifts in reinforcer quality/quantity
The effectiveness of a reinforcer depends not only on its own properties but also on how that reinforcer compares with others the individual received in the recent past
Positive behavioural contrast effect
A large reward is treated as especially good after reinforcement with a small reward
Negative behavioural contrast effect
A small reward is treated as especially poor after reinforcement with a large reward
Behavioural contrast
Crespi (1942)
Change in the value of a reinforcer produced by prior experience with a reinforcer of a higher or lower value
Prior experience with a lower valued reinforcer increases reinforcer value
Prior experience with a higher valued reinforcer reduces reinforcer value
Can occur either because of a shift from a prior reward magnitude or because of an anticipated reward (anticipatory contrast effect)
Ortega and colleagues (2011)
Negative behavioural contrast
Lab rats were given a sucrose solution to drink for 5 min each day
G1: sucrose solution always 4%
G2: sucrose solution much more tasty (32%) on the first 10 trials and was then decreased to 4% for the remaining four trials
During the first 10 trials, G2 spent slightly more time licking the 32% solution than G1 spent licking the 4% solution
When 32% changed to 4%, dramatic decrease in licking time
Shifted group licked significantly less of the 4% on trials 11 and 12 than the nonshifted group
Response-reinforcer relation
Efficient instrumental behaviour requires that you know when you have to do something to obtain a reinforcer and when the reinforcer is likely to be delivered independent of your actions (sensitivity)
Two types (independent of each other):
Temporal relation (temporal contiguity is the special case in which the reinforcer is delivered immediately)
Causal relation/response-reinforcer contingency
Temporal relation
The time interval between an instrumental response and the reinforcer
Responding decreases fairly rapidly with increases in delay of reinforcement
Temporal contiguity
The delivery of the reinforcer immediately after the response
Credit-assignment problem
Responding decreases fairly rapidly with increases in delay of reinforcement
With delayed reinforcement, it is difficult to figure out which response deserves the credit for the delivery of the reinforcer
To associate the response with the reinforcer, the pp has to have some way to distinguish it from the other responses it performs during the delay interval
Methods to overcome:
Provide a secondary/conditioned reinforcer immediately after the instrumental response, even if the primary reinforcer cannot occur until sometime later
Marking procedure
Secondary/conditioned reinforcer
A stimulus that becomes an effective reinforcer because of its association with a primary or unconditioned reinforcer
A conditioned stimulus that was previously associated with the reinforcer
E.g. Verbal prompts “good”, “keep going”, “that’s the way”
Marking procedure
A procedure in which the instrumental response is immediately followed by a distinctive event (pp is picked up or a flash of light is presented) that makes the instrumental response more memorable and helps overcome the deleterious effects of delayed reinforcement
Mark the target instrumental response in some way to make it distinguishable from the other activities of the organism
E.g. Introducing a brief light/noise after the target response; picking up the animal and moving it to a holding box for the delay interval
Williams (1999)
Compared the learning of a lever-press response in 3 groups of rats
For each group, the food reinforcer was delayed 30s after a press of the response lever
No-signal group received procedure without marking stimulus
Showed little responding during the first 3 blocks of 2 trials
Achieved only modest levels of lever pressing after that
Marking group: light presented for 5s right after each lever press
Showed much more robust learning
Blocking group: 5s light presented at end of delay interval, just before food delivery
Never learned the lever-press response
The light became associated with the food, and this blocked the conditioning of the instrumental response
Response-reinforcer contingency
The relation of a response to a reinforcer defined in terms of the probability of getting reinforced for making the response as compared to the probability of getting reinforced in the absence of the response
The extent to which the instrumental response is necessary and sufficient to produce the reinforcer
A perfect causal relation between the response and the reinforcer is not sufficient to produce vigorous instrumental responding (responding does not develop if the reinforcer is delayed too long)
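One common way to put a number on this comparison is the difference between the two probabilities. Treating "as compared to" as a simple difference is an assumption made here for illustration, as are the function name and the sample counts.

```python
def contingency_delta_p(reinforced_with_response, total_with_response,
                        reinforced_without_response, total_without_response):
    """P(reinforcer | response) minus P(reinforcer | no response).

    +1 means the response is necessary and sufficient for the reinforcer,
    0 means the reinforcer is independent of the response.
    """
    if total_with_response == 0 or total_without_response == 0:
        raise ValueError("need at least one observation of each kind")
    p_given_response = reinforced_with_response / total_with_response
    p_given_no_response = reinforced_without_response / total_without_response
    return p_given_response - p_given_no_response


# Hypothetical counts: reinforcer on every interval with a response, never otherwise.
print(contingency_delta_p(10, 10, 0, 10))  # 1.0
# Reinforcer equally likely with or without the response.
print(contingency_delta_p(5, 10, 5, 10))   # 0.0
```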
Contiguity
The occurrence of two events, such as a response and a reinforcer, at the same time or very close together in time
Skinner’s superstition experiment
Major issue: role of contiguity vs contingency in instrumental learning
Placed pigeons in separate experimental chambers and set the equipment to deliver a bit of food every 15s irrespective of what the pigeons were doing (not required to perform any action)
The pigeons appeared to be responding as if their behaviour controlled the delivery of the reinforcer
Superstitious behaviour
Behaviour that increases in frequency because of accidental pairings of the delivery of a reinforcer with occurrences of the behaviour
Accidental/adventitious reinforcement
An instance in which the delivery of a reinforcer happens to coincide with a particular response, even though that response was not responsible for the reinforcer presentation
The accidental pairing of a response with delivery of the reinforcer
Considered to be responsible for superstitious behaviour
E.g. Whatever response a pigeon happened to make just before it got free food became strengthened and subsequently increased in frequency
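The logic of adventitious reinforcement can be sketched by generating a fixed-time food schedule and looking up which response happened to precede each delivery. The function names and the response log below are hypothetical and made up for illustration; they are not data from Skinner's experiment.

```python
def fixed_time_deliveries(interval_s, session_length_s):
    """Food delivery times on a schedule that ignores behaviour entirely."""
    return list(range(interval_s, session_length_s + 1, interval_s))


def response_before_each_delivery(response_log, deliveries):
    """For each delivery, return the most recent response that preceded it.

    response_log is a list of (time_s, label) pairs sorted by time.
    """
    paired = []
    for delivery_time in deliveries:
        earlier = [label for t, label in response_log if t <= delivery_time]
        paired.append(earlier[-1] if earlier else None)
    return paired


# Hypothetical observation log for one pigeon, food every 15 s.
log = [(3, "peck floor"), (14, "turn"), (20, "peck floor"),
       (29, "turn"), (44, "head bob")]
deliveries = fixed_time_deliveries(15, 45)
print(deliveries)                                      # [15, 30, 45]
print(response_before_each_delivery(log, deliveries))  # ['turn', 'turn', 'head bob']
```

On the accidental-reinforcement account, the responses returned here are the ones expected to increase in frequency, even though none of them caused the food.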
Staddon and Simmelhag (1971)
Repeated Skinner’s superstition experiment
Made more extensive and systematic observations: defined a variety of responses, then recorded the frequency of each response according to when it occurred during the interval between successive free deliveries of food
Terminal & interim responses
Failed to find evidence for accidental reinforcement effects: responses did not always increase in frequency merely because they occurred coincidentally with food delivery
Food delivery appeared to influence only the strength of terminal responses, even in the initial phases of training
Suggested that terminal responses are species-typical responses that reflect the anticipation of food as the time of the next food presentation draws closer
Viewed interim responses as reflecting other sources of motivation that are prominent early in the interfood interval, when food presentation is unlikely
Terminal response
A response that is most likely at the end of the interval between successive reinforcements that are presented at fixed intervals
Interim response
A response that has its highest probability in the middle of the interval between successive presentations of a reinforcer, when the reinforcer is not likely to occur
Overmier & Seligman (1967)
Seligman & Maier (1967)
Pioneering studies on the effects of control over aversive stimulation
Investigated the effects of exposure to uncontrollable shock on subsequent escape-avoidance learning in dogs
Exposure to uncontrollable shock disrupted subsequent learning (learned-helplessness effect)
Triadic design
Design used to conduct learned-helplessness experiments
1) Exposure phase
G1 exposed to periodic shocks that can be terminated by performing an escape response
G2 (yoked group): assigned a partner in G1 and receives the same duration and distribution of shocks (cannot turn off the shocks)
G3 receives no shocks during exposure phase but is restricted to the apparatus for as long as G1 and G2
2) Conditioning phase
All groups receive escape-avoidance training
Conducted in a shuttle apparatus that has two adjacent compartments
Have to go back and forth between the two compartments to avoid shock (or escape any shocks they failed to avoid)
Exposure to uncontrollable shock (G2) produces a severe disruption in subsequent escape-avoidance learning
Little or no deleterious effects are observed after exposure to escapable shock (G1 learn as rapidly as G3)
Primary difference between G1 and G2 is the presence of a response-reinforcer contingency for G1 but not for G2 (animals are sensitive to the response-reinforcer contingency)
Learned helplessness effect
Interference with the learning of new instrumental responses as a result of exposure to inescapable and unavoidable aversive stimulation
Learned helplessness hypothesis
The proposal that exposure to inescapable and unavoidable aversive stimulation reduces motivation to respond and disrupts subsequent instrumental conditioning because pps learn that their behaviour does not control outcomes
Learning deficit occurs for two reasons:
The expectation of lack of control reduces the motivation to perform an instrumental response
Even if they make the response and get reinforced in the conditioning phase, the previously learned expectation of lack of control makes it more difficult to learn that their behaviour is now effective in producing reinforcement
Activity deficit hypothesis
Proposal that animals in G2 show a learning deficit following exposure to inescapable shock because inescapable shocks encourage animals to become inactive/freeze
Cannot explain instances in which exposure to inescapable shock disrupts choice learning
Attention deficit hypothesis
Proposes that animals in G2 show a learning deficit since exposure to inescapable shock reduces the extent to which animals pay attention to their own behaviour
More successful as an alternative to the learned helplessness hypothesis
Stimulus relations in escape conditioning
Another line of research challenging helplessness hypothesis
Instead of focusing on why inescapable shock disrupts subsequent learning, asked why exposure to escapable shock is not nearly as bad
Making an escape response results in internal sensations/response feedback cues
Shock-cessation feedback cues: some of the response-produced stimuli are experienced at the start of the escape response, just before the shock is turned off
Safety-signal feedback cues: other response-produced stimuli are experienced as the animal completes the response, just after the shock has been turned off at the start of the intertrial interval
Can become conditioned inhibitors of fear and limit or inhibit fear elicited by contextual cues of the experimental chamber
For G2, contextual cues of the chamber in which shocks are delivered are more likely to become conditioned to elicit fear with inescapable shock
Jackson and Minor (1988)
One group of rats received the usual inescapable shocks in the exposure phase of the triadic design
At the end of each shock presentation, the houselights were turned off for 5 seconds as a safety signal
The introduction of the safety signal entirely eliminated the disruptive effects of shock exposure on subsequent shuttle-escape learning
Indicates that significant differences in how animals cope with aversive stimulation can result from differences in the ability to predict when shocks will end and when a safe intertrial interval without shocks will begin