Schedules and Theories of Reinforcement Notes

Types of Positive Reinforcement

Positive reinforcement can be further distinguished by:
- Immediate vs. delayed
- Primary vs. secondary
- Intrinsic vs. extrinsic
- Natural vs. contrived

Immediate vs. Delayed Reinforcement

The more immediate the reinforcer, the stronger its effect on the behavior.
Example:
- Giving a treat to a child who is playing quietly while she is still playing quietly reinforces quiet playing.
The benefits of exercise and proper eating are delayed and are therefore weak reinforcers.

Primary Reinforcers

A primary reinforcer is an event that is innately reinforcing.
Examples:
- Food, water, proper temperature, sexual contact.

Secondary Reinforcers

A secondary reinforcer is an event that is reinforcing because it has been associated with some other reinforcer.
Examples:
- Good marks, fine clothes, a nice car.
Conditioned stimuli that have been associated with appetitive unconditioned stimuli (USs) can also function as secondary reinforcers.
Example:
- A metronome that has been associated with food can be a secondary reinforcer for the operant response of lever pressing.

Generalized Reinforcer

A type of secondary reinforcer that has been associated with several other reinforcers.
Examples:
- Money is associated with food, clothing, furnishings, entertainment, and even dates.
- Social attention is associated with food, play, and comfort.
In a “token economy,” tokens are given as reinforcers and can be traded for other rewards later.

Intrinsic Reinforcement

Reinforcement provided by the mere act of performing the behavior.
Examples:
- Rollerblading is invigorating.
- Attending parties is fun.

Extrinsic Reinforcement

The reinforcement provided by some consequence that is external to the behavior.
Examples:
- You drive to get somewhere.
- You work for money.
- You date an attractive individual merely to enhance your prestige.

Natural Reinforcers

Reinforcers that are naturally provided for a certain behavior.
They are a typical consequence of the behavior within that setting.
Examples:
- Money is a natural consequence of selling merchandise.
- Gold medals are a natural consequence of hard training and a great performance.

Contrived Reinforcers

Reinforcers that have been deliberately arranged to modify a behavior.
They are not a typical consequence of the behavior in that setting.
Examples:
- Turning on the television is a contrived reinforcer for studying.

Continuous Reinforcement Schedule

Abbreviated as CRF
Each instance of the specified behavior is reinforced.
Particularly useful in teaching new behavior.
Speeds up conditioning.
Used frequently in shaping procedures.
Produces responding that has a low resistance to extinction but a high extinction burst.

Intermittent Reinforcement Schedule

Only some responses are reinforced.
Particularly useful in maintaining new behavior.
Used frequently in maintenance procedures.
Produces responding that has a high resistance to extinction but a low extinction burst.

4 Basic Intermittent Schedules

Fixed ratio
Variable ratio
Fixed interval
Variable interval

Fixed Ratio Schedules

Fixed Ratio: reinforcement is contingent on a fixed number of responses.
Abbreviated FR, e.g., FR5, FR500
Produce a high rate of responses and a short post-reinforcement pause.
See figure 7.1
Ratio strain: a disruption in performance due to an overwhelming response requirement.

Responses to FR Schedules

Usually a high rate of response with a short postreinforcement pause.
Example:
- On an FR 25 schedule, a rat rapidly emits 25 lever presses, munches down the food pellet it receives, and then snoops around before emitting more lever presses.
A postreinforcement pause is a short pause following the attainment of each reinforcer.
Higher ratio requirements produce longer postreinforcement pauses.
Example:
- You will take a longer break after completing a long assignment than after completing a short one.
There may be little or no pausing with an FR1 or FR2 schedule because the reinforcer is so close.
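The fixed ratio rule can be sketched in a few lines of Python. This is only an illustration: the function name `fixed_ratio` and its list-of-booleans output are assumptions made for the example, not part of the notes.

```python
def fixed_ratio(n_responses, ratio):
    """Return one boolean per response: True where that response earns a reinforcer.

    Hypothetical helper: on an FR schedule every `ratio`-th response is reinforced.
    """
    return [(i % ratio) == 0 for i in range(1, n_responses + 1)]


# FR 5: only the 5th and 10th responses are reinforced.
outcomes = fixed_ratio(10, 5)
reinforced_at = [i for i, hit in enumerate(outcomes, start=1) if hit]
print(reinforced_at)  # [5, 10]
```

With `ratio=1` every response is reinforced, which is the continuous (CRF) case.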

Variable Ratio Schedules

Variable Ratio: reinforcement is contingent on a variable number of responses.
Abbreviated VR, e.g., VR50, VR500
Produce a high rate of responses and almost NO post-reinforcement pause.
See figure 7.1
Ratio strain: a disruption in performance due to an overwhelming response requirement.
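A variable ratio schedule can be sketched the same way, with the response requirement redrawn after each reinforcer. The function name `variable_ratio` and the choice of a uniform distribution are assumptions for illustration; any distribution that averages out to the stated ratio would also count as VR.

```python
import random


def variable_ratio(n_responses, mean_ratio, seed=0):
    """Return the response numbers that earn a reinforcer on a VR schedule.

    Assumption: each requirement is drawn uniformly from 1..(2*mean_ratio - 1),
    so the requirements average out to mean_ratio.
    """
    rng = random.Random(seed)
    reinforced = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for response in range(1, n_responses + 1):
        count += 1
        if count >= required:
            reinforced.append(response)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


# VR 10: roughly 100 reinforcers in 1000 responses; the exact count depends on the seed.
hits = variable_ratio(1000, 10)
print(len(hits))
```

Because the next requirement is unpredictable, the very next response might pay off, which fits the near absence of a post-reinforcement pause.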

Real Examples of VR Schedules

Only some of a cheetah’s attempts at chasing down prey are successful.
Only some acts of politeness receive acknowledgment.
Only some songs that we download are enjoyable.

Fixed Interval Schedules

Fixed Interval: reinforcement is contingent on the first response after a fixed period of time.
Abbreviated FI (e.g., FI 20). Note that here the number refers to the duration of the interval and not the number of responses.
Produce a scalloped pattern of responding, which suggests the organism is timing the interval.
See figure 7.1

Responses to FI Schedules

Responses consist of a postreinforcement pause followed by a gradually increasing rate of response as the interval draws to a close.
Examples:
- A rat on an FI 30-sec schedule will emit no lever presses at the start of the 30-second interval, but the rate of responses will gradually increase as the 30 seconds draw to a close.
- Your studying this term was probably weak at the beginning of the semester and will increase as the semester draws to a close.
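The fixed interval rule (only the first response after the interval has elapsed is reinforced) can be sketched as follows. The function name `fixed_interval` is hypothetical, and the sketch assumes the next interval starts timing when a reinforcer is delivered.

```python
def fixed_interval(response_times, interval):
    """Return the response times that earn a reinforcer on an FI schedule.

    Assumption: the next interval starts when a reinforcer is delivered.
    """
    reinforced = []
    available_at = interval  # the first reinforcer becomes available after one interval
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval
    return reinforced


# FI 30-sec: responses at 10 s and 25 s earn nothing; the response at 31 s pays off.
print(fixed_interval([10, 25, 31, 40, 62, 70], 30))  # [31, 62]
```

Early responses in each interval are wasted effort, which is consistent with the pause-then-accelerate pattern described above.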

Variable Interval Schedules

Variable Interval: reinforcement is contingent on the first response after a variable period of time.
Abbreviated VI (e.g., VI 20). Note that here the number refers to the duration of the interval and not the number of responses.
Produce a moderate, steady rate of performance with little or no post-reinforcement pause, because the timing of the next reinforcer is unpredictable.
See figure 7.1 and table 7.1
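A variable interval schedule differs from a fixed interval schedule only in that the waiting period is redrawn after each reinforcer. In this sketch the function name `variable_interval` and the uniform distribution over 0 to twice the mean are assumptions for illustration.

```python
import random


def variable_interval(response_times, mean_interval, seed=0):
    """Return the response times that earn a reinforcer on a VI schedule.

    Assumption: each interval is drawn uniformly from 0..2*mean_interval,
    so the intervals average out to mean_interval.
    """
    rng = random.Random(seed)
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


# One response per second for 10 minutes on a VI 20-sec schedule.
hits = variable_interval(range(1, 601), 20)
print(len(hits))  # the exact count depends on the seed
```

Responding faster than once per second would add few reinforcers here, since availability depends on elapsed time rather than on the number of responses.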

Response Rate Schedules

Differential Reinforcement of High Rates (DRH). Reinforcement is contingent upon emitting at least a certain number of responses in a given time.
Example: a salesman has to make at least 15 sales per week to earn his bonus.
Differential Reinforcement of Low Rates (DRL). A minimum amount of time must pass between responses in order for reinforcement to be delivered.
Example: a teacher will only call on her student if there's an interval of at least 5 minutes between the times he raises his hand.
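The two response rate rules can be sketched as simple checks. The function names are hypothetical, and the DRL sketch assumes that the first response only starts the clock and is not itself reinforced.

```python
def drh_met(response_times, min_responses, window):
    """DRH: True if at least `min_responses` responses fall within [0, window]."""
    in_window = [t for t in response_times if 0 <= t <= window]
    return len(in_window) >= min_responses


def drl_reinforced(response_times, min_gap):
    """DRL: return the responses that follow the previous response by at least `min_gap`.

    Assumption: the first response is not reinforced; it only starts the clock.
    """
    reinforced = []
    previous = None
    for t in sorted(response_times):
        if previous is not None and t - previous >= min_gap:
            reinforced.append(t)
        previous = t
    return reinforced


# DRH: 16 sales (listed by day of the week) against a 15-sale weekly minimum.
sale_days = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7]
print(drh_met(sale_days, min_responses=15, window=7))  # True

# DRL: hand raises at these minutes, with a 5-minute minimum gap.
print(drl_reinforced([0, 3, 9, 15, 16], min_gap=5))  # [9, 15]
```

Note that in the DRL case a response that comes too early still resets the clock, so responding quickly postpones reinforcement.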

Noncontingent Schedules

Fixed Time (FT). Reinforcement is delivered after a fixed interval of time, regardless of the organism’s behavior.
Examples:
- On a fixed time 30-second (FT 30-sec) schedule, a pigeon receives food every 30 seconds regardless of its behavior.
- On an FT 1-year schedule, people receive Christmas gifts each year regardless of their behavior.
Variable Time (VT). Reinforcement is delivered after a variable interval of time, regardless of the organism's behavior.
Example:
- A pigeon receives food after an average interval of 30 seconds (VT 30-sec schedule).
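What sets noncontingent schedules apart is that the delivery times can be computed without looking at behavior at all. In the sketch below, the function names and the uniform distribution used for VT are assumptions for illustration.

```python
import random


def fixed_time(session_length, interval):
    """FT: delivery times are fixed in advance; behavior is never consulted."""
    return list(range(interval, session_length + 1, interval))


def variable_time(session_length, mean_interval, seed=0):
    """VT: gaps between deliveries vary around mean_interval.

    Assumption: gaps are drawn uniformly from 1..(2*mean_interval - 1) seconds.
    """
    rng = random.Random(seed)
    times = []
    t = rng.randint(1, 2 * mean_interval - 1)
    while t <= session_length:
        times.append(t)
        t += rng.randint(1, 2 * mean_interval - 1)
    return times


print(fixed_time(120, 30))  # [30, 60, 90, 120]
vt_times = variable_time(600, 30)
print(len(vt_times))  # the exact count depends on the seed
```

Neither function takes a list of responses as input, in contrast to the interval schedules, where a response is still required once the interval has elapsed.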

Superstitious Behavior

Skinner: Superstition in the Pigeon experiments (1948).
Noncontingent reinforcement may account for some forms of superstitious behavior.
Behaviors may be accidentally reinforced by the coincidental presentation of reinforcements.
Superstitious behavior of this kind is also common among athletes and gamblers.
Unusual events that precede a fine performance may be quickly identified and then deliberately reproduced in the hopes of reproducing that performance.
Superstitious behavior can be seen as an attempt to make an unpredictable situation more predictable.

Complex Schedules of Reinforcement

A combination of two or more simple schedules.
They include:
- Conjunctive Schedules
- Adjusting Schedules
- Chained Schedules
- Multiple Schedules
- Concurrent Schedules

Complex Schedules

Conjunctive: Requirements of two or more schedules must be met before reinforcement is delivered.
Example: you are paid for every item you make, but you must also stay at work for 8 hours, actually working (FR + FI).
Most real-life schedules are conjunctive.
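The conjunctive requirement amounts to a logical AND over the component schedules. A minimal sketch, using the workplace example; the function name and the 30-item requirement are hypothetical values chosen for illustration.

```python
def conjunctive_met(items_made, hours_worked, items_required, hours_required):
    """Reinforcement (pay) is delivered only when BOTH requirements are met."""
    return items_made >= items_required and hours_worked >= hours_required


print(conjunctive_met(40, 8, items_required=30, hours_required=8))  # True
print(conjunctive_met(40, 6, items_required=30, hours_required=8))  # False
```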
Adjusting: Response requirement changes as a function of the organism's behavior while responding on the preceding schedule (working for a previous reinforcer).
Example: May start with FR5, then move up to FR 50, etc.
Chained: Two or more simple schedules where each link has its own SD/Sr+ and the last response produces a terminal reinforcer that reinforces the entire chain.
Green key, peck --> red key, peck --> food.
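The green key / red key chain can be sketched as a small state machine. The function name `run_chain` is hypothetical, and the sketch assumes each link requires a single peck (FR 1) on the key that is currently lit.

```python
def run_chain(pecks, links=("green", "red")):
    """Count terminal reinforcers (food) earned in a chained schedule.

    `pecks` is the sequence of key colors pecked. A peck advances the chain
    only if it is on the key for the current link (assumption: FR 1 per link).
    """
    position = 0
    food = 0
    for key in pecks:
        if key == links[position]:
            position += 1
            if position == len(links):  # last link completed
                food += 1
                position = 0  # the chain starts over
    return food


print(run_chain(["green", "red", "red", "green", "green", "red"]))  # 2
```

Only completing the final link produces food; finishing an earlier link merely changes which key is active, which is why each link's stimulus has to serve as a secondary reinforcer.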
Goal gradient effect: an increase in the strength and/or efficiency of behavior as the reinforcer approaches. This is why you are more likely to become distracted when you are just starting a paper; secondary reinforcement is likely needed at that stage.
Examples: working more efficiently near pay time, taking shorter breaks when you're almost done with an essay.

Theories of Reinforcement

Highly influential theory to this day, developed by David Premack (1965):
Premack Principle: reinforcers are viewed as behaviors rather than as stimuli.
According to the Premack principle, behavior is reinforced by eating the food, drinking the water, riding a bike, not by food, water, or bike.
High probability behaviors can be used as reinforcers for low probability behaviors (e.g., running in the wheel --> drinking water, where drinking is the high probability behavior).

Response Deprivation Hypothesis (Timberlake & Allison, 1974)

Extension of the Premack principle.
Behavior can serve as a reinforcer if:
- Access to the behavior is restricted.
- Its frequency falls below the preferred levels of occurrence.
Example:
- A rat typically runs for 1 hour a day whenever it has free access to a running wheel.
- If the rat is then allowed access to the wheel for only 15 minutes per day, it will be unable to reach this preferred level.
- The rat will be in a state of deprivation with regard to running.
- The rat will now be willing to work to obtain additional time on the wheel.
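The arithmetic behind the rat example can be made explicit. This is a minimal sketch; the function name `deprivation` is hypothetical, and the numbers are the ones from the example above.

```python
def deprivation(baseline_minutes, allowed_minutes):
    """Return the shortfall (in minutes) relative to the preferred level.

    A result of 0 means access is not restricted below the preferred level.
    """
    return max(0, baseline_minutes - allowed_minutes)


shortfall = deprivation(baseline_minutes=60, allowed_minutes=15)
print(shortfall)      # 45
print(shortfall > 0)  # True: running can now serve as a reinforcer
```

On this view what matters is the shortfall relative to the behavior's own baseline, not whether the behavior is more probable than the behavior being reinforced.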