Catania 15: Reinforcement Schedules and Contingencies
Motivating Operations and Contingencies
Aversive stimuli, such as shock, are motivating events in escape or avoidance situations because there is no need to escape or avoid them unless they are actually or potentially present. This principle can be extended to other aversive stimuli like cold, bee stings, or loud noises.
Contingencies: Concerned with the consequences of responding.
Motivating Events: Determine whether the consequences are important enough to serve as reinforcers.
Reinforcement Schedules
Reinforcement schedules are arrangements that specify which responses within an operant class will be reinforced. Intermittent or partial reinforcement, where some responses are reinforced but not others, is a common feature of behavior.
Types of Schedules
Ratio Schedules: Allow reinforcement after some number of responses.
Interval Schedules: Allow reinforcement after some time has elapsed since an event.
Differential-Reinforcement Schedules: Allow reinforcement depending on the rate or timing of prior responses.
These requirements can be combined to create more complex schedules.
Variable-Ratio (VR) and Variable-Interval (VI) Schedules
Variable-Interval (VI) Schedules
VI schedules reinforce a single response after a specified time has elapsed, with the time varying from one instance to the next. Earlier responses have no effect. A VI schedule is designated by the average time to the availability of a reinforcer.
Example: Phoning a cousin whose voicemail is inactive; getting an answer depends on calling when the cousin is available, regardless of how many times you call.
Variable-Ratio (VR) Schedules
VR schedules depend on the number of times a response is made, and this number varies from one occasion to another. They are designated by the average number of responses required per reinforcer or the average ratio of responses to reinforcers.
Example: Asking passersby for change from a vending machine; getting change depends on the number of people asked, not when you ask them.
Applying Reinforcement Schedules
Applying reinforcement schedules outside the laboratory requires specifying the responses and reinforcers. For example, consider phoning to get pledges for a charity. Whether a call is answered depends on when you make the call (interval schedule), but the number of pledges depends on the number of calls you make (ratio schedule).
Properties of VR and VI Schedules
VR Schedule: The delivery of a reinforcer depends on a variable number of responses without regard to the passage of time.
VI Schedule: The delivery of a reinforcer depends on the passage of a variable time, then a single response; earlier responses do nothing.
VR Schedule Details
VR schedules are often arranged by a computer that randomly selects responses for reinforcement; such a schedule is sometimes called a random-ratio or RR schedule. In a VR 100 schedule, one response is reinforced per 100 responses on average, but the number varies.
In VR schedules, higher response rates produce higher reinforcement rates.
With moderate ratios, VR schedules generate high and roughly constant response rates between reinforcers. When the ratio becomes very large, response rate decreases due to pauses, known as ratio strain.
VI Schedule Details
In early laboratory work, VI schedules were arranged by a loop of tape driven at constant speed by a motor rather than by responses; whenever a switch sensed a hole in the tape, the next response was reinforced. An alternative method generates pulses at a fixed rate and randomly selects some proportion of them to set up a reinforcer for the next response; schedules arranged this way are sometimes called random-interval or RI schedules.
VI schedules provide a relatively constant reinforcement rate over a substantial range of possible response rates. VI reinforcement is a preferred baseline schedule because of this property, making it useful for studying the effects of other variables such as drugs or chemical pollutants.
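The contrast between these two contingencies can be sketched in a short simulation. This is a minimal, hypothetical Python sketch (the function names and parameter values are mine, not from the text): the random-ratio version reinforces each response with a fixed probability, while the random-interval version sets up reinforcers at exponentially distributed times and reinforces only the first response after each setup.

```python
import random

def simulate_rr(resp_rate, duration, mean_ratio, seed=0):
    """Random-ratio (VR-like) schedule: each response is reinforced with
    probability 1/mean_ratio, regardless of when it occurs."""
    rng = random.Random(seed)
    n_responses = int(resp_rate * duration)
    return sum(rng.random() < 1.0 / mean_ratio for _ in range(n_responses))

def simulate_ri(resp_rate, duration, mean_interval, seed=0):
    """Random-interval (VI-like) schedule: reinforcers are set up at
    exponentially distributed times; the first response after each setup
    collects it, and earlier responses do nothing."""
    rng = random.Random(seed)
    setup = rng.expovariate(1.0 / mean_interval)
    reinforcers, t, step = 0, 0.0, 1.0 / resp_rate
    while t < duration:
        t += step                              # evenly spaced responses
        if t >= setup:                         # a reinforcer was waiting
            reinforcers += 1
            setup = t + rng.expovariate(1.0 / mean_interval)
    return reinforcers

# A tenfold increase in response rate multiplies the reinforcement rate
# on the ratio schedule but barely changes it on the interval schedule.
rr_slow, rr_fast = simulate_rr(0.2, 50_000, 100), simulate_rr(2.0, 50_000, 100)
ri_slow, ri_fast = simulate_ri(0.2, 50_000, 60.0), simulate_ri(2.0, 50_000, 60.0)
```

Run long enough, the simulation illustrates both properties above: on the ratio schedule, higher response rates produce proportionally higher reinforcement rates; on the interval schedule, the reinforcement rate stays close to the programmed rate across a wide range of response rates.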
Response Rates and Reinforcement
VR response rates are higher than VI response rates over most values. With pigeons, VR rates often exceed 200 responses/min, whereas VI rates rarely exceed 100 responses/min.
The shapes of the functions can be affected by whether the organism receives all of its food within experimental sessions (closed economies) or receives some outside the sessions (open economies).
Extinction
During extinction, VR responding usually produces abrupt transitions from high response rates to periods of no responding (a break-and-run pattern). In contrast, extinction after VI usually produces gradual decreases in the rate of responding.
A moderate-rate VI performance might be considerably stronger than a high-rate VR performance regarding resistance to change.
Limited Hold
A limited hold (LH) is a temporal contingency where a setup or scheduled reinforcer remains available only for a limited time. If no response occurs within that time, the reinforcer is lost.
Example: Trying to call a colleague who is often on the phone, where the limited hold varies in duration. In the laboratory, it is usually constant. A limited hold typically produces increased response rates, but a very short limited hold may not maintain responding.
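A limited hold can be added to the interval simulation with one extra check. This is an illustrative sketch under a simplifying assumption of my own (a missed reinforcer is scored when the next, too-late response occurs, and the next interval then starts timing); the names and parameter values are not from the text.

```python
import random

def ri_with_limited_hold(resp_rate, duration, mean_interval, hold, seed=0):
    """Random-interval schedule with a limited hold: once a reinforcer is
    set up, it is lost unless a response occurs within `hold` seconds."""
    rng = random.Random(seed)
    setup = rng.expovariate(1.0 / mean_interval)
    earned = missed = 0
    t, step = 0.0, 1.0 / resp_rate
    while t < duration:
        t += step                        # evenly spaced responses
        if t >= setup:
            if t <= setup + hold:
                earned += 1              # collected within the hold
            else:
                missed += 1              # the hold expired first
            setup = t + rng.expovariate(1.0 / mean_interval)
    return earned, missed

fast = ri_with_limited_hold(2.0, 50_000, 30.0, hold=2.0)   # responds every 0.5 s
slow = ri_with_limited_hold(0.05, 50_000, 30.0, hold=2.0)  # responds every 20 s
```

The slow responder loses most of its set-up reinforcers, while the fast responder loses none, which is why a limited hold typically raises response rates.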
Reinforcement Schedules and Causation
The effects of reinforcers depend on the responses they follow, whether the reinforcers are produced by the responses or delivered independently of them.
Causal Relations: Causal relations between responses and reinforcers may affect behavior differently than coincidental temporal contiguities.
Variability: Accidental correlations between behavior and environmental events are variable. In contrast, behavior that is instrumental must have at least one aspect that has a more or less fixed correlation with the reinforcer.
Experiment: Pigeons' key pecks were reinforced according to a VI schedule. When response-produced reinforcers decreased, response rate decreased. When food was completely independent of behavior, response rates approached zero.
These data suggest that variability counteracts the effects of accidental contiguities.
Delay of Reinforcement Example
VI: The interval ends at the dashed line, and the next response is followed immediately by a reinforcer.
VI with Delay: The interval ends at the dashed line, and a response produces a reinforcer some time later. The time from the last response to the reinforcer (c) is shorter than the scheduled delay (d).
VT Schedule: At the dashed line, the reinforcer is delivered independently of responses, so the time between the last response and the reinforcer (e) varies.
Rates of pecking were highest with VI reinforcement and lowest with VT reinforcement. These effects depend on how correlations among events are integrated over time.
Fixed-Ratio (FR) and Fixed-Interval (FI) Schedules
In fixed schedules, the number of responses per reinforcer or time to the availability of a reinforcer is constant from one reinforcer to the next. This introduces discriminable periods during which no reinforcers occur.
Fixed-Ratio (FR) Schedules
In an FR schedule, the last of a fixed number of responses is reinforced. The count doesn't start over if the FR responding is interrupted. FR responding typically consists of a pause followed by a high response rate. The pause is called a postreinforcement pause (PRP) or, more appropriately, a preresponding pause.
As FR size increases, the average duration of the pause increases.
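The FR contingency itself is simple enough to state as code. A minimal sketch (class and method names are mine): only the count of responses matters, and the count survives interruptions.

```python
class FixedRatio:
    """FR schedule: every nth response is reinforced. The count carries
    over if responding is interrupted; it resets only at reinforcement."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0             # count restarts only at reinforcement
            return True
        return False

fr5 = FixedRatio(5)
outcomes = [fr5.respond() for _ in range(10)]   # reinforcers on the 5th and 10th
```

Note that time never appears: the pause before each run delays the reinforcer but changes nothing about which response produces it.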
Fixed-Interval (FI) Schedules
A response is reinforced only after some constant interval has passed since some environmental event. Responses before this interval ends have no effect.
Example: Looking at a clock as time passes during a lecture. Looking at the clock before then doesn't make it run any faster.
Responding maintained by FI schedules usually occurs at zero or low rates early in the interval and increases as the end of the interval approaches. The concave-upward pattern of such records is called FI scalloping. The pattern of FI responding tends to be consistent over relative rather than absolute time in the interval.
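The FI contingency is the mirror image of FR: only elapsed time matters. A minimal sketch (names mine), with the interval timed from the last reinforcer:

```python
class FixedInterval:
    """FI schedule: the first response after `interval` seconds have
    elapsed since the last reinforcer (or session start) is reinforced;
    responses earlier in the interval have no effect."""
    def __init__(self, interval):
        self.interval = interval
        self.origin = 0.0

    def respond(self, now):
        if now - self.origin >= self.interval:
            self.origin = now          # the next interval times from here
            return True
        return False
```

However many responses occur early in the interval, none of them is reinforced and none advances the schedule, which is why early-interval responding tends to drop out and the scallop emerges.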
Delay of Reinforcement and Partial Reinforcement
The reinforcer produced by the last of a sequence of responses has effects that depend on its relation to all of the preceding responses, not just the one that produced it.
Looking at schedules in terms of the delayed reinforcement of all the responses that precede the reinforced response suggests that intermittent or partial reinforcement works as it does because it allows each reinforcer to reinforce many responses instead of just one.
Delay Gradient
The earlier responses in a sequence that ends with a reinforcer contribute less to future responding than the later ones because of the longer delays that separate them from the reinforcer (Dews, 1962). This means that in interpreting effects of schedules, we need to know the form of the delay gradient.
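One common candidate form for such a gradient is hyperbolic decay. The form and the constant k below are illustrative assumptions of mine, not values given in the text; the sketch only shows how a gradient assigns smaller weights to earlier responses in a reinforced sequence.

```python
def delay_gradient(delay, k=0.2):
    """A hypothetical hyperbolic delay-of-reinforcement gradient: the
    effect of a reinforcer on a response falls off as the delay between
    them grows. Both the form and k are illustrative assumptions."""
    return 1.0 / (1.0 + k * delay)

# Five responses spaced 1 s apart, the last followed immediately by food:
delays = [4, 3, 2, 1, 0]   # seconds from each response to the reinforcer
weights = [delay_gradient(d) for d in delays]
```

Under any monotonically decreasing gradient, the weights rise toward the reinforced response, so the last responses in the sequence contribute most to future responding.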
Demonstration of Delay-of-Reinforcement Gradient: A VI schedule is set up on two pigeon keys. The reinforcer is always produced by pecks on the right key, but only if preceded by a particular sequence on both keys. The left pecks can be maintained only by the reinforcer produced by the last right peck, but they are always separated from that reinforced right peck by a delay determined by the required sequence. The delay can then be varied by changing the number of pecks on the right key while holding the number on the left key constant.
Practical Implications
Teachers must be alert for sequences in which a student's errors are followed by corrections, so that they don't strengthen incorrect responses along with the correct ones that they reinforce. Examples like these should remind us that shaping is often more art than science.
Differential Reinforcement of Low Rate (DRL)
A DRL schedule arranges a reinforcer for a response that is preceded by some minimum time without responding; that time is usually constant. For example, a DRL 20-s schedule will reinforce any response preceded by at least 20 s of no responding.
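The DRL contingency depends only on the interresponse time. A minimal sketch (names mine; treating session start as the first "response" is a simplification):

```python
class DRL:
    """DRL schedule: a response is reinforced only when at least
    `min_irt` seconds have passed since the previous response.
    Simplification: session start counts as the previous response."""
    def __init__(self, min_irt):
        self.min_irt = min_irt
        self.last = 0.0

    def respond(self, now):
        reinforced = (now - self.last) >= self.min_irt
        self.last = now                # every response restarts the clock
        return reinforced
</```

Note that every response, reinforced or not, restarts the clock: responding too soon not only goes unreinforced but postpones the next opportunity.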
In DRL performance, responding is unlikely to extinguish, because decreases in response rate produce even more reinforcers. Typically, responding stabilizes at some intermediate value, oscillating between increased rates accompanied by decreased reinforcement and decreased rates accompanied by increased reinforcement.
Malott & Cumming, 1964: The data fall roughly on a straight line in log-log coordinates, which means the function approximates a power function. It is well fit by an equation of the form t = kT^n, where t is the pigeon's modal IRT and T is the IRT required by the schedule.
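Why a straight line in log-log coordinates implies a power function can be checked directly: taking logs of t = k * T**n gives log t = log k + n * log T, a line with slope n. The constants below are illustrative, not the fitted values from the study.

```python
import math

# A power function t = k * T**n plots as a straight line in log-log
# coordinates. k and n here are illustrative, not fitted constants.
k, n = 1.5, 0.9
T_values = [5.0, 10.0, 20.0, 40.0]          # required IRTs
points = [(math.log(T), math.log(k * T ** n)) for T in T_values]

# The slope between every pair of consecutive points equals the exponent n:
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]
```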
Reinforcement Schedules: A Taxonomy
Some parts of the vocabulary of schedules are logical, but others are admittedly idiosyncratic. Table 15-1 summarizes some major schedules. The definitions apply whether reinforcers are arranged successively and without interruption or occur within separate trials.
The vocabulary of this table, presented in terms of reinforcement schedules, can also be extended to punishment schedules. The symmetry of reinforcement and punishment, illustrated in Chapter 7, applies also to scheduling effects.
Yoked Schedules
The yoked-chamber procedure (Ferster & Skinner, 1957) lets us study some variables that operate within schedules. In yoked chambers, an organism's performance in one chamber determines the events that occur in another organism's chamber.
The yoking experiment shows that the rate difference between VR and VI schedules can't be attributed to responses per reinforcer or time per reinforcer because the rate difference remains even when these are the same in both schedules.
Interresponse Times and Delays
One suggestion was the differential reinforcement of interresponse times (Anger, 1956). Figure 15-13 illustrates how it might work. On the left are shown some possible VI sequences ending in a reinforced response after 5 s and some possible VR sequences ending with a reinforcer after the last of 5 responses.
Comparing the two schedules, when IRTs are short, reinforcement probability is higher for VR than for VI schedules; when they are long, it is higher for VI than for VR schedules. The higher reinforcement probability for long than for short IRTs in VI should result in the differential reinforcement of long IRTs; an increase in those longer IRTs means a decrease in response rate. If that happens only in VI schedules, VI rates should generally be lower than VR rates.
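The asymmetry can be made concrete with the random-ratio and random-interval idealizations of the two schedules (the function names and parameter values below are mine, chosen for illustration):

```python
import math

def p_reinforce_rr(irt, mean_ratio=50):
    """Random-ratio analogue of VR: the probability that a response is
    reinforced is 1/mean_ratio, whatever the preceding IRT."""
    return 1.0 / mean_ratio

def p_reinforce_ri(irt, mean_interval=30.0):
    """Random-interval analogue of VI with exponential setup times: the
    chance that a reinforcer was set up during the IRT, and so the
    chance the response ending it is reinforced, grows with IRT length."""
    return 1.0 - math.exp(-irt / mean_interval)
```

For a short IRT (say 0.5 s), the ratio schedule gives the higher reinforcement probability; for a long IRT (say 5 s), the interval schedule does. The interval schedule therefore differentially reinforces long IRTs, consistent with Anger's account of why VI rates fall below VR rates.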