Responses followed by a pleasant reward strengthen the stimulus-response (S-R) association.
Responses followed by punishment weaken the S-R association.
Most real-life behaviors are not always or immediately rewarded.
The schedule of reinforcement influences the maintenance and pattern of operant behavior.
Schedule of reinforcement: A program that determines which instrumental responses are followed by a reinforcer.
Ratio: Based on the number of responses.
Fixed Ratio (FR): Reinforcement after every nth response.
Example: FR 9 schedule provides a free drink for every 9 purchased.
Behavior characterized by a steady, high rate of responding with post-reinforcement pauses.
The length of the pause increases with the number of required responses; the pause resembles procrastination before starting a large task.
Ratio strain: When the response requirement is raised too high, the subject stops responding.
Variable Ratio (VR): Reinforcement after every nth response on average.
Example: VR 100 schedule pays out on average every 100 pulls on a slot machine.
Responding occurs at a steady rate without predictable pauses.
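As a minimal sketch (my own illustration, not from the notes), a ratio schedule can be modeled as a response counter that delivers a reinforcer when the requirement is met; FR keeps the requirement constant, while VR re-draws it around a mean:

    import random

    def make_ratio_schedule(n, variable=False):
        # Returns a function called once per response; it reports whether
        # that response is reinforced (FR n, or VR n on average if variable).
        state = {"count": 0, "requirement": n}
        def respond():
            state["count"] += 1
            if state["count"] >= state["requirement"]:
                state["count"] = 0
                if variable:
                    # VR: next requirement drawn with mean n
                    state["requirement"] = random.randint(1, 2 * n - 1)
                return True
            return False
        return respond

    fr5 = make_ratio_schedule(5)                 # FR 5: every 5th response reinforced
    vr5 = make_ratio_schedule(5, variable=True)  # VR 5: every 5th response on average
    print([fr5() for _ in range(12)])            # True at responses 5 and 10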
Interval: Based on the amount of time that has elapsed.
Fixed Interval (FI): Reinforcement after a set amount of time has elapsed since the previous reinforcement.
Responses before this time are not reinforced.
Example: Pressing a crosswalk button is reinforced only after a fixed time has passed.
Organisms learn to time responses, showing temporal relation learning.
FI behavior shows a scalloped pattern, with responding accelerating as the time of reinforcement nears; the pattern is often likened to procrastination.
Variable Interval (VI): Based on an average length of time between reinforcement availability.
Example: Checking for mail that arrives at a varying time around 3 pm each day (roughly a VI 24-hr schedule).
Maintains a steady and stable rate of responding without noticeable pauses.
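A comparable sketch for interval schedules (again illustrative, not from the notes): a response is reinforced only if the programmed interval has elapsed since the last reinforcer; FI keeps the interval fixed, VI re-draws it around a mean:

    import random, time

    def make_interval_schedule(seconds, variable=False):
        # Returns a function called once per response; the response is
        # reinforced only if the interval has elapsed (FI, or VI if variable).
        state = {"last": time.monotonic(), "interval": seconds}
        def respond():
            now = time.monotonic()
            if now - state["last"] < state["interval"]:
                return False  # responses before the interval elapses go unreinforced
            state["last"] = now
            if variable:
                # VI: next interval drawn with mean `seconds`
                state["interval"] = random.uniform(0, 2 * seconds)
            return True
        return respond

    fi10 = make_interval_schedule(10)  # FI 10 s: first response after 10 s is reinforced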
Continuous reinforcement (CRF): Every instance of the behavior is rewarded (technically an FR 1 schedule).
Responding occurs at a steady and moderate rate with brief, unpredictable pauses.
Example: Every time you drink tequila, you are punished with a hangover.
Similarities:
Post-reinforcement pause after both FR and FI schedules.
High rate of responding just before reinforcement in both FR and FI schedules.
Steady rate of responding without pauses in both VR and VI schedules.
Differences:
Reynolds (1975) found higher pecking rates in pigeons on VR schedules compared to VI schedules when overall reinforcement was matched.
Ratio schedules differentially reinforce short inter-response times, while interval schedules differentially reinforce long inter-response times (“molecular theory”).
Ratio schedules have a direct linear relationship between responses and reinforcement, while interval schedules have an upper limit on reinforcement (“molar theory”).
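One way to make the molar difference concrete (my own illustration, using standard feedback functions rather than anything stated in the notes): on an FR n schedule the obtained reinforcement rate grows linearly with the response rate B, whereas an interval schedule of length t caps it no matter how fast the organism responds:
r_{ratio} = \frac{B}{n}
r_{interval} \approx \min\left(B, \frac{1}{t}\right)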
Limited Hold: A restriction on the availability of reinforcement; the reinforcement is only available for a limited period.
Making one response prevents making another (mutually exclusive choices).
Organisms choose based on the relative rate and value of reinforcement for each option.
In a concurrent-schedule procedure, subjects have two choices, each reinforced on a different reinforcement schedule; free switching between the response alternatives enables continuous measurement of choice.
Relative Rate of Responding:
B_L = rate of responding on the left key
B_R = rate of responding on the right key
\frac{B_L}{B_L + B_R} calculates the proportion of responses on the left key.
Relative Rate of Reinforcement:
r_L = rate of reinforcement on the left key
r_R = rate of reinforcement on the right key
\frac{r_L}{r_L + r_R} calculates the proportion of reinforcement on the left key.
If the rates of reinforcement on the two alternatives are equal, the relative rates of responding will be equal as well.
Herrnstein's matching law: the relative rate of responding on an alternative matches the relative rate of reinforcement on that alternative.
Emphasizes reinforcement for a response relative to the other alternatives, not the reinforcement of that response in isolation.
Example: Teenagers from low-SES households may engage in risky behaviors because those behaviors offer higher relative rates of reinforcement than the alternatives available to them, in contrast to more enriching upbringings.
Example: Compulsive eating provides immediate, continuous reinforcement (CRF), whereas the reinforcement for dieting is delayed and intermittent (more like a VI schedule).
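A small worked example (hypothetical numbers) of checking matching: compute the two relative rates from the formulas above and compare them:

    # Matching law check with hypothetical counts
    B_L, B_R = 750, 250   # responses on left and right keys
    r_L, r_R = 30, 10     # reinforcers earned per hour on each key

    rel_responding = B_L / (B_L + B_R)       # 0.75
    rel_reinforcement = r_L / (r_L + r_R)    # 0.75
    print(rel_responding == rel_reinforcement)  # True: responding matches reinforcement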
The generalized matching law includes parameters for sensitivity (s) and bias (b) to account for mismatches between responding and reinforcement rates.
s = sensitivity: Reduced sensitivity (s < 1) leads to undermatching.
b = response bias or preference (e.g., handedness).
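For reference, the equation these parameters belong to (the generalized matching law, stated here from standard treatments rather than from the notes):
\frac{B_L}{B_R} = b \left( \frac{r_L}{r_R} \right)^s
With s = 1 and b = 1 this reduces to strict matching; s < 1 produces undermatching, and b ≠ 1 reflects a constant preference for one alternative.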
Concurrent-chain schedules: Choice with commitment to a particular reinforcement schedule.
Preference: Ratio schedules OVER interval schedules, and variable schedules OVER fixed schedules.
Involves committing to a behavior with short-term losses but long-term benefits (e.g., exercise, studying).
Self-control is easier when the tempting alternative is not readily available.
The value of a reinforcer is reduced by how long you have to wait to get it.
Value-discounting function:
V = \frac{M}{1 + KD}
V = value of a reinforcer
M = reward magnitude
D = reward delay (no delay = 0)
K = discounting rate parameter
Reward value initially decreases rapidly. Steeper discounting functions indicate more impulsive behavior.
Madden, Petry, Badger, & Bickel (1997) showed heroin addicts have steeper discounting functions than controls.
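A minimal sketch (hypothetical magnitudes and K values) of how a steeper discounting rate makes delayed rewards lose value faster:

    # Hyperbolic value discounting: V = M / (1 + K * D)
    def value(M, D, K):
        return M / (1 + K * D)

    # Compare a shallow discounter (K = 0.05) with a steep, more impulsive one (K = 0.5)
    for K in (0.05, 0.5):
        print(K, [round(value(100, D, K), 1) for D in (0, 1, 5, 10, 30)])
    # K = 0.05 -> [100.0, 95.2, 80.0, 66.7, 40.0]
    # K = 0.5  -> [100.0, 66.7, 28.6, 16.7, 6.2]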
Self-control can be trained by shaping (gradually increasing delay), using low-effort tasks, or distraction.
Associative structure of instrumental conditioning (Thorndike).
Response-allocation approach (Skinnerian tradition).
Three-term contingency:
Contextual stimulus (S)
Instrumental response (R)
Response outcome (O), i.e., the reinforcer.
S-R association (law of effect): The reinforcer strengthens the S-R association.
Motivation to respond is activation of the S-R association via exposure to the stimulus.
Resurgence of interest in the S-R association due to research on habits and automatized behaviors.
S-O association: Learning outcomes associated with signals (Pavlovian conditioning).
Hull (1930) and Spence (1956) proposed that the instrumental response increases because the S evokes the response directly (S-R) and creates an expectancy of reward (S-O).
Two-Process Theory:
Pavlovian and instrumental learning.
During conditioning, the stimulus becomes associated with the outcome (S-O association) which activates an emotional state that motivates behavior.
Pavlovian Instrumental Transfer (PIT) Experiment:
Test if a Pavlovian S-O association motivates instrumental behavior.
Example: Lever pressing for food decreases in the presence of a CS+ for footshock.
Krank et al. (2008) showed that lever pressing for alcohol increased when a light CS was presented.
Kruse et al. (1983) showed that a CS+ for food pellets facilitated instrumental responding reinforced with pellets more than responding reinforced with sugar water, and vice versa.
Demonstrates expectancies for specific rewards, not just conditioned emotional states.
Reinforcer-devaluation studies provide evidence for R-O associations.
If an instrumental response is motivated by an R-O association, devaluation of the reinforcer should reduce the rate of the instrumental response.
Hogarth & Chase (2011) showed supporting evidence in human behavior.
S activates both R and the R-O association, motivating behavior.
Based in Skinnerian tradition and focused on functional aspect of behavior rather than internal associative processes.
Consummatory responses (eating, drinking) are reinforcing because they involve an instinctive behavior sequence.
Instrumental consummatory responses are fundamentally different from other types of instrumental responses.
Premack principle: The opportunity to perform a higher-probability response (H) after a lower-probability response (L) will reinforce response L.
Any high-probability activity can be an effective reinforcer.
Can explain individual differences in reinforcing properties of activities and can be applied in clinical situations.
Response-deprivation hypothesis: Restriction of an activity below its baseline level is what makes the opportunity to perform it reinforcing.
Low probability responses can be effective reinforcers if restricted.
Instrumental conditioning procedures inherently involve some degree of restriction to the reinforcer which relates to outcome devaluation.
Examines all available response options and how the distribution of responses changes when an instrumental conditioning procedure is introduced.
Uses an unconstrained baseline (behavioral bliss point), which is how the individual allocates responses without restrictions.
Assesses deviation from the unconstrained baseline when restrictions are imposed; subjects compromise and redistribute their responses to bring them as close as possible to the preferred distribution.
Reinforcer effect: An increase in the instrumental response above the level in the absence of the response-reinforcer contingency.
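To make the compromise concrete, a small sketch (all numbers hypothetical): given a bliss point and a contingency linking the two activities, the predicted allocation is the point on the constraint closest to the bliss point:

    # Unconstrained bliss point: 60 min of activity A (e.g., TV), 15 min of B (e.g., homework)
    bliss = (60, 15)

    # Contingency: each minute of A must be earned with a minute of B, so A = B.
    # Find the allocation on that constraint line closest to the bliss point.
    best = min(
        ((a, a) for a in range(0, 121)),
        key=lambda p: (p[0] - bliss[0]) ** 2 + (p[1] - bliss[1]) ** 2,
    )
    print(best)  # (37, 37): less A than preferred, more B, the closest feasible compromise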
Bliss point approach is similar to economics research, which studies how people change behavior to maximize benefits and minimize costs.
Restrictions are imposed by income and the price of goods in economics, and by the number of responses required to obtain each reinforcer in instrumental conditioning.
Psychologists study behavior regulation in terms of economic concepts, such as supply and demand.
Elasticity of demand: The degree to which price influences consumption (see the sketch after the list of factors below).
Factors:
Availability of substitutes
Price range
Income level
Link to complementary commodity
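A quick sketch (hypothetical numbers) of measuring elasticity, treating the schedule requirement as the price of a reinforcer:

    # Price elasticity of demand: % change in consumption per % change in price
    def elasticity(q1, q2, p1, p2):
        return ((q2 - q1) / q1) / ((p2 - p1) / p1)

    # Hypothetical: the FR requirement ("price") doubles from 10 to 20 responses
    # per reinforcer, and consumption falls from 50 to 40 reinforcers earned.
    print(elasticity(50, 40, 10, 20))  # -0.2 -> inelastic demand (|e| < 1)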
Intermittent reinforcement is administered according to various schedules based on responses or time, either fixed or variable.
Use competing alternatives to assess preferences, especially in choice with commitment procedures.
Preference for rewards decreases with delay.
Motivation can be explained by S-O and S-R associations.
Behavior can also be explained based on what reinforces the behavior and available alternatives.