
Week 6 Operant Conditioning II Notes

Thorndike’s Law of Effect

  • Responses followed by a satisfying outcome (reward) strengthen the stimulus-response (S-R) association.

  • Responses followed by punishment weaken the S-R association.

Partial Reinforcement

  • Most real-life behaviors are not always or immediately rewarded.

  • The schedule of reinforcement influences the maintenance and pattern of operant behavior.

  • Schedule of reinforcement: A program that determines which instrumental responses are followed by a reinforcer.

Simple Schedules of Reinforcement

  • Ratio: Based on the number of responses.

    • Fixed Ratio (FR): Reinforcement after every nth response.

    • Example: FR 9 schedule provides a free drink for every 9 purchased.

    • Behavior characterized by a steady, high rate of responding with post-reinforcement pauses.

    • The length of the post-reinforcement pause increases with the number of required responses; the pause resembles procrastination before starting a large block of work.

    • Ratio strain: When the response requirement is raised too high, the subject stops responding altogether.

    • Variable Ratio (VR): Reinforcement after every nth response on average.

    • Example: VR 100 schedule pays out on average every 100 pulls on a slot machine.

    • Responding occurs at a steady rate without predictable pauses.

  • Interval: Based on the amount of time passed.

    • Fixed Interval (FI): Reinforcement after a set amount of time has elapsed since the previous reinforcement.

    • Responses before this time are not reinforced.

    • Example: Pressing a crosswalk button is reinforced (the signal changes) only after a fixed time has elapsed.

    • Organisms learn to time responses, showing temporal relation learning.

    • FI behavior shows a scalloped pattern: few responses early in the interval, accelerating as the time of reinforcement nears, which parallels procrastination (e.g., work ramping up just before a deadline).

    • Variable Interval (VI): Based on an average length of time between reinforcement availability.

    • Example: Checking for mail that arrives around (but not exactly at) 3 pm each day (roughly a VI 24-hr schedule).

    • Maintains a steady and stable rate of responding without noticeable pauses.
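
The four simple schedules above differ only in the rule that decides whether a given response is reinforced. As a rough illustration, here is a minimal Python sketch of those decision rules (the class names, and the exponential draw for VI intervals, are my own assumptions, not from the lecture; note that FixedRatio(1) corresponds to continuous reinforcement, covered next):

```python
import random
import time

class Schedule:
    """Decides, response by response, whether reinforcement is delivered."""
    def reinforce(self) -> bool:
        raise NotImplementedError

class FixedRatio(Schedule):
    """FR n: every nth response is reinforced."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def reinforce(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio(Schedule):
    """VR n: reinforcement after every nth response on average."""
    def __init__(self, n):
        self.n = n
    def reinforce(self):
        return random.random() < 1.0 / self.n

class FixedInterval(Schedule):
    """FI t: the first response after t seconds have elapsed is reinforced."""
    def __init__(self, seconds):
        self.seconds = seconds
        self.last = time.monotonic()
    def reinforce(self):
        if time.monotonic() - self.last >= self.seconds:
            self.last = time.monotonic()
            return True
        return False

class VariableInterval(Schedule):
    """VI t: like FI, but the interval varies around a mean of t seconds."""
    def __init__(self, mean_seconds):
        self.mean = mean_seconds
        self._arm()
    def _arm(self):
        self.wait = random.expovariate(1.0 / self.mean)
        self.start = time.monotonic()
    def reinforce(self):
        if time.monotonic() - self.start >= self.wait:
            self._arm()
            return True
        return False
```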

Continuous Reinforcement (CRF)

  • Every instance of the behavior is rewarded (technically an FR 1 schedule).

  • Responding occurs at a steady and moderate rate with brief, unpredictable pauses.

  • Example: Every time you drink tequila, you get a hangover (a continuous schedule, here applied to punishment rather than reinforcement).

Ratio vs. Interval Schedules

  • Similarities:

    • Post-reinforcement pause after both FR and FI schedules.

    • High rate of responding just before reinforcement in both FR and FI schedules.

    • Steady rate of responding without pauses in both VR and VI schedules.

  • Differences:

    • Reynolds (1975) found higher pecking rates in pigeons on VR schedules compared to VI schedules when overall reinforcement was matched.

    • Ratio schedules differentially reinforce short inter-response times, while interval schedules differentially reinforce long inter-response times (“molecular theory”).

    • Ratio schedules have a direct linear relationship between response rate and reinforcement rate, while interval schedules impose an upper limit on the reinforcement rate (“molar theory”).

  • Limited Hold: A restriction on the availability of reinforcement; the reinforcement is only available for a limited period.

Concurrent Schedules (formula not needed)

  • Making one response prevents making another (mutually exclusive choices).

  • Organisms choose based on the relative rate and value of reinforcement for each option.

  • In a concurrent-schedule procedure, the subject faces two response alternatives, each reinforced on a different schedule; free switching between the alternatives enables continuous measurement of choice.

Measuring Choice

  • Relative Rate of Responding:

    • B_L = rate of responding on the left key

    • B_R = rate of responding on the right key

    • \frac{B_L}{B_L + B_R} gives the proportion of responses made on the left key.

  • Relative Rate of Reinforcement:

    • r_L = rate of reinforcement on the left key

    • r_R = rate of reinforcement on the right key

    • \frac{r_L}{r_L + r_R} gives the proportion of reinforcers earned on the left key.

  • If the rates of reinforcement on the two keys are equal, the relative rates of responding will match them (about 0.5 on each key).

Matching Law

  • Herrnstein (1961) found that the relative rate of responding on an alternative matches the relative rate of reinforcement earned on that alternative (see the equation after this list).

  • Emphasizes the rate of reinforcement relative to other alternatives, not the reinforcement of a response in isolation.

  • Example: Teenagers from low-SES households may engage in risky behaviors because those behaviors offer a higher relative rate of reinforcement than the available alternatives, compared with more enriched environments.

  • Example: Compulsive eating provides immediate, continuous reinforcement (CRF), whereas dieting pays off only intermittently (more like a VI schedule).
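
Using the notation from Measuring Choice above, the matching law can be written as:

  • \frac{B_L}{B_L + B_R} = \frac{r_L}{r_L + r_R}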

Generalized Matching Law

  • Includes parameters for sensitivity (s) and bias (b) to account for mismatches between responding and reinforcement rates (see the logarithmic form after this list).

  • s = sensitivity: Reduced sensitivity (s < 1) leads to undermatching.

  • b = response bias or preference (e.g., handedness).
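
The generalized matching law is commonly written in logarithmic (power-function) form, following Baum (1974):

  • \log \frac{B_L}{B_R} = s \log \frac{r_L}{r_R} + \log b

With s = 1 and b = 1 this reduces to strict matching; s < 1 produces undermatching.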

Concurrent-Chain Schedule

  • Choice with commitment to a particular reinforcement schedule.

  • Preference: Ratio schedules OVER interval schedules, and variable schedules OVER fixed schedules.

Self-Control

  • Involves committing to a behavior with short-term losses but long-term benefits (e.g., exercise, studying).

Delay Discounting

  • The value of a reinforcer decreases with waiting time.

  • Self-control is easier when the tempting alternative is not readily available.

Value Discounting (formula not needed)

  • The value of a reinforcer is reduced by how long you have to wait to get it.

  • Value-discounting function:

    • V = \frac{M}{1 + KD}

    • V = value of a reinforcer

    • M = reward magnitude

    • D = reward delay (no delay = 0)

    • K = discounting rate parameter

  • Reward value initially decreases rapidly with delay. Steeper discounting functions indicate more impulsive behavior (see the numeric sketch after this list).

  • Madden, Petry, Badger, & Bickel (1997) showed heroin addicts have steeper discounting functions than controls.

  • Self-control can be trained by shaping (gradually increasing delay), using low-effort tasks, or distraction.
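
A quick numeric sketch of the value-discounting function above; the reward magnitude and the two K values are made up for illustration:

```python
def discounted_value(M, K, D):
    """Hyperbolic value discounting: V = M / (1 + K * D)."""
    return M / (1 + K * D)

# Same $100 reward delayed by 30 days, for a patient vs. an impulsive chooser.
for label, K in [("shallow (patient)", 0.01), ("steep (impulsive)", 0.10)]:
    print(f"{label}: V = {discounted_value(M=100, K=K, D=30):.2f}")
# shallow (patient): V = 76.92
# steep (impulsive): V = 25.00
```

The steeper discounting rate makes the same delayed reward worth far less now, which is the pattern Madden et al. (1997) observed in heroin users.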

Motivation to Respond: Two Approaches

  • Associative structure of instrumental conditioning (Thorndike).

  • Response-allocation approach (Skinnerian tradition).

Associative Structure of Instrumental Conditioning

  • Three-term contingency:

    • Contextual stimulus (S)

    • Instrumental response (R)

    • Reinforcer, or response outcome (O)

  • S-R association (law of effect): The reinforcer strengthens the S-R association.

    • Motivation to respond is activation of the S-R association via exposure to the stimulus.

    • Resurgence of interest in the S-R association due to research on habits and automatized behaviors.

  • S-O association: Learning outcomes associated with signals (Pavlovian conditioning).

    • Hull (1930) and Spence (1956) proposed that the instrumental response increases because the S evokes the response directly (S-R) and creates an expectancy of reward (S-O).

  • Two-Process Theory:

    • Pavlovian and instrumental learning.

    • During conditioning, the stimulus becomes associated with the outcome (S-O association) which activates an emotional state that motivates behavior.

  • Pavlovian Instrumental Transfer (PIT) Experiment:

    • Test if a Pavlovian S-O association motivates instrumental behavior.

    • Example: Lever pressing for food decreases in the presence of a CS+ for footshock.

    • Krank et al. (2008) showed that lever pressing for alcohol increased when a light CS+ was presented.

Expectancy for Specific Rewards

  • Kruse et al. (1983) showed that a CS+ for food pellets facilitated instrumental responding reinforced with food pellets more than responding reinforced with sugar water, and vice versa.

  • Demonstrates expectancies for specific rewards, not just conditioned emotional states.

Direct R-O Associations

  • Devaluation studies provide evidence.

  • If an instrumental response is motivated by an R-O association, devaluation of the reinforcer should reduce the rate of the instrumental response.

  • Hogarth & Chase (2011) provided evidence for R-O associations in human behavior.

S(R-O) Association

  • S activates both R and the R-O association, motivating behavior.

Response Allocation

  • Rooted in the Skinnerian tradition; focuses on the functional aspects of behavior rather than on internal associative processes.

Consummatory-Response Theory

  • Consummatory responses (eating, drinking) are reinforcing because they involve an instinctive behavior sequence.

  • Consummatory responses are assumed to be fundamentally different from potential instrumental responses such as lever pressing or running.

Premack Principle (IMPORTANT)

  • The opportunity to perform a higher-probability response (H) after a lower-probability response (L) reinforces the lower-probability response (e.g., access to play can reinforce homework in a child who plays more than they study).

  • Any high-probability activity can be an effective reinforcer.

  • Can explain individual differences in reinforcing properties of activities and can be applied in clinical situations.

Response-Deprivation Hypothesis

  • Restriction of activity is what makes responses reinforcing.

  • Low probability responses can be effective reinforcers if restricted.

  • Instrumental conditioning procedures inherently involve some degree of restriction of access to the reinforcer, which relates to outcome devaluation.

Response Allocation Approach

  • Examines all available response options and how the distribution of responses changes when an instrumental conditioning procedure is introduced.

  • Uses an unconstrained baseline (behavioral bliss point), which is how the individual allocates responses without restrictions.

  • When restrictions are imposed, deviations from the unconstrained baseline are assessed; subjects compromise, redistributing their responses to come as close as possible to the preferred distribution.

  • Reinforcer effect: An increase in the instrumental response above its level in the absence of the response-reinforcer contingency (see the sketch below).
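
To make the bliss-point idea concrete, one simple model treats the subject as choosing the point on the schedule's constraint line that is closest to its unconstrained baseline. A minimal sketch under that assumption; the activities, numbers, and squared-distance cost are illustrative, not from the lecture:

```python
def redistribute(bliss_instrumental, bliss_contingent, cost):
    """
    Project the bliss point onto the schedule constraint.
    cost = units of the instrumental activity required per unit of the
    contingent (reinforcing) activity, so: contingent = instrumental / cost.
    Returns the allocation on the constraint line closest (in squared
    distance) to the unconstrained baseline.
    """
    x0, y0 = bliss_instrumental, bliss_contingent
    # Minimize (x - x0)^2 + (x/cost - y0)^2 over x; closed-form solution:
    x = cost * (cost * x0 + y0) / (cost**2 + 1)
    return x, x / cost

# Unconstrained baseline: 10 min of lever pressing, 60 min of wheel running.
# Schedule: 2 min of pressing buys 1 min of running (cost = 2).
pressing, running = redistribute(10, 60, cost=2)
print(f"pressing: {pressing:.1f} min, running: {running:.1f} min")
# pressing: 32.0 min, running: 16.0 min
```

Lever pressing rises well above its 10-minute baseline, illustrating the reinforcer effect.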

Behavioral Economics

  • Bliss point approach is similar to economics research, which studies how people change behavior to maximize benefits and minimize costs.

  • Restrictions are imposed by income and the price of goods in economics, and by the number of responses required to obtain each reinforcer in instrumental conditioning.

  • Psychologists study behavior regulation in terms of economic concepts, such as supply and demand.

Elasticity of Demand

  • The degree to which price influences consumption (formalized after this list).

  • Factors:

    • Availability of substitutes

    • Price range

    • Income level

    • Links to complementary commodities
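
Elasticity is conventionally expressed as the ratio of proportional changes:

  • \epsilon = \frac{\%\,\Delta\,\text{consumption}}{\%\,\Delta\,\text{price}}

Demand is elastic when |\epsilon| > 1 (consumption drops sharply as price rises) and inelastic when |\epsilon| < 1.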

Summary

  • Intermittent reinforcement is administered according to various schedules based on responses or time, either fixed or variable.

  • Competing alternatives are used to assess preferences, especially in choice-with-commitment procedures.

  • Preference for rewards decreases with delay.

  • Motivation can be explained by S-R, S-O, and R-O associations.

  • Behavior can also be explained based on what reinforces the behavior and available alternatives.