Chapter 9: Theories of Reinforcement
PSYC 351 - Fundamentals of Learning
Outline of Key Points
What Makes Something a Reinforcer?
Original definitions
Premack Principle
Response Deprivation
Behavioural Economics
Instrumental Conditioning & the Brain
Dopamine & motivation
Recall: Contingencies
Instrumental Conditioning (IC) is fundamentally about contingencies defined as
If R (Response), then O (Outcome)
Contingencies may change:
Example: If I cry, then mom picks me up.
Discriminative Stimulus (S):
S indicates which contingencies are currently in effect.
The relationship can be detailed as:
If S occurs, then R leads to O
If S does not occur, then R does not produce O
Three-Part Association in Instrumental Conditioning
IC consists of an association defined as:
Context/Discriminative Stimulus (S)
Behavioural Response (R)
Outcome (O)
Importance of Discriminative Stimuli:
They guide the selection of appropriate behaviors relevant to the situation.
Example: A person will insert money into a vending machine only if it’s operational.
Example: Rats learn to press a lever only when a signal indicates the equipment is functioning.
Theories of Reinforcement
Effective theories in this context must meet certain criteria:
Consistency with research findings.
Direct subsequent research to refine and increase the precision of the theory.
Provide novel insights and perspectives concerning phenomena.
Critical Questions Addressed by Theories of Reinforcement
What makes something a reinforcer?
How can predictions be made regarding effective reinforcement?
How does this reinforcer produce its effects?
What specific mechanisms increase the probability of the reinforced behaviour?
What Makes Something a Reinforcer?
A stimulus that produces a satisfying state of affairs (defined by Thorndike).
A stimulus or outcome that increases the probability of the response that produced it (defined by Skinner).
These definitions lay foundational insights but do not construct a comprehensive theory of reinforcement. They merely illustrate a relationship between behaviour and consequence without predictive capacity across contexts.
Hull & Drive Reduction Theory
Clark Hull, accepting the Law of Effect and the S-R mechanisms, aimed to understand the efficacy of reinforcers.
Homeostasis:
Defined as the biological drive to maintain stable critical bodily functions like temperature, blood sugar levels, and water balance.
A Drive State emerges when an organism experiences an imbalance, producing motivation to restore homeostasis (e.g., hunger leads an animal to seek food).
Drive reduction theory posits that an effective reinforcer will reduce the drive state.
Primary Reinforcers
Deprivation Procedures:
Procedures that disturb biological homeostasis create drive states where stimuli that reduce these drives serve as effective reinforcers.
Primary Reinforcers:
These are inherently effective at reducing biological drives without the necessity for prior training, e.g., food, water, and social/reproductive behaviors.
Many stimuli act as reinforcers without satisfying any biological need; money, for instance, is reinforcing despite having no inherent biological value, because it can be exchanged for primary reinforcers.
Secondary Reinforcers & Acquired Drives
Hull’s theory extends to stimuli conditioned through Pavlovian association:
For example, the aroma of food linked with hunger reduction becomes a secondary reinforcer.
A Secondary Reinforcer or Conditioned Reinforcer is associated with a primary reinforcer.
Conditioned Drives:
These occur when stimuli evoke drive states secondary to their association with primary reinforcers (e.g., feeling hungry upon seeing dessert).
Sensory Reinforcement
Not all reinforcement instances can be explained by Hull’s drive reduction theory.
Sensory Reinforcement:
This is described as reinforcement stemming from stimuli that are not tied to biological needs, e.g., listening to music, viewing a playful film, engaging in a rollercoaster ride.
The accumulation of evidence advocating for sensory reinforcement has led to a reevaluation of Hull's theory.
Revisiting Definitions of Reinforcers
Species-Specific Consummatory Response Theory (Sheffield):
Reinforcers reflect species-specific consummatory behaviours like eating or drinking that complete instinctive behavioral sequences, differing from non-consummatory instrumental responses.
Evidence indicates that certain stimuli (e.g., saccharin) can act as reinforcers despite lacking nutritional value.
Reinforcers as High Probability Responses:
The Premack Principle advocates that the differential probability between responses influences reinforcement effectiveness.
A more preferred behavior reinforces a less preferred behavior (e.g., eating reinforces lever-pressing).
The defining relationship: If L leads to H (low probability response leads to a high probability response), then H reinforces L, but not vice versa.
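The defining relationship can be sketched as a tiny predicate; the baseline probabilities below are hypothetical, not from Premack's data.

```python
def premack_reinforces(p_first: float, p_second: float) -> bool:
    """Premack principle: a behaviour can reinforce another only if its
    baseline probability is higher. Returns True if the first behaviour
    can reinforce the second."""
    return p_first > p_second

# Hypothetical baseline probabilities from a free-access preference test
p_eating, p_lever = 0.6, 0.1
assert premack_reinforces(p_eating, p_lever)      # eating can reinforce lever-pressing
assert not premack_reinforces(p_lever, p_eating)  # but not vice versa
```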
The Premack Principle
Premack (1965) Experiment:
Phase 1 involved a preference test for children between candy and pinball.
Phase 2 involved two groups where either candy was used to access pinball or vice versa.
Results:
Group 1 (eating candy gave access to pinball): children who preferred pinball increased their candy eating; children who preferred candy did not.
Group 2 (playing pinball gave access to candy): children who preferred candy increased their pinball playing, showing that the higher-probability behaviour reinforces the lower-probability one, but not vice versa.
Applications of the Premack Principle
In Educational Settings:
Strategies are developed to increase the likelihood of less frequently performed responses by leveraging highly frequent behavior as a reward.
Example: Completing assignments can lead to free time or outdoor play.
Challenges in Applying the Premack Principle
Measurement issues arise in assigning numerical probabilities to response likelihoods.
Alternative strategies, such as token economies, can mitigate measurement challenges by rewarding target behaviors with tokens redeemable for preferred activities.
Response-Deprivation Hypothesis
Timberlake & Allison critiqued the Premack principle, positing that restriction on behaviors suffices for reinforcement.
Response-Deprivation Hypothesis:
Each behavior possesses a preferred level; restricting access to that behavior prompts the performance of another behavior to regain access.
The reinforcer emerges from the instrumental contingency, with predictions indicating that even low probability responses can serve as reinforcers when restricted.
Application of the Response-Deprivation Hypothesis
Identify two low probability behaviors (X and Y).
Restrict access to one behavior (X) below baseline levels.
Make access to that behavior (X) contingent upon performance of the second behavior (Y).
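The restriction step above has a common formalization: a schedule deprives the contingent behaviour when the amount of it earned per instrumental response falls below the baseline ratio of the two behaviours. This is a hedged sketch; the baseline numbers are invented for illustration.

```python
def is_depriving(i_req: float, c_given: float,
                 i_base: float, c_base: float) -> bool:
    """Response-deprivation condition (after Timberlake & Allison):
    a schedule granting c_given units of the contingent behaviour per
    i_req units of the instrumental behaviour restricts the contingent
    behaviour below baseline when c_given/i_req < c_base/i_base.
    Under the hypothesis, such a schedule should be reinforcing."""
    return (c_given / i_req) < (c_base / i_base)

# Hypothetical baselines: 10 min of behaviour X and 30 min of Y per session.
# Schedule: 1 min of X allowed per 10 min of Y -> X restricted below baseline.
assert is_depriving(i_req=10, c_given=1, i_base=30, c_base=10)
# Schedule: 1 min of X per 1 min of Y -> X is not restricted.
assert not is_depriving(i_req=1, c_given=1, i_base=30, c_base=10)
```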
Evidence suggests that restricted behaviors can act as reinforcers based on studies focusing on children with mental disabilities, providing more robust validation for this hypothesis than for the Premack principle.
Locus of Reinforcement Effects
Other theories locate the source of reinforcement in factors external to the IC procedure itself.
In drive reduction theory, factors instigate the drive state; in the Premack principle, differential baseline probabilities predominate.
The response deprivation hypothesis focuses on how procedural constraints shape reinforcement.
This leaves open the question of the mechanism by which reinforcers raise the probability of responding; that question remains unresolved within this framework.
Behavioural Regulation Approach
Molar theories advance understanding of how organisms leverage environmental interactions to meet objectives.
Attention shifts to how IC procedures limit activities and redistribute these towards goal achievement.
Theories are aimed at understanding:
‘How instrumental conditioning procedures limit an organism’s activities and cause redistributions of those activities’ (Domjan)
Adopts a more global, or molar, perspective. Rather than asking 'how often will you press the lever in this moment because you're hungry', it tracks all of your decisions across time. Time is a finite resource: choosing to behave one way is also a decision against every other behaviour.
Behavioural Homeostasis and Bliss Point
It’s theorized that behavioral regulations function on homeostatic models.
The Behavioural Bliss Point defines each organism's preferred distribution of activities, which it strives to maintain against disturbances.
When constraints prevent reaching the bliss point, behaviour settles at the point of minimal deviation from it.
Example: an individual's allocation of study time vs. leisure; when a contingency disturbs this allocation, the individual tries to return as close to the bliss point as possible.
There is an ideal amount of every behaviour: for instance, there's an ideal amount of time you'd want to dedicate to watching TV, exercising, and so on. Introducing new contingencies changes the frequency of certain behaviours, shifting you away from this ideal distribution of activities.
New contingency: you can only watch as much TV as you spend time studying… this new contingency introduction will impact behavioural distribution
For instance, you might decide that you hate studying and will study no more than 15 minutes a day, which leaves you with more time to do something else (like reading or exercising).
Behaviour typically settles at the minimum deviation point (the happy middle): the achievable distribution of activities that deviates least from the bliss point.
Reinforcement effect:
Increase in occurrence of instrumental response above the level of the behaviour in the absence of the response-reinforcer contingency
Example: study time increases more than it would occur normally, as a result of making TV-watching contingent upon studying
Response Allocation & Behavioural Economics
Response allocation is the distribution of responses among various options available in a given situation
Decreasing access creates a redistribution of responses so that the reinforced response occurs more often
What causes this change and what rules govern these changes?
Helped establish the field of behavioural economics
Economics deals with the allocation of resources among various options
Money is a major resource for people
In instrumental conditioning, the resource is behaviour that can be allocated among various options
Behavioural Economics
Value as a function of cost
The study of how organisms distribute their time and effort among possible behaviours
Commodity = reinforcer
Price = time or total number of responses (effort) required to obtain reinforcer
Inelastic curve — gas (consumption remains same even if price wavers/increases) and cigarettes (price increases but amount smoked remains the same)
Elastic curve — consumption drops as price increases (e.g., luxury or non-essential goods)
Example:
You have a hundred dollars, you spend it on dinners out (20 bucks each) and albums (10 bucks each) - bliss point is 5 dinners and 6 albums
You have a hundred dollars, you spend it on dinners out (50 bucks each) and albums (10 bucks each) - new bliss point is 2 dinners and 5 albums
You buy fewer dinners because of the price increase
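The minimal-deviation idea behind this example can be sketched as a brute-force search for the affordable bundle closest to the bliss point. This is an illustrative model, not the lecture's calculation: it treats '5 dinners and 6 albums' as the unconstrained preferred bundle and uses squared distance as the (assumed) deviation measure.

```python
from itertools import product

def closest_affordable(bliss, prices, budget):
    """Among affordable bundles of whole units, return the one closest
    (by squared distance) to the bliss point."""
    best, best_d = None, float("inf")
    max_qty = [int(budget // p) for p in prices]
    for bundle in product(*(range(m + 1) for m in max_qty)):
        if sum(q * p for q, p in zip(bundle, prices)) <= budget:
            d = sum((q - b) ** 2 for q, b in zip(bundle, bliss))
            if d < best_d:
                best, best_d = bundle, d
    return best

bliss = (5, 6)  # preferred: 5 dinners, 6 albums (unaffordable at these prices)
print(closest_affordable(bliss, prices=(20, 10), budget=100))  # (3, 4)
print(closest_affordable(bliss, prices=(50, 10), budget=100))  # (1, 5): dinner price rises, fewer dinners
```

Under this toy model the chosen bundles differ from the lecture's numbers, but the qualitative prediction is the same: raising the price of dinners shifts the allocation toward fewer dinners.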
Behavioural Economics In The Lab
Escalating fixed ratio schedule: the effort required to obtain each reinforcer increases after each one is received
Reinforcers —
Using BE to Compare Reinforcers
The availability and price of alternatives greatly influence the elasticity of demand
Substitutes
Coke price increases, there is a decrease in Coke consumption and an increase in consumption of the alternative Pepsi
Independents
Coke price increases, there is a decrease in Coke consumption and no impact on cream cheese sales
Complements
Coke price increases, there is a decrease in Coke consumption and a decrease in consumption of the complement rum
Example: after the great maple syrup heist, syrup prices increased and pancake consumption decreased
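These three cases correspond to the sign of the cross-price elasticity of demand: the percentage change in consumption of one good per percentage change in the price of another. A minimal sketch; the percentage changes and the tolerance threshold are hypothetical.

```python
def cross_price_elasticity(dq_pct: float, dp_pct: float) -> float:
    """Percent change in quantity of good X demanded per percent
    change in the price of good Y."""
    return dq_pct / dp_pct

def classify(e: float, tol: float = 0.1) -> str:
    """Classify the relationship between two goods by the sign of e."""
    if e > tol:
        return "substitutes"   # Coke price up -> Pepsi consumption up
    if e < -tol:
        return "complements"   # Coke price up -> rum consumption down
    return "independents"      # Coke price up -> cream cheese unchanged

# Hypothetical numbers, all for a 10% increase in the price of Coke:
assert classify(cross_price_elasticity(+5, +10)) == "substitutes"
assert classify(cross_price_elasticity(-4, +10)) == "complements"
assert classify(cross_price_elasticity(0.0, +10)) == "independents"
```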
Foltin (1999) - cocaine baboon study
Had baboons, gave them drugs, and food deprived them
Escalating fixed ratio schedule (food becomes progressively pricier)
Responses on second lever produce either nothing, sugar, or different concentrations of cocaine
Group 1 — left lever is food, right lever does nothing: food consumption doesn’t change even with escalating price (no alternative)
Group 2 — left lever is food, right lever is dextrose (artificial sweetener with no nutritional benefits): food consumption doesn’t change even with escalating price (bad alternative)
Group 3 (3 different concentrations of cocaine) — left lever is food, right lever is cocaine: these baboons are not willing to work as hard for food, especially when the food price gets high; they take the more easily accessible cocaine instead. Cocaine is also an appetite suppressant, offering an escape from an environment where food requires so much work.
How do we Learn IC (Brain Stuff)?
Neuroeconomics: brain designed to maximize reinforcement (profit) while minimizing effort (cost)
Dopamine in the mesocorticolimbic pathway is important for motivation (what gets you off your butt)
“Wanting” and “Liking” in the Brain
We have brain systems for signalling hedonic value
Meaning the subjective 'goodness' of a reinforcer or how much we 'like' it
Endogenous opioids signal the hedonic value (‘liking’) of reinforcers
These are distinct from those signalling motivational value
Meaning how much we ‘want’ a reinforcer and how hard we are willing to work to obtain it
Incentive salience hypothesis: DA motivates learners to work for reinforcement
Incentive Salience Hypothesis
Rat prefers sugar pellets over rat chow if both kinds of food are freely available
With normal dopamine function, they will work (e.g., press a lever) for sugar pellets
When given dopamine antagonist, they are no longer willing to work for the sugar pellets
Basically separates liking and wanting (e.g., they still like the pellets but no longer want them, or not willing to put in the effort to work for it)
Mesocorticolimbic DA & Motivation
One of the "pleasure centers" is the ventral tegmental area (VTA) in the brainstem
The VTA is the center for DA neuromodulation
VTA stimulation = powerful reinforcer
Here are the effects of drug addiction;
Flood of dopamine
Diminishes the impact of the prefrontal cortex (e.g., diminishes inhibition)
DA implicated in addiction:
A disorder of motivation
All drugs of abuse increase DA in the striatum and nucleus accumbens
Wise, 2001
Study with monkey: DA activity in the striatum when reinforcement given
Hollerman & Schultz, 1998
Surgery to implant electrode in VTA of monkey
Origin of DA pathway (mesocorticolimbic)
Measure electrical activity of DA neurons as monkey does a task
Look at how DA responses change with many learning trials
Results:
DA released in response to reinforcers in environment
Training shifts DA signal in response to absence of reward
See decrease in DA signal in response to absence of reward
DA predicts availability of reinforcer and instigates actions to acquire it
There is learning, and learning causes dopamine release; the dopamine drives action to obtain the reward. Schultz believed that dopamine is a learning signal, but the instructor's view is that it reflects motivation to obtain the reward.
Measuring DA in Humans
Functional Magnetic Resonance Imaging (fMRI)
Can visualize blood flow in the brain
Infer that increased blood flow to a specific area is associated with increased activity
fMRI does not measure increased DA directly, only increased activity in these dopaminergic brain areas
Increased activity doesn't necessarily reflect a change; it could be a correlate rather than the cause of one
Just because there is increased activity doesn't mean the increased activity is doing something
Nigrostriatal DA & Motivation
The substantia nigra pars compacta (SNc) is a part of the basal ganglia (BG) that contains DA-producing neurons that project to the striatum
DA is also a movement neurotransmitter — and it is very present here.
Other Areas Involved in IC
IC requires integrated activity in many brain areas
Two brain areas in particular serve distinct roles in overall conditioning:
The dorsal striatum seems to play a role in linking the stimulus with the response (S-R part)
The orbitofrontal cortex seems to play a role in linking response with outcomes (R-O part)
Dorsal Striatum: S-R Learning
Lesion in dorsal striatum leads to issues in S-R learning; animals learn what response to make but can't link it with the context in which to perform it