Chapter 5 Lecture

Instrumental Conditioning

Definition and Overview

Instrumental Conditioning involves teaching animals to associate a stimulus with a response, as opposed to classical conditioning where two stimuli (Conditioned Stimulus - CS and Unconditioned Stimulus - US) are associated.
This is a key concept in the field, rooted in Edward Thorndike's studies of animal intelligence.

Key Figures and Initial Experiments

Edward Thorndike: Trained cats using puzzle boxes of increasing complexity, observing that the cats:
- Initially made varied responses.
- Eventually made the correct response that led to escape (reinforcement).

Learning and the Law of Effect

Escape Latency: The time it took for the cats to escape decreased, indicating learning of an S-R association (Stimulus-Response).
Thorndike's belief: Each successful escape strengthened the association between box cues (S) and responses (R).
Law of Effect: Responses that are followed by reinforcing events will be strengthened, while those followed by non-reinforcing or annoying events will be weakened.
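The Law of Effect can be sketched as a simple update rule. This is only an illustration: the linear rule, the function names, and the `step` and `start` values are assumptions chosen for the example, not Thorndike's own formulation.

```python
def update_strength(strength, reinforced, step=0.2):
    """Return the new S-R strength after one trial (hypothetical linear rule)."""
    if reinforced:
        # A reinforcing event moves strength part of the way toward 1.0.
        return strength + step * (1.0 - strength)
    # A non-reinforcing or annoying event moves strength part of the way toward 0.0.
    return strength - step * strength


def run_trials(outcomes, start=0.5, step=0.2):
    """Apply the update rule to a sequence of True/False trial outcomes."""
    history = [start]
    for reinforced in outcomes:
        history.append(update_strength(history[-1], reinforced, step))
    return history
```

Calling `run_trials([True, True, True])` gives a rising series of strengths, and `run_trials([False, False, False])` gives a falling one, which mirrors the strengthening and weakening described above.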

Procedures in Instrumental Conditioning

Discrete-Trial Procedures

Thorndike's approach: The trial begins when the cat is placed in a box and ends when it escapes.
Each trial allows one instrumental response, and the experimenter controls the timing of the next opportunity to respond.

Free-Operant Procedures

B.F. Skinner: Developed the free-operant method, which allows animals to repeat a response without constraint, leading to more continuous study of instrumental conditioning.
- Defined an operant response by the effect it has on the environment, not by the particular movements that produce it.
- Example: A rat's lever press, whether made with the left paw, right paw, tail, or rump, is treated as the same operant response.

Changes in Response Probability

Response Rate: Central measure of learning in free-operant procedures; as likelihood of a response increases, frequency of that response rises (i.e., the time between responses decreases).
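The link between response rate and the time between responses can be made concrete with a short sketch. The function names and the idea of storing responses as a list of timestamps are assumptions for illustration.

```python
def response_rate(timestamps, session_length):
    """Responses per unit time over a session of the given length."""
    return len(timestamps) / session_length


def inter_response_times(timestamps):
    """Gaps between consecutive responses, in the same units as the timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]
```

For lever presses recorded at 10, 20, and 30 seconds of a 60-second session, the inter-response times are all 10 seconds. Adding more presses to the same session raises the rate and shortens those gaps.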

Shaping and Behavioral Training

Behavioral Shaping: Training a new response by reinforcing successive approximations to it.
- Initial training step often involves magazine training: the animal learns to associate the sound of food delivery with food itself, establishing a sign-tracking response.

Key Elements of Instrumental Conditioning

Response
Outcome (Reinforcer)
Response-Outcome Contingency

Appetitive Stimulus: A stimulus that an animal will work to receive (e.g., food, water).
Aversive Stimulus: A stimulus that an animal will work to avoid (e.g., shock).
The likelihood of an instrumental response is influenced by the ability of the response to produce or prevent an outcome, and whether that outcome is appetitive or aversive.
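The two factors above (whether the response produces or prevents the outcome, and whether the outcome is appetitive or aversive) combine into four standard procedures. The lookup table below is a sketch of that classification. The procedure names are the conventional textbook labels, while the dictionary layout and the `classify` helper are assumptions for illustration.

```python
PROCEDURES = {
    # (response-outcome contingency, stimulus type): (procedure, effect on responding)
    ("produces", "appetitive"): ("positive reinforcement", "increases"),
    ("produces", "aversive"): ("punishment", "decreases"),
    ("prevents", "aversive"): ("negative reinforcement", "increases"),
    ("prevents", "appetitive"): ("omission training", "decreases"),
}


def classify(contingency, stimulus):
    """Look up the procedure name and its expected effect on response rate."""
    return PROCEDURES[(contingency, stimulus)]
```

For example, `classify("prevents", "aversive")` returns the negative reinforcement entry: a response that prevents shock is expected to increase.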

Differential Reinforcement and Omission Training

Omission training, also called Differential Reinforcement of Other Behavior (DRO), delivers the reinforcer only when the target response is withheld, so other behaviors are reinforced instead of the undesired one. It is often used in treating self-injurious behaviors.

Stereotypy in Instrumental Conditioning

With training, operant responses become more stereotyped or fixed. This suggests the establishment of S-R connections.
Not an Inevitable Result: Page and Neuringer (1985) demonstrated that animals can learn to perform more variable behaviors by making variability an explicit task requirement.
Example: Pigeons learned to make patterns of responses that avoided repetition from previous trials.
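A variability requirement of this kind can be sketched as a rule that reinforces a response pattern only if it differs from the patterns on recent trials. The class name, the `lag` parameter, and its default value are assumptions for illustration, not the values used in the original study.

```python
from collections import deque


class VariabilityContingency:
    """Reinforce a response sequence only if it is absent from the last `lag` trials."""

    def __init__(self, lag=3):
        # Only the most recent `lag` sequences are remembered (hypothetical window size).
        self.recent = deque(maxlen=lag)

    def trial(self, sequence):
        """Record one trial and return True if the sequence earns reinforcement."""
        sequence = tuple(sequence)
        reinforced = sequence not in self.recent
        self.recent.append(sequence)
        return reinforced
```

With `lag=2`, repeating a left/right pattern such as `"LLRR"` on consecutive trials goes unreinforced, but the same pattern is reinforced again once two different patterns have intervened.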

Response Variability in Humans

Studies with human participants showed that variability in behavior (e.g., in the rectangles they drew) can be increased through reinforcement.
Groups involved:
- Vary Group: Received points for variability.
- Yoked Group: Received points whenever a partner in the Vary group earned them, regardless of how variable their own behavior was.

Belongingness in Instrumental Conditioning

Belongingness Concept: Some responses are more easily conditioned with certain reinforcers than with others, reflecting the animal's evolutionary history.
Study by Sevenster (1973): In male stickleback fish, biting a rod increased when the reinforcer was access to another male but did not increase when the reinforcer was access to a female.

Behavioral Constraints in Training

Instinctive Drift: The phenomenon where animals revert to instinctual behaviors, making the trained responses difficult to establish or maintain.
Example: Breland and Breland, who trained animals for amusement park displays, found that innate food-related behaviors interfered with the trained behaviors.

Instrumental Reinforcement Sensitivity

Response Sensitivity: Instrumental responding is influenced by the quality and quantity of reinforcements.
- Case Study: A 5-year-old autistic boy pressing a button under different reward contingencies.

Shifts in Reinforcer Quality or Quantity

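Contrast Effects: Mellgren (1972) demonstrated the effects of changing reward expectations on performance using a two-phase design with small and large food rewards. A shift from small to large reward produced elevated responding (positive contrast), and a shift from large to small reward produced depressed responding (negative contrast).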

Temporal Relation Impact on Conditioning

The Temporal Relation is the time between a response and the resulting reinforcement; immediate reinforcement leads to better learning.
A delay makes it harder for the subject to associate a specific response with its reward, because other responses occur during the delay.

Overcoming Delays in Instrumental Conditioning

Secondary Reinforcers: Conditioned stimuli associated with a primary reinforcer help bridge response-reinforcement gaps, maintaining response learning over delays.
Marking Procedure: A method where responses can be 'marked' to distinguish them from others, facilitating learning.

Response-Reinforcer Contingency

Refers to the causal relationship between a response and reinforcement, crucial for understanding instrumental learning.
Studies show that a perfect causal relation alone is insufficient for effective learning, for example when reinforcement is delayed.

Skinner's Superstitious Experiment

In Skinner's 1948 experiment, pigeons were given food every 15 seconds regardless of their behavior. They developed superstitious behaviors, which Skinner attributed to the accidental temporal pairing of a response with food rather than to any causal relation.
Adventitious Reinforcement: Skinner's term for unintentional behavioral reinforcement due to coincidences in timing.
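Skinner's account can be sketched as a simulation of a fixed-time schedule: food arrives every `interval` seconds no matter what the animal does, and whatever behavior happens to be under way at that moment is strengthened. The behavior names, the weighting scheme, and the `boost` and `seed` parameters are all assumptions for illustration. The sketch models Skinner's interpretation only, not the later reanalyses of it.

```python
import random


def fixed_time_session(behaviors, session_seconds=300, interval=15, boost=1.0, seed=0):
    """Simulate response-independent food delivery and adventitious strengthening."""
    rng = random.Random(seed)
    # Every behavior starts out equally likely (hypothetical starting weights).
    weights = {behavior: 1.0 for behavior in behaviors}
    deliveries = 0
    for second in range(1, session_seconds + 1):
        # Sample the behavior occurring this second, in proportion to its weight.
        current = rng.choices(list(weights), weights=list(weights.values()))[0]
        if second % interval == 0:
            # Food is delivered on the clock; the coincident behavior gains strength.
            weights[current] += boost
            deliveries += 1
    return weights, deliveries
```

Because food depends only on the clock, the number of deliveries is fixed by the session length, while the behavior that ends up strongest depends on chance coincidences early in the session.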

Analyzing Superstitious Behavior

Staddon and Simmelhag (1971): Analyzed superstitious behavior in detail and classified responses as terminal (most likely near the end of the interval between food deliveries) or interim (most likely in the middle of the interval), a pattern that reflects anticipation of food.

Behavioral Systems Theory Reinterpretation

Proposes that all of these responses reflect a feeding system activated by periodic food delivery, and categorizes behavior across the interval as post-food focal search, general search, and focal search.

Effects of Controllability of Reinforcers

Instrumental behavior can suffer after animals have had no control over outcomes, a result known as the Learned Helplessness Effect, typically studied using a triadic design.
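The logic of the triadic design can be sketched in code. In the usual form of the design, one group can end each aversive event by responding, a yoked group receives exactly the same events without any control over them, and a third group receives none. The function name, the group labels, and the `max_shock` cap are assumptions for illustration.

```python
def exposure_phase(escape_latencies, max_shock=10.0):
    """Return per-trial shock durations for the three groups of a triadic design."""
    # Escapable group: shock ends when the animal responds, or at the cap (hypothetical).
    escapable = [min(latency, max_shock) for latency in escape_latencies]
    # Yoked group: identical shock durations, but nothing the animal does affects them.
    yoked = list(escapable)
    # Control group: no shock during the exposure phase.
    control = [0.0] * len(escape_latencies)
    return {"escapable": escapable, "yoked": yoked, "control": control}
```

The key design choice is that the escapable and yoked groups receive the same amount of shock, so any later difference between them can be attributed to controllability rather than to shock exposure itself.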

Effects of Learned Helplessness

The learned helplessness hypothesis holds that animals exposed to uncontrollable events learn that their behavior has no effect on outcomes, which reduces motivation and makes it harder to learn new behaviors later, even when control becomes possible.