Animal Learning – Reinforcement, Schedules, and Behaviour-Change Techniques
Learning Objectives
Introduce core learning-theory terminology and concepts for newcomers.
Encourage experienced learners to re-examine familiar material from fresh angles (e.g. considering ethology and emotion in addition to classic Pavlovian or Skinnerian views).
Equip you to:
Explain why animals alter behaviour after experience.
Plan scientifically sound training programmes (hands-on methodology is addressed later and in assessments).
Principles of Reinforcement
Reinforcement ≈ a change in the animal’s affective (emotional) state that makes the behaviour it follows more likely.
Punishment ≈ a shift in affect that makes the preceding behaviour less likely.
Key vocabulary:
Primary (unconditioned) reinforcer – innately valuable (e.g. food, water, warmth, social contact, pain relief).
Secondary (conditioned) reinforcer – gains value through pairing with a primary one (e.g. clicker sound, “good dog”).
Ethical & welfare angle:
Emphasise positive reinforcement to enhance welfare; minimise aversives.
Acknowledge that any procedure that manipulates affect has moral implications.
Reinforcement Schedules
Schedules = rules describing when each behavioural response will be reinforced.
Continuous Reinforcement (CRF)
Every occurrence of the target behaviour earns a reinforcer.
Advantages:
Rapid acquisition.
Clear information about contingency.
Disadvantages:
Rapid extinction once reinforcement stops (the animal notices the change immediately).
Partial / Intermittent Reinforcement
Only some responses are reinforced.
General pattern:
Slower acquisition compared with CRF.
Far greater resistance to extinction; animal keeps “checking” if payoff will reappear.
Four canonical sub-types (Skinner, 1938):
Fixed Ratio (FR)
Reinforce after a set number of responses, e.g. FR-5 ⇒ every 5th response.
Generates high, steady rates; brief post-reinforcement pause.
Performance often described by a “stair-step” cumulative record.
Variable Ratio (VR)
Reinforce after an unpredictable number; centred on a mean, e.g. VR-20.
Produces the highest response rates; very hard to extinguish (gambling, slot machines).
Fixed Interval (FI)
First response after a fixed time interval earns reinforcement, e.g. FI-30 s.
Typical scalloped pattern: slow just after reward, accelerating as interval ends.
Variable Interval (VI)
First response after a variable, unpredictable interval; mean value defined.
Produces slow, steady responding; common in natural foraging.
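Because each schedule is a precise rule, the four sub-types can be sketched as small simulators. A minimal Python sketch follows; the class names and the uniform distributions used for the variable schedules are illustrative assumptions, and time is in abstract seconds:

```python
import random

class FixedRatio:
    """FR-n: reinforce every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True          # reinforcer delivered
        return False

class VariableRatio:
    """VR-n: reinforce after an unpredictable number of responses (mean n)."""
    def __init__(self, n, rng=None):
        self.n = n
        self.rng = rng or random.Random(0)
        self.count = 0
        self.target = self.rng.randint(1, 2 * n - 1)  # mean n
    def respond(self):
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI-t: the first response after t seconds since the last reinforcer pays off."""
    def __init__(self, t):
        self.t, self.available_at = t, t
    def respond(self, now):
        if now >= self.available_at:
            self.available_at = now + self.t
            return True
        return False

class VariableInterval:
    """VI-t: first response after an unpredictable interval averaging t seconds."""
    def __init__(self, t, rng=None):
        self.t = t
        self.rng = rng or random.Random(0)
        self.available_at = self.rng.uniform(0, 2 * t)
    def respond(self, now):
        if now >= self.available_at:
            self.available_at = now + self.rng.uniform(0, 2 * self.t)
            return True
        return False
```

Feeding the same response stream through each object reproduces the characteristic patterns, e.g. FR's regular payoff every nth response versus VR's unpredictable ones.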
Differential Reinforcement (DR)
Values of rewards scale with quality of performance.
Example protocol:
Below-average response → no reward.
Average → low-value treat.
Above-average → medium treat.
Exceptional → high-value treat (e.g. steak cube).
Enhances precision and encourages continual improvement without aversives.
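The example protocol above can be expressed as a simple tiered mapping. The function name, the treat labels, and the 10%/20% cut-offs for "above-average" and "exceptional" are illustrative assumptions, not part of the source protocol:

```python
def differential_reward(score, average):
    """Map performance quality to a reward tier.

    Mirrors the four-tier example protocol; the percentage
    thresholds are hypothetical and would be set per animal/task.
    """
    if score < average:
        return None                  # below average: no reward
    if score < 1.1 * average:
        return "low-value treat"     # average performance
    if score < 1.2 * average:
        return "medium treat"        # above-average performance
    return "high-value treat"        # exceptional (e.g. steak cube)
```

In practice the "average" baseline would be updated as the animal improves, so the same tiered rule keeps pulling performance upward.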
Shaping, Capturing & Luring
Shaping = reinforcing successive approximations until the full behaviour emerges.
More humane than force or flooding; leverages animal’s natural variability.
Tips for effectiveness:
Define the final objective behaviour precisely.
Break into small, achievable criteria; raise the criterion after 2–3 consecutive successes.
Keep rate of reinforcement high; avoid frustration.
If the animal stalls, back up to last successful step ("gradient of difficulty").
Capturing = waiting for the animal to spontaneously perform the behaviour, then marking & reinforcing.
Luring = using a visible reinforcer (or target) to guide the movement; fade lure promptly to prevent dependency.
Chinning often refers to gently guiding the animal via manual contact (commonly used in farm-species training).
Extinction = withholding reinforcement so behaviour frequency diminishes; may provoke an extinction burst (temporary spike in intensity or variability).
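The criterion-raising tips above can be sketched as one simple advancement policy. The function, and the back-up-on-failure rule, are a hypothetical illustration of "raise the criterion after 2–3 consecutive successes" and "back up to the last successful step":

```python
def run_shaping(steps, trial_outcomes, successes_to_advance=3):
    """Walk a shaping plan.

    steps            - ordered list of approximation criteria
    trial_outcomes   - booleans: did the animal meet the current criterion?
    Raises the criterion after `successes_to_advance` consecutive
    successes; drops back one step after a failure (a deliberately
    simple stand-in for "back up to the last successful step").
    Returns the index of the step reached.
    """
    step, streak = 0, 0
    for success in trial_outcomes:
        if success:
            streak += 1
            if streak >= successes_to_advance and step < len(steps) - 1:
                step, streak = step + 1, 0
        else:
            streak = 0
            if step > 0:
                step -= 1    # return to the previous, easier criterion
    return step
```

A real plan would also track rate of reinforcement per session, since a long run of failures signals the criterion jump was too large.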
Chaining
A chain = ordered sequence where each behaviour’s cue is the completion of the previous behaviour.
Two assembly strategies:
Forward Chaining
Teach behaviour A → reinforce.
Add behaviour B after A → reinforce, etc.
Natural for sequences that mirror real-world order.
Backward Chaining
Start with last behaviour (closest to primary reinforcer) — ensures every trial ends with success.
Add preceding links progressively (Z → reinforce; Y then Z → reinforce; X then Y then Z…).
Favoured for complex routines (e.g. an assistance dog: open fridge → bring soda → sit politely).
Practical advice:
Task analysis – break the routine into clear, trainable components.
Decide whether to reinforce only after the full chain or maintain occasional within-chain rewards.
Widely utilised in advanced contexts: assistance-dog tasks, marine-mammal shows, equestrian liberty acts.
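The two assembly strategies differ only in which end of the sequence each trial grows from, which a short sketch makes explicit (function names are hypothetical):

```python
def forward_chaining_trials(chain):
    """Forward chaining: train A, then A+B, then A+B+C..."""
    return [chain[:i + 1] for i in range(len(chain))]

def backward_chaining_trials(chain):
    """Backward chaining: train the last link alone, then the last two,
    and so on, so every trial ends at the primary reinforcer."""
    return [chain[i:] for i in range(len(chain) - 1, -1, -1)]
```

For the chain X → Y → Z, backward chaining yields the trial sequence [Z], [Y, Z], [X, Y, Z]: the animal always finishes on the link it knows best, closest to the reinforcer.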
Discrimination & Generalisation
Discrimination = responding differently to stimuli that vary along a critical dimension.
Example: dog sits on “sit” cue but not on “down”.
Rooted in adaptive ecology – recognising food vs. toxin, friend vs. foe.
Generalisation = treating diverse stimuli with shared features as functionally similar.
Example: horse accepts various trailer types after trailer-loading training.
Promotes efficiency: no need to relearn every novel instance.
Trainers shape the stimulus control spectrum via deliberate exposure, differential reinforcement, and context variation.
Factors Affecting Learning
Motivational State
Hunger, thirst, social drive, pain relief, play motivation.
Under- or over-motivation leads to sub-optimal performance.
Biological Predispositions (“preparedness”)
Species-specific tendencies facilitate or hinder certain associations (e.g. rats readily link taste with nausea but not with electric shock location).
Environment
Distractors (noise, smells, movement) may compete for attention.
Anxiety / stress impairs cognitive processing; aim for moderate arousal (Yerkes–Dodson curve).
Age
Example: foals handled early show reduced neophobia later.
Senescent dogs often exhibit declines in working memory and learning speed; adjust session length & complexity.
Connections to Classical Conditioning & Emotion
Operant contingencies frequently overlap with classical (Pavlovian) conditioning.
A clicker becomes a secondary reinforcer through classical pairing with food.
Emotional states (fear, relief, anticipation) function as motivating operations; they enhance or diminish reinforcing power.
Integrating ethological context avoids conflicts with contrafreeloading (the tendency of animals to prefer "working" for outcomes even when the same outcomes are freely available).
Practical & Ethical Implications
Favour the least-intrusive, minimally aversive (LIMA) principle.
Partial schedules & differential reinforcement reduce reliance on punishment while preserving precision.
Recognise and respect agency: allow the animal choice & control wherever feasible.
Numerical / Statistical References
Notation recap:
CRF = reinforce each response.
FR-n = reinforce every nth response.
VR-n = reinforce after a variable number of responses with mean n (variance unspecified).
FI-t = reinforce the first response after t seconds.
VI-t = reinforce the first response after a variable interval averaging t seconds.
Study Hints
Diagram each schedule’s cumulative-response curve to visualise differences.
Practise constructing shaping plans: list at least 6–8 approximation steps.
Observe real-life examples (e.g. slot machines = VR, checking phone notifications = VI).
Summary
Reinforcement alters probability of operant behaviour; proper timing & contingency strength are critical.
Shaping + thoughtfully chosen schedules = humane, efficient training.
Always integrate motivation, sensory biases, and emotional wellbeing for successful learning and ethical practice.