Animal Learning – Reinforcement, Schedules, and Behaviour-Change Techniques
Learning Objectives
Introduce core learning-theory terminology and concepts for newcomers.
Encourage experienced learners to re-examine familiar material from fresh angles (e.g. considering ethology and emotion in addition to classic Pavlovian or Skinnerian views).
Equip you to:
Explain why animals alter behaviour after experience.
Plan scientifically sound training programmes (hands-on methodology is addressed later and in assessments).
Principles of Reinforcement
Reinforcement ≈ a change in the animal’s affective (emotional) state that makes the behaviour it follows more likely.
Punishment ≈ a shift in affect that makes the preceding behaviour less likely.
Key vocabulary:
Primary (unconditioned) reinforcer – innately valuable (e.g. food, water, warmth, social contact, pain relief).
Secondary (conditioned) reinforcer – gains value through pairing with a primary one (e.g. clicker sound, “good dog”).
Ethical & welfare angle:
Emphasise positive reinforcement to enhance welfare; minimise aversives.
Acknowledge that any procedure that manipulates affect has moral implications.
Reinforcement Schedules
Schedules = rules describing when each behavioural response will be reinforced.
Continuous Reinforcement (CRF)
Every occurrence of the target behaviour earns a reinforcer.
Advantages:
Rapid acquisition.
Clear information about contingency.
Disadvantages:
Rapid extinction once reinforcement stops (the animal notices the change immediately).
Partial / Intermittent Reinforcement
Only some responses are reinforced.
General pattern:
Slower acquisition compared with CRF.
Far greater resistance to extinction; animal keeps “checking” if payoff will reappear.
Four canonical sub-types (Skinner, 1938):
Fixed Ratio (FR)
Reinforce after a set number of responses, e.g. FR-5 ⇒ every 5th response.
Generates high, steady rates; brief post-reinforcement pause.
Performance often described by a “stair-step” cumulative record.
Variable Ratio (VR)
Reinforce after an unpredictable number; centred on a mean, e.g. VR-20.
Produces the highest response rates; very hard to extinguish (gambling, slot machines).
Fixed Interval (FI)
First response after a fixed time interval earns reinforcement, e.g. FI-30 s.
Typical scalloped pattern: slow just after reward, accelerating as interval ends.
Variable Interval (VI)
First response after a variable, unpredictable interval; mean value defined.
Produces slow, steady responding; common in natural foraging.
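Because each schedule is a precise rule, the four sub-types can be sketched as small simulators. A minimal Python sketch follows; the class names and the uniform distributions used for the variable schedules are illustrative assumptions, and time is in abstract seconds:

```python
import random

class FixedRatio:
    """FR-n: reinforce every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True          # reinforcer delivered
        return False

class VariableRatio:
    """VR-n: reinforce after an unpredictable number of responses (mean n)."""
    def __init__(self, n, rng=None):
        self.n = n
        self.rng = rng or random.Random(0)
        self.count = 0
        self.target = self.rng.randint(1, 2 * n - 1)  # mean n
    def respond(self):
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI-t: the first response after t seconds since the last reinforcer pays off."""
    def __init__(self, t):
        self.t, self.available_at = t, t
    def respond(self, now):
        if now >= self.available_at:
            self.available_at = now + self.t
            return True
        return False

class VariableInterval:
    """VI-t: first response after an unpredictable interval averaging t seconds."""
    def __init__(self, t, rng=None):
        self.t = t
        self.rng = rng or random.Random(0)
        self.available_at = self.rng.uniform(0, 2 * t)
    def respond(self, now):
        if now >= self.available_at:
            self.available_at = now + self.rng.uniform(0, 2 * self.t)
            return True
        return False
```

Feeding the same response stream through each object reproduces the characteristic patterns, e.g. FR's regular payoff every nth response versus VR's unpredictable ones.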
Differential Reinforcement (DR)
Values of rewards scale with quality of performance.
Example protocol:
Below-average response → no reward.
Average → low-value treat.
Above-average → medium treat.
Exceptional → high-value treat (e.g. steak cube).
Enhances precision and encourages continual improvement without aversives.
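The example protocol above can be expressed as a simple tiered mapping. The function name, the treat labels, and the 10%/20% cut-offs for "above-average" and "exceptional" are illustrative assumptions, not part of the source protocol:

```python
def differential_reward(score, average):
    """Map performance quality to a reward tier.

    Mirrors the four-tier example protocol; the percentage
    thresholds are hypothetical and would be set per animal/task.
    """
    if score < average:
        return None                  # below average: no reward
    if score < 1.1 * average:
        return "low-value treat"     # average performance
    if score < 1.2 * average:
        return "medium treat"        # above-average performance
    return "high-value treat"        # exceptional (e.g. steak cube)
```

In practice the "average" baseline would be updated as the animal improves, so the same tiered rule keeps pulling performance upward.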
Shaping, Capturing & Luring
Shaping = reinforcing successive approximations until the full behaviour emerges.
More humane than force or flooding; leverages animal’s natural variability.
Tips for effectiveness:
Define the final objective behaviour precisely.
Break into small, achievable criteria; raise the criterion after 2–3 consecutive successes.
Keep rate of reinforcement high; avoid frustration.
If the animal stalls, back up to last successful step ("gradient of difficulty").
Capturing = waiting for the animal to spontaneously perform the behaviour, then marking & reinforcing.
Luring = using a visible reinforcer (or target) to guide the movement; fade lure promptly to prevent dependency.
Chinning often refers to gently guiding the animal via manual contact (commonly used in farm-species training).
Extinction = withholding reinforcement so behaviour frequency diminishes; may provoke an extinction burst (temporary spike in intensity or variability).
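The criterion-raising tips above can be sketched as one simple advancement policy. The function, and the back-up-on-failure rule, are a hypothetical illustration of "raise the criterion after 2–3 consecutive successes" and "back up to the last successful step":

```python
def run_shaping(steps, trial_outcomes, successes_to_advance=3):
    """Walk a shaping plan.

    steps            - ordered list of approximation criteria
    trial_outcomes   - booleans: did the animal meet the current criterion?
    Raises the criterion after `successes_to_advance` consecutive
    successes; drops back one step after a failure (a deliberately
    simple stand-in for "back up to the last successful step").
    Returns the index of the step reached.
    """
    step, streak = 0, 0
    for success in trial_outcomes:
        if success:
            streak += 1
            if streak >= successes_to_advance and step < len(steps) - 1:
                step, streak = step + 1, 0
        else:
            streak = 0
            if step > 0:
                step -= 1    # return to the previous, easier criterion
    return step
```

A real plan would also track rate of reinforcement per session, since a long run of failures signals the criterion jump was too large.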
Chaining
A chain = ordered sequence where each behaviour’s cue is the completion of the previous behaviour.
Two assembly strategies:
Forward Chaining
Teach behaviour A → reinforce.
Add behaviour B after A → reinforce, etc.
Natural for sequences that mirror real-world order.
Backward Chaining
Start with last behaviour (closest to primary reinforcer) — ensures every trial ends with success.
Add preceding links progressively (Z → reinforce; Y then Z → reinforce; X then Y then Z…).
Favoured for complex routines (e.g. an assistance dog: open fridge → bring soda → sit politely).
Practical advice:
Task analysis – break the routine into clear, trainable components.
Decide whether to reinforce only after the full chain or maintain occasional within-chain rewards.
Widely utilised in advanced contexts: assistance-dog tasks, marine-mammal shows, equestrian liberty acts.
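The two assembly strategies differ only in which end of the sequence each trial grows from, which a short sketch makes explicit (function names are hypothetical):

```python
def forward_chaining_trials(chain):
    """Forward chaining: train A, then A+B, then A+B+C..."""
    return [chain[:i + 1] for i in range(len(chain))]

def backward_chaining_trials(chain):
    """Backward chaining: train the last link alone, then the last two,
    and so on, so every trial ends at the primary reinforcer."""
    return [chain[i:] for i in range(len(chain) - 1, -1, -1)]
```

For the chain X → Y → Z, backward chaining yields the trial sequence [Z], [Y, Z], [X, Y, Z]: the animal always finishes on the link it knows best, closest to the reinforcer.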
Discrimination & Generalisation
Discrimination = responding differently to stimuli that vary along a critical dimension.
Example: dog sits on “sit” cue but not on “down”.
Rooted in adaptive ecology – recognising food vs. toxin, friend vs. foe.
Generalisation = treating diverse stimuli with shared features as functionally similar.
Example: horse accepts various trailer types after trailer-loading training.
Promotes efficiency: no need to relearn every novel instance.
Trainers shape the stimulus control spectrum via deliberate exposure, differential reinforcement, and context variation.
Factors Affecting Learning
Motivational State
Hunger, thirst, social drive, pain relief, play motivation.
Under- or over-motivation leads to sub-optimal performance.
Biological Predispositions (“preparedness”)
Species-specific tendencies facilitate or hinder certain associations (e.g. rats readily link taste with nausea but not with electric shock location).
Environment
Distractors (noise, smells, movement) may compete for attention.
Anxiety / stress impairs cognitive processing; aim for moderate arousal (Yerkes–Dodson curve).
Age
Example: foals handled early show reduced neophobia later.
Senescent dogs often exhibit declines in working memory and learning speed; adjust session length & complexity.
Connections to Classical Conditioning & Emotion
Operant contingencies frequently overlap with classical (Pavlovian) conditioning.
A clicker becomes a secondary reinforcer through classical pairing with food.
Emotional states (fear, relief, anticipation) function as motivating operations; they enhance or diminish reinforcing power.
Integrating ethological context avoids conflicts with contrafreeloading (the tendency of animals to prefer "working" for outcomes even when the same outcomes are freely available).
Practical & Ethical Implications
Favour the least-intrusive, minimally aversive (LIMA) principle.
Partial schedules & differential reinforcement reduce reliance on punishment while preserving precision.
Recognise and respect agency: allow the animal choice & control wherever feasible.
Numerical / Statistical References
Notation recap:
CRF = reinforce each response.
FR-n = reinforce every nth response.
VR-n = reinforce after a variable number of responses with mean n (variance unspecified).
FI-t = reinforce the first response after t seconds.
VI-t = reinforce the first response after a variable interval averaging t seconds.
Study Hints
Diagram each schedule’s cumulative-response curve to visualise differences.
Practise constructing shaping plans: list at least 6–8 approximation steps.
Observe real-life examples (e.g. slot machines = VR, checking phone notifications = VI).
Summary
Reinforcement alters probability of operant behaviour; proper timing & contingency strength are critical.
Shaping + thoughtfully chosen schedules = humane, efficient training.
Always integrate motivation, sensory biases, and emotional wellbeing for successful learning and ethical practice.