Animal Learning – Reinforcement, Schedules, and Behaviour-Change Techniques

Learning Objectives

  • Introduce core learning-theory terminology and concepts for newcomers.

  • Encourage experienced learners to re-examine familiar material from fresh angles (e.g.
    considering ethology and emotion in addition to classic Pavlovian or Skinnerian views).

  • Equip you to:

    • Explain why animals alter behaviour after experience.

    • Plan scientifically sound training programmes (hands-on methodology is addressed later and in assessments).

Principles of Reinforcement

  • Reinforcement ≈ a change in the animal’s affective (emotional) state that makes the behaviour it follows more likely.

  • Punishment ≈ a shift in affect that makes the preceding behaviour less likely.

  • Key vocabulary:

    • Primary (unconditioned) reinforcer – innately valuable (e.g. food, water, warmth, social contact, pain relief).

    • Secondary (conditioned) reinforcer – gains value through pairing with a primary one (e.g. clicker sound, “good dog”).

  • Ethical & welfare angle:

    • Emphasise positive reinforcement to enhance welfare; minimise aversives.

    • Acknowledge that any procedure that manipulates affect has moral implications.

Reinforcement Schedules

  • Schedules = rules describing when each behavioural response will be reinforced.

Continuous Reinforcement (CRF)

  • Every occurrence of the target behaviour earns a reinforcer.

  • Advantages:

    • Rapid acquisition.

    • Clear information about contingency.

  • Disadvantages:

    • Rapid extinction once reinforcement stops (the animal notices the change immediately).

Partial / Intermittent Reinforcement

  • Only some responses are reinforced.

  • General pattern:

    • Slower acquisition compared with CRF.

    • Far greater resistance to extinction; the animal keeps “checking” whether the payoff will reappear.

  • Four canonical sub-types (Skinner, 1938):

    • Fixed Ratio (FR)

      • Reinforce after a set number of responses, e.g. FR-5 ⇒ every 5th response.

      • Generates high, steady rates; brief post-reinforcement pause.

      • Performance often described by a “stair-step” cumulative record.

    • Variable Ratio (VR)

      • Reinforce after an unpredictable number of responses centred on a mean, e.g. VR-20.

      • Produces the highest response rates; very hard to extinguish (e.g. gambling on slot machines).

    • Fixed Interval (FI)

      • First response after a fixed time interval earns reinforcement, e.g. FI-30 s.

      • Typical scalloped pattern: slow just after reward, accelerating as the interval ends.

    • Variable Interval (VI)

      • First response after a variable, unpredictable interval earns reinforcement; mean value defined.

      • Produces slow, steady responding; common in natural foraging.
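
The four schedule rules above can be written as small decision procedures. The following is a minimal sketch, not a standard implementation: the class names and the `respond` method are hypothetical, and the uniform distributions used for the variable schedules are an assumption, since the notes specify only a mean. Interval schedules are timed from the last reinforcer delivered.

```python
import random


class FixedRatio:
    """FR-n: reinforce every n-th response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t=None):
        # Time is irrelevant for ratio schedules; only responses are counted.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: reinforce after an unpredictable number of responses averaging n."""

    def __init__(self, n, rng):
        self.n = n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1 .. 2n-1, which has mean n.
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t=None):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI-t: reinforce the first response at least t seconds after the last reinforcer."""

    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """VI-t: like FI, but each interval is drawn at random with mean t."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: uniform on 0 .. 2t seconds, which has mean t.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


# FR-5 over ten responses: the 5th and 10th are reinforced.
fr5 = FixedRatio(5)
print([i for i in range(1, 11) if fr5.respond(i)])  # [5, 10]

# FI-30 s: responses at 10 s and 29 s go unreinforced, the one at 30 s pays off.
fi30 = FixedInterval(30)
print([t for t in (10, 29, 30, 31, 65) if fi30.respond(t)])  # [30, 65]
```

Note that on the interval schedules extra responses before the interval elapses earn nothing, which is consistent with the lower response rates described for FI and VI.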

Differential Reinforcement (DR)

  • Reward value scales with the quality of the performance.

  • Example protocol:

    • Below-average response → no reward.

    • Average → low-value treat.

    • Above-average → medium treat.

    • Exceptional → high-value treat (e.g. steak cube).

  • Enhances precision and encourages continual improvement without aversives.
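
The example protocol above amounts to a lookup from performance tier to reward. The sketch below is illustrative only: the numeric thresholds (`band`, `exceptional_factor`), the function names, and the idea of scoring a response against a numeric baseline are all assumptions, since the notes define the tiers only qualitatively.

```python
# Reward for each performance tier, following the example protocol.
REWARD_BY_TIER = {
    "below_average": None,                # no reward
    "average": "low-value treat",
    "above_average": "medium-value treat",
    "exceptional": "high-value treat",
}


def rate_response(score, baseline, band=0.10, exceptional_factor=1.5):
    """Classify a numeric score relative to the animal's usual (baseline) score.

    Hypothetical thresholds: within +/- `band` of baseline counts as average,
    and `exceptional_factor` times baseline or more counts as exceptional.
    """
    if score >= baseline * exceptional_factor:
        return "exceptional"
    if score > baseline * (1 + band):
        return "above_average"
    if score >= baseline * (1 - band):
        return "average"
    return "below_average"


def select_reward(score, baseline):
    """Return the reward for a response, or None if it earns nothing."""
    return REWARD_BY_TIER[rate_response(score, baseline)]


# Example: a stay normally held for about 10 seconds.
for held_seconds in (5, 10, 12, 15):
    print(held_seconds, select_reward(held_seconds, baseline=10))
```

In practice the baseline would drift upward as the animal improves, so the same score earns less over time.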

Shaping, Capturing & Luring

  • Shaping = reinforcing successive approximations until the full behaviour emerges.

    • More humane than force or flooding; leverages animal’s natural variability.

    • Tips for effectiveness:

      • Define the final objective behaviour precisely.

      • Break into small, achievable criteria; raise the criterion after 2–3 consecutive successes.

      • Keep rate of reinforcement high; avoid frustration.

      • If the animal stalls, back up to last successful step ("gradient of difficulty").

  • Capturing = waiting for the animal to spontaneously perform the behaviour, then marking & reinforcing.

  • Luring = using a visible reinforcer (or target) to guide the movement; fade lure promptly to prevent dependency.

  • Chinning (mentioned) often refers to gently guiding via manual contact (commonly in farm species training).

  • Extinction = withholding reinforcement so behaviour frequency diminishes; may provoke an extinction burst (temporary spike in intensity or variability).
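
The shaping tips on raising and lowering criteria can be expressed as a simple bookkeeping rule. This is a rough sketch under stated assumptions: the function name is hypothetical, the advance rule uses 3 consecutive successes (within the 2–3 range given above), and "stalling" is interpreted here as 2 consecutive failures, which the notes do not define.

```python
def run_shaping(trial_results, n_steps, successes_to_advance=3, failures_to_back_up=2):
    """Track which approximation step is in force on each trial.

    trial_results: iterable of booleans (True = criterion met on that trial).
    n_steps: number of approximation steps in the shaping plan (0 .. n_steps-1).
    Returns (history, final_step), where history[i] is the step used on trial i.
    """
    step = 0
    success_streak = 0
    failure_streak = 0
    history = []
    for success in trial_results:
        history.append(step)
        if success:
            success_streak += 1
            failure_streak = 0
            # Raise the criterion after enough consecutive successes.
            if success_streak >= successes_to_advance and step < n_steps - 1:
                step += 1
                success_streak = 0
        else:
            failure_streak += 1
            success_streak = 0
            # Animal is stalling: back up to the previous step.
            if failure_streak >= failures_to_back_up and step > 0:
                step -= 1
                failure_streak = 0
    return history, step


results = [True, True, True, True, False, False, True, True, True]
history, final_step = run_shaping(results, n_steps=4)
print(history)     # [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(final_step)  # 1
```

In the example the animal advances to step 1, fails twice, drops back to step 0, and then earns its way back up.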

Chaining

  • A chain = ordered sequence where each behaviour’s cue is the completion of the previous behaviour.

  • Two assembly strategies:

    • Forward Chaining

      • Teach behaviour A → reinforce.

      • Add behaviour B after A → reinforce, etc.

      • Natural for sequences that mirror real-world order.

    • Backward Chaining

      • Start with the last behaviour (closest to the primary reinforcer), so every trial ends with success.

      • Add preceding links progressively (Z → reinforce; Y then Z → reinforce; X then Y then Z…).

      • Favoured in complex routines (e.g. assistance dog opens fridge → brings soda → sits politely).

  • Practical advice:

    • Task analysis – break the routine into clear, trainable components.

    • Decide whether to reinforce only after the full chain or maintain occasional within-chain rewards.

  • Widely utilised in advanced contexts: assistance-dog tasks, marine-mammal shows, equestrian liberty acts.
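
The difference between the two assembly strategies is easiest to see by listing what is practised at each training stage. The helper names below are hypothetical and the sketch assumes the task analysis has already produced an ordered list of links.

```python
def forward_chain_stages(links):
    """Stages for forward chaining: A, then A+B, then A+B+C, ..."""
    return [links[:i] for i in range(1, len(links) + 1)]


def backward_chain_stages(links):
    """Stages for backward chaining: Z, then Y+Z, then X+Y+Z, ..."""
    return [links[-i:] for i in range(1, len(links) + 1)]


routine = ["open fridge", "bring soda", "sit politely"]

for stage in forward_chain_stages(routine):
    print("forward: ", " -> ".join(stage), "-> reinforce")

for stage in backward_chain_stages(routine):
    print("backward:", " -> ".join(stage), "-> reinforce")
```

Every backward-chaining stage ends on the final link, the one closest to the reinforcer, whereas each forward-chaining stage ends on the newest and least practised link.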

Discrimination & Generalisation

  • Discrimination = responding differently to stimuli that vary along a critical dimension.

    • Example: dog sits on “sit” cue but not on “down”.

    • Rooted in adaptive ecology – recognising food vs. toxin, friend vs. foe.

  • Generalisation = treating diverse stimuli with shared features as functionally similar.

    • Example: horse accepts various trailer types after trailer-loading training.

    • Promotes efficiency: no need to relearn every novel instance.

  • Trainers shape the stimulus control spectrum via deliberate exposure, differential reinforcement, and context variation.

Factors Affecting Learning

  • Motivational State

    • Hunger, thirst, social drive, pain relief, play motivation.

    • Under- or over-motivation leads to sub-optimal performance.

  • Biological Predispositions (“preparedness”)

    • Species-specific tendencies facilitate or hinder certain associations (e.g. rats readily link taste with nausea but not with electric shock).

  • Environment

    • Distractors (noise, smells, movement) may compete for attention.

    • Anxiety / stress impairs cognitive processing; aim for the optimal arousal zone (Yerkes–Dodson curve).

  • Age

    • Example: foals handled early show reduced neophobia later.

    • Senescent dogs often exhibit declines in working memory and learning speed; adjust session length & complexity.

Connections to Classical Conditioning & Emotion

  • Operant contingencies frequently overlap with classical (Pavlovian) conditioning.

    • A clicker becomes a secondary reinforcer through classical pairing with food.

  • Emotional states (fear, relief, anticipation) function as motivating operations; they enhance or diminish reinforcing power.

  • Integrating ethological context helps avoid conflicts with contrafreeloading (animals often choose to "work" for food even when identical food is freely available).

Practical & Ethical Implications

  • Favour the least-intrusive, minimally aversive (LIMA) principle.

  • Partial schedules & differential reinforcement reduce reliance on punishment while preserving precision.

  • Recognise and respect agency: allow the animal choice & control wherever feasible.

Numerical / Statistical References

  • Notation recap:

    • FR-n = reinforce every nth response.

    • VR-n = reinforce after a mean of n responses; variance unspecified.

    • FI-t = reinforce the first response after t seconds.

    • VI-t = reinforce the first response after a variable interval averaging t seconds.
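
As a self-check on the notation, a label such as "FR-5" or "FI-30 s" can be split mechanically into its schedule type and parameter. The parser below is only a sketch: the `Schedule` type, the function name, and the accepted label format (including the optional "s" suffix for interval schedules) are assumptions based on the examples in these notes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    kind: str     # "FR", "VR", "FI" or "VI"
    value: float  # responses for ratio schedules, seconds for interval schedules

    def describe(self):
        if self.kind == "FR":
            return f"reinforce after every {self.value:g} responses"
        if self.kind == "VR":
            return f"reinforce after a mean of {self.value:g} responses"
        if self.kind == "FI":
            return f"reinforce the first response after {self.value:g} seconds"
        return f"reinforce the first response after an interval averaging {self.value:g} seconds"


# Accepts labels such as "FR-5", "VR-20", "FI-30 s", "VI-45s".
_LABEL = re.compile(r"^(FR|VR|FI|VI)-(\d+(?:\.\d+)?)(\s*s)?$")


def parse_schedule(label):
    """Parse a schedule label into a Schedule; raise ValueError if malformed."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised schedule label: {label!r}")
    kind, number, seconds_suffix = match.groups()
    if seconds_suffix and kind in ("FR", "VR"):
        raise ValueError("ratio schedules count responses, not seconds")
    return Schedule(kind, float(number))


for label in ("FR-5", "VR-20", "FI-30 s", "VI-45"):
    print(label, "=", parse_schedule(label).describe())
```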

Study Hints

  • Diagram each schedule’s cumulative-response curve to visualise differences.

  • Practise constructing shaping plans: list at least 6–8 approximation steps.

  • Observe real-life examples (e.g. slot machines = VR, checking phone notifications = VI).

Summary

  • Reinforcement alters probability of operant behaviour; proper timing & contingency strength are critical.

  • Shaping + thoughtfully chosen schedules = humane, efficient training.

  • Always integrate motivation, sensory biases, and emotional wellbeing for successful learning and ethical practice.