SR

Week 4 Associative Learning Theories

The Role of Theories and Models in Research

  • Models are used to construct approximate theories, perform experiments and simulations, and compare experimental results, simulation results, and theoretical predictions.

  • The goal is to compare and improve the model and theory.

Bush and Mosteller Model (1955)

  • Assumes stimuli gain association with the US based solely on contiguity.

  • Learning occurs according to a local error reduction rule.

  • Learning occurs to the extent that the CS does not predict the US accurately.

  • The goal of learning is to reduce predictive error.

  • When the CS fully and completely predicts the US, then no more learning occurs.

Acquisition Curve

  • Shows associative value increasing over trials as predictive error decreases.

Local Error Reduction Rule

  • ∆V = (λ - Vn)

    • λ refers to the total amount of conditioning that a US can support on a given trial.

    • V refers to the associative strength of the particular cue in question.

    • n refers to the trial number.

    • (λ - V) = predictive error.

    • Note: λ = 1 when US is present; λ = 0 when US is absent.

    • Goal of learning is to get ∆V = 0 → no more predictive error (the occurrence of the US is no longer surprising).

Acquisition Example:
  • Trial 1: ∆V = (1 – 0.0) → predictive error is 1.0

  • Trial 2: ∆V = (1 – 0.1) → predictive error is 0.9

  • Trial 3: ∆V = (1 – 0.2) → predictive error is 0.8

  • Trial 30: ∆V = (1 – 1.0) → predictive error is 0.0 → Goal of learning has been achieved: ∆V = 0, there is no more predictive error, and the occurrence of the US is no longer surprising.

Extinction Example:
  • Trial 1: ∆V = (0 – 1.0) → predictive error is -1.0

  • Trial 2: ∆V = (0 – 0.9) → predictive error is -0.9

  • Trial 3: ∆V = (0 – 0.8) → predictive error is -0.8

  • Trial 30: ∆V = (0 – 0.0) → predictive error is 0.0 → Goal of learning has been achieved: ∆V = 0, there is no more predictive error, and the absence of the US is no longer surprising.
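The acquisition and extinction examples can be sketched in a few lines of Python. This is a minimal illustration, not the model's original formulation: the `rate` parameter (how much of the error is corrected per trial) and the function names are assumptions added here, since the rule in these notes is written without a learning rate.

```python
def local_error_update(v, lam, rate=0.1):
    """Return V after one trial: V + rate * (lam - V).

    `rate` is an assumed learning-rate parameter, not part of the notes' formula.
    """
    return v + rate * (lam - v)


def run_trials(v_start, lam, n_trials, rate=0.1):
    """Return the V values before trial 1 and after each subsequent trial."""
    values = [v_start]
    for _ in range(n_trials):
        values.append(local_error_update(values[-1], lam, rate))
    return values


# Acquisition: the US is present on every trial (lam = 1).
acquisition = run_trials(0.0, lam=1.0, n_trials=30)
# Extinction: the US is absent on every trial (lam = 0), starting from the acquired value.
extinction = run_trials(acquisition[-1], lam=0.0, n_trials=30)

for n in (0, 1, 2, 29):
    error = 1.0 - acquisition[n]
    print(f"trial {n + 1}: V = {acquisition[n]:.3f}, predictive error = {error:.3f}")
```

With a proportional rate like this, the step size shrinks as the error shrinks, which gives the negatively accelerated acquisition curve described in these notes.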

Key Points:
  • The amount and type of learning that occurs to a CS depends on how surprising the US is based on that CS alone.

  • Most learning occurs in early trials; associative strength grows at a negatively accelerated rate.

  • When US is no longer surprising, learning reaches asymptote.

Limitations:
  • Cannot explain blocking (A+ | AX+) in which there is perfect contiguity between X and the US but little responding is observed relative to the control group.

  • Also cannot explain overshadowing (AX+), which also has perfect contiguity between A and X and the US but less responding is observed relative to elemental training.

Rescorla and Wagner Model (1972)

  • Uses total error reduction instead of local error reduction.

  • Learning depends on the level of surprise of the US based on the predictive value of ALL cues present on a given trial (local error reduction is based on the predictive value of only the CS being tested).

  • Satisfactorily accounts for most cue competition situations including blocking and overshadowing.

Learning Rule

  • ∆V = αβ(λ - ∑Vn)

    • ∑V (also noted as VT or V(present cues)) refers to the total associative strength of all CSs present on trial number n.

    • α is the learning rate parameter that refers to the associability of the CS.

    • β is the learning rate parameter that refers to the associability of the US.

      • Associability roughly corresponds to salience or intensity.

  • 0 < α < 1

  • 0 < β < 1

Total Error Reduction Rule

  • Formula: ∆V = (λ – ∑Vn)

Example:
  • ∆V = (λ – [V1 + V2]n)

  • Trial 1: ∆V = (1 – [0.0 + 0.0]) → predictive error is 1.0

  • Trial 2: ∆V = (1 – [0.1 + 0.1]) → predictive error is 0.8

  • Trial 3: ∆V = (1 – [0.2 + 0.2]) → predictive error is 0.6

  • Trial 4: ∆V = (1 – [0.3 + 0.3]) → predictive error is 0.4

  • Trial 5: ∆V = (1 – [0.4 + 0.4]) → predictive error is 0.2

  • Trial 6: ∆V = (1 – [0.5 + 0.5]) → predictive error is 0.0

  • Goal of learning has been achieved, the occurrence of the US is no longer surprising

  • Goal of learning is to get ∆V = 0 → no more predictive error

  • When trained in compound, the associative value of all CSs cannot exceed the value of the US
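The total error reduction rule can be sketched as a small Python function, assuming the form ∆V = αβ(λ – ∑V) given under Learning Rule. The function name `rw_trial`, the dictionary layout, and the parameter values (α = 0.1 for both cues, β = 1) are illustrative assumptions, not values from the notes.

```python
def rw_trial(V, present, lam, alpha, beta=1.0):
    """Apply one Rescorla-Wagner trial to V in place; return the total error.

    V       -- dict mapping cue name to associative strength
    present -- cues presented on this trial
    alpha   -- dict mapping cue name to its associability (assumed values)
    """
    error = lam - sum(V[cue] for cue in present)  # lambda minus summed V
    for cue in present:
        V[cue] += alpha[cue] * beta * error
    return error


V = {"V1": 0.0, "V2": 0.0}
alpha = {"V1": 0.1, "V2": 0.1}  # assumed equal saliences

errors = []
for _ in range(60):  # compound trials reinforced with the US (lam = 1)
    errors.append(rw_trial(V, ["V1", "V2"], lam=1.0, alpha=alpha))

print(f"first error = {errors[0]:.2f}, last error = {errors[-1]:.6f}")
print({cue: round(strength, 3) for cue, strength in V.items()})
```

Under these assumptions the two cues share the available associative strength, each settling near 0.5, so their sum does not exceed λ.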

Conditioned Inhibition

  • X acquired negative associative strength to balance out A’s excitatory associative strength.

    • X = -1; A = +1

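  • On the negative summation test, B = 1 and X = -1, so the BX compound sums to 0 and responding is weak (cr).

  • On the retardation test, X = -1 and Y = 0, so X must first lose its negative strength before it can become excitatory and conditions more slowly than the neutral cue Y.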

Training Paradigms for Conditioned Inhibition (CI)
  • CI Train: A+ / AX-

  • Transfer Train: B+

  • Neg Sum Test: BX → cr

  • Ret Test: X+ → cr

Control

  • A+ / AX-

  • B+

  • B → CR

  • Y+ → CR

Formula for Conditioned Inhibitor
  • ∆V = (λ – ∑Vn)

  • ∆V = (λ – [A + X]n)

  • Trial 1: ∆V = (0 – [1.0 + (0.0)]) → predictive error is -1.0

  • Trial 2: ∆V = (0 – [1.0 + (-0.1)]) → predictive error is -0.9

  • Trial 3: ∆V = (0 – [1.0 + (-0.2)]) → predictive error is -0.8

  • Trial 4: ∆V = (0 – [1.0 + (-0.3)]) → predictive error is -0.7

  • Trial 5: ∆V = (0 – [1.0 + (-0.4)]) → predictive error is -0.6

  • Trial 30: ∆V = (0 – [1.0 + (-1.0)]) → predictive error is 0.0
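A rough simulation of A+ / AX- training shows how X can end up with negative associative strength under the total error rule. The helper `rw_trial`, the alternation of trial types, and the values α = 0.2, β = 1 and B = 1.0 are assumptions made for this sketch only.

```python
def rw_trial(V, present, lam, alpha, beta=1.0):
    """Apply one Rescorla-Wagner trial to V in place; return the total error."""
    error = lam - sum(V[cue] for cue in present)
    for cue in present:
        V[cue] += alpha[cue] * beta * error
    return error


V = {"A": 0.0, "X": 0.0}
alpha = {"A": 0.2, "X": 0.2}  # assumed equal saliences

for _ in range(200):
    rw_trial(V, ["A"], lam=1.0, alpha=alpha)       # A+  (US present)
    rw_trial(V, ["A", "X"], lam=0.0, alpha=alpha)  # AX- (US absent)

# Negative summation test: B is assumed to be a fully trained excitor (V = 1.0).
V_B = 1.0
summation_prediction = V_B + V["X"]

print(f"A = {V['A']:.3f}, X = {V['X']:.3f}, BX prediction = {summation_prediction:.3f}")
```

In this sketch A approaches +1 and X approaches -1, so a compound of X with an excitor of strength 1 predicts roughly no US.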

Overexpectation

  • Reinforcing two independently trained conditioned excitors in compound results in less responding to each CS on its own than in controls where the target CS is reinforced in compound with a neutral stimulus (Con1) or the two CSs continue to be reinforced elementally (Con2).

  • The summed prediction on AX trials is ∑V = 2, but the US supports only λ = 1, so the animal expects more than it receives.

  • The change in associative strength is negative (∆V < 0), so X (and A) lose excitatory strength rather than gaining it.

Experimental Design:
  • Group: Phase 1 / Phase 2 / Test

    • Exp: A+ / X+ then AX+ then X → cr

    • Con1: B+ / X+ then AX+ then X → CR

    • Con2: A+ / X+ then A+ / X+ then X → CR

Total Error Reduction Rule in Overexpectation:
  • ∆V = (λ – ∑Vn)

  • ∆V = (λ – [A + X]n)

  • Trial 1: ∆V = (1 – [1.0 + 1.0]) → predictive error is -1.0

  • Trial 2: ∆V = (1 – [0.9 + 0.9]) → predictive error is -0.8

  • Trial 3: ∆V = (1 – [0.8 + 0.8]) → predictive error is -0.6

  • Trial 4: ∆V = (1 – [0.7 + 0.7]) → predictive error is -0.4

  • Trial 5: ∆V = (1 – [0.6 + 0.6]) → predictive error is -0.2

  • Trial 6: ∆V = (1 – [0.5 + 0.5]) → predictive error is 0.0

  • Presenting either CS A or X by itself will now yield a smaller conditioned response (cr)

  • Goal of learning is to get ∆V = 0 → no more predictive error

  • When trained in compound, the associative value of all CSs cannot exceed the value of the US
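The overexpectation effect can be reproduced with the same kind of sketch, starting both cues at asymptote and then reinforcing them in compound. The helper name and the values α = 0.1, β = 1 are assumptions for illustration.

```python
def rw_trial(V, present, lam, alpha, beta=1.0):
    """Apply one Rescorla-Wagner trial to V in place; return the total error."""
    error = lam - sum(V[cue] for cue in present)
    for cue in present:
        V[cue] += alpha[cue] * beta * error
    return error


# Phase 1 outcome: A and X were each trained alone to asymptote (V = 1.0).
V = {"A": 1.0, "X": 1.0}
alpha = {"A": 0.1, "X": 0.1}  # assumed equal saliences

# Phase 2: AX+ compound trials. The summed prediction (2.0) exceeds lambda (1.0).
errors = []
for _ in range(60):
    errors.append(rw_trial(V, ["A", "X"], lam=1.0, alpha=alpha))

print(f"first error = {errors[0]:.2f}")
print(f"after Phase 2: A = {V['A']:.3f}, X = {V['X']:.3f}")
```

Even though the US is presented on every compound trial, the negative error drives both cues down toward 0.5 under these assumptions.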

Blocking

  • Total error reduction rule in blocking (assume A was a previously trained excitor with 0.8 value):

  • ∆V = (λ – ∑Vn)

  • ∆V = (λ – [A + X]n)

  • Trial 1: ∆V = (1 – [0.8 + 0.0]) → predictive error is 0.2

  • Trial 2: ∆V = (1 – [0.8 + 0.1]) → predictive error is 0.1

  • Trial 3: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Trial 4: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Trial 5: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Trial 6: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Previously trained excitor Cue A has blocked Cue X from gaining any more excitatory value

  • Goal of learning is to get ∆V = 0 → no more predictive error

  • When trained in compound, the associative value of all CSs cannot exceed the value of the US
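Blocking can be illustrated by comparing a group that receives A+ before AX+ with a control group that receives AX+ only. In this sketch A is pretrained to near asymptote rather than to 0.8; the helper name, the trial counts and α = 0.2 are assumptions.

```python
def rw_trial(V, present, lam, alpha, beta=1.0):
    """Apply one Rescorla-Wagner trial to V in place; return the total error."""
    error = lam - sum(V[cue] for cue in present)
    for cue in present:
        V[cue] += alpha[cue] * beta * error
    return error


alpha = {"A": 0.2, "X": 0.2}  # assumed equal saliences

# Blocking group: Phase 1 A+, then Phase 2 AX+.
blocking = {"A": 0.0, "X": 0.0}
for _ in range(50):
    rw_trial(blocking, ["A"], lam=1.0, alpha=alpha)
for _ in range(50):
    rw_trial(blocking, ["A", "X"], lam=1.0, alpha=alpha)

# Control group: AX+ only, with no pretraining of A.
control = {"A": 0.0, "X": 0.0}
for _ in range(50):
    rw_trial(control, ["A", "X"], lam=1.0, alpha=alpha)

print(f"blocking group: X = {blocking['X']:.3f}")
print(f"control group:  X = {control['X']:.3f}")
```

Because A already predicts the US when the compound is introduced, almost no error remains for X to absorb, whereas X in the control group acquires about half of the available strength.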

Overshadowing

  • Total error reduction rule in overshadowing (assume A is more salient):

  • ∆V = (λ – ∑Vn)

  • ∆V = (λ – [A + X]n)

  • Trial 1: ∆V = (1 – [0.0 + 0.0]) → predictive error is 1.0

  • Trial 2: ∆V = (1 – [0.4 + 0.1]) → predictive error is 0.5

  • Trial 3: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Trial 4: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Trial 5: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Trial 6: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • The more salient Cue A has overshadowed the less salient Cue X, preventing X from gaining any more excitatory value

  • Goal of learning is to get ∆V = 0 → no more predictive error

  • When trained in compound, the associative value of all CSs cannot exceed the value of the US
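Overshadowing follows from giving the two cues different associabilities. The sketch below assumes α = 0.4 for A and α = 0.1 for X (values chosen here so the outcome matches the 0.8 / 0.2 split in the example), plus an elemental control in which X is trained alone.

```python
def rw_trial(V, present, lam, alpha, beta=1.0):
    """Apply one Rescorla-Wagner trial to V in place; return the total error."""
    error = lam - sum(V[cue] for cue in present)
    for cue in present:
        V[cue] += alpha[cue] * beta * error
    return error


alpha = {"A": 0.4, "X": 0.1}  # assumption: A is four times as salient as X

# Overshadowing group: AX+ compound trials from scratch.
compound = {"A": 0.0, "X": 0.0}
for _ in range(40):
    rw_trial(compound, ["A", "X"], lam=1.0, alpha=alpha)

# Elemental control: X+ alone for the same number of trials.
elemental = {"X": 0.0}
for _ in range(40):
    rw_trial(elemental, ["X"], lam=1.0, alpha=alpha)

print(f"compound:  A = {compound['A']:.3f}, X = {compound['X']:.3f}")
print(f"elemental: X = {elemental['X']:.3f}")
```

Both cues see the same error on each trial, so the more salient cue takes the larger share and the error is exhausted before X can gain much.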

Problems with Rescorla-Wagner Model

  • Spontaneous recovery

  • Latent inhibition (CS pre-exposure)

Extinction of Inhibition
  • Pav C.I. Train: A+ / AX-

  • CI Ext: X-

  • Test

    • Exp: X → Con

    • Con: X → cr

  • Rescorla-Wagner model during Phase CI Ext, Trial 1: ∆V = (0 – [-1.0]) → predictive error is 1.0, so the model predicts that X should lose its inhibitory strength and move toward 0.

Nonreinforcement of Neutral Cue in Presence of Conditioned Inhibitor
  • PavCI Train: A+ / AX-

  • Acq: XY-

  • Test

    • Exp: Y → Con

    • Con: Y → cr

  • R-W model during Phase Acq: ∆V = (λ – [X + Y]n)

  • Trial 1: ∆V = (0 – [-1.0 + 0.0]) → predictive error is 1.0

  • Trial 2: ∆V = (0 – [-1.0 + 0.1]) → predictive error is 0.9

  • Trial 3: ∆V = (0 – [-1.0 + 0.2]) → predictive error is 0.8

  • Trial 30: ∆V = (0 – [-1.0 + 1.0]) → predictive error is 0.0
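Both of these predictions can be checked with a short simulation. Note one difference from the worked lines above: they hold X at -1.0 for simplicity, whereas this sketch updates every cue present on a trial, so X and Y both change. The helper name and α = 0.2 are assumptions.

```python
def rw_trial(V, present, lam, alpha, beta=1.0):
    """Apply one Rescorla-Wagner trial to V in place; return the total error."""
    error = lam - sum(V[cue] for cue in present)
    for cue in present:
        V[cue] += alpha[cue] * beta * error
    return error


alpha = {"X": 0.2, "Y": 0.2}  # assumed equal saliences

# Extinction of inhibition: the inhibitor X is presented alone without the US.
ci_ext = {"X": -1.0}
first_error_ext = rw_trial(ci_ext, ["X"], lam=0.0, alpha=alpha)
for _ in range(49):
    rw_trial(ci_ext, ["X"], lam=0.0, alpha=alpha)

# XY-: a neutral cue Y is nonreinforced together with the inhibitor X.
xy = {"X": -1.0, "Y": 0.0}
first_error_xy = rw_trial(xy, ["X", "Y"], lam=0.0, alpha=alpha)
for _ in range(49):
    rw_trial(xy, ["X", "Y"], lam=0.0, alpha=alpha)

print(f"X- alone: X = {ci_ext['X']:.3f}")
print(f"XY-:      X = {xy['X']:.3f}, Y = {xy['Y']:.3f}")
```

In both cases the positive error on nonreinforced trials pushes associative strength upward: X drifts toward 0 when presented alone, and Y becomes excitatory when nonreinforced with X.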

Retrospective Revaluation
  • Change in response to the target CS as a function of manipulating the associative status of a related CS.

Example:

  • Phase 1: A+

  • Phase 2: AX+

  • Phase 3: A-

  • Test: X → CR

The Rescorla-Wagner model during Phase 2:

  • Trial 30: ∆V = (1 – [0.8 + 0.2]) → predictive error is 0.0

  • Not predicted by R-W model

  • Rescorla-Wagner model assumes that changes in the associative status of the CS occur ONLY when the CS is present, i.e., α > 0.

  • Retrospective revaluation shows changes in the associative status of the target CS when it is absent, i.e., α = 0.

Miller's Comparator Hypothesis (to do: work on this topic more; look at results from tutorial 4)

  • Non-competitive learning.

  • Contiguity is necessary and sufficient for learning.

    • We associate anything that is presented together (i.e., has good contiguity), but we don’t express all learning.

  • The predictive value of a CS is compared to all other stimuli it is associated with, which are also associated with the US.

    • Other CSs are called “comparator stimuli”.

  • Behaviour dependent on relative status of stimuli at time of testing.

  • If the target CS (X) is more strongly associated with the US than other comparator stimuli, then responding to X is strong.

  • If another comparator stimulus (Y) is more strongly associated with the US than target CS X, then responding to X is weak.

  • Which CS is more strongly associated (and therefore will more strongly control behaviour) is determined by the comparator process.

Comparator Process

  • Involves direct and indirect activation of the US representation.

Simple Acquisition Trial (X+)
  • Target CS (X) directly activates US representation. (Link1)

  • The target CS also activates the representation of its comparator stimulus, here the context (Link 2), which in turn indirectly activates the US representation (Link 3).

  • The comparator term is calculated as Link 2 × Link 3.

Weak Excitatory CR or Conditioned Inhibition
  • Phase 1: 5 XA+ trials

  • Phase 2: 20 A+ trials

  • Link 1 does not get weaker, but it is weak relative to the comparator term (links 2 and 3).
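The comparison can be sketched numerically. The notes give only the comparator term (Link 2 × Link 3); the subtraction used below to turn it into a response strength, and every link value, are hypothetical choices made for illustration rather than part of the hypothesis as stated here.

```python
def comparator_response(link1, link2, link3):
    """Return an illustrative response strength for the target CS.

    link1 -- target CS to US association (direct activation)
    link2 -- target CS to comparator stimulus association
    link3 -- comparator stimulus to US association
    Assumption: responding scales with link1 minus the comparator term.
    """
    comparator_term = link2 * link3
    return link1 - comparator_term


# Hypothetical link strengths after Phase 1 (5 XA+ trials).
link1_x_us = 0.6
link2_x_a = 0.8

# Link 3 (A to US) is modest after Phase 1 and stronger after 20 extra A+ trials.
after_phase1 = comparator_response(link1_x_us, link2_x_a, link3=0.3)
after_phase2 = comparator_response(link1_x_us, link2_x_a, link3=0.9)

print(f"after Phase 1: {after_phase1:.2f}")
print(f"after Phase 2: {after_phase2:.2f}")
```

Link 1 is identical in both calls; only the comparator term grows, which is enough to turn a positive response into a weak or negative one in this sketch.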

Role of Context
  • The context, due to its lower salience, becomes a second-order comparator stimulus.

Retrospective Revaluation: Recovery from Blocking

  • Phase 1: A+

  • Phase 2: AX+

  • Phase 3: A-

  • Test: X → CR

Mechanism
  • X: Directly activated US representation

  • A: Comparator process

  • Indirectly activated US representation via Links 1, 2, and 3

Retrospective Revaluation: Recovery from Overshadowing

  • Phase 1: AX+

  • Phase 2: A-

  • Test: X → CR

    • Directly activated US representation

    • Comparator process

  • Indirectly activated US representation

Models of Attention

  • Mackintosh's (1975) Model

  • Pearce & Hall's (1980) Model

  • The outcome of a trial will determine how much attention is given to the CS on the next trial.

  • Attention modulates learning.

Pearce & Hall (1980)

  • Proposed that attention is determined by how surprising the US was on the preceding trial.

  • More surprising means more attention is paid on the subsequent trial.
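A simplified sketch of this idea sets attention on each trial to the absolute surprise from the preceding trial, α = |λ – V|. The value update used here (an error-correction step scaled by attention), the starting attention of 1.0 and the `rate` value are assumptions made to keep the example short; this is not the full Pearce-Hall model.

```python
def attention_from_surprise(lam, v_total):
    """Attention for the next trial: absolute surprise on the current trial."""
    return abs(lam - v_total)


lam = 1.0    # US present on every trial
v = 0.0      # associative strength of the CS
alpha = 1.0  # assumed starting attention to a novel CS
rate = 0.3   # assumed scaling constant

alphas = []
for trial in range(1, 11):
    alphas.append(alpha)  # attention in force on this trial
    surprise = attention_from_surprise(lam, v)
    v += rate * alpha * (lam - v)  # simplified, attention-weighted update
    alpha = surprise               # carried forward to the next trial
    print(f"trial {trial}: attention = {alphas[-1]:.3f}, V = {v:.3f}")
```

As the CS comes to predict the US, surprise falls, so attention to the CS declines across trials in this sketch.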

Mackintosh (1975)

  • Proposed that attention increases to cues that are reliable predictors of the US.

Hogarth, Dickinson, & Duka (2011)

  • Looking for action – attention that a stimulus commands after it has become a good predictor of the US and can generate a CR with minimal cognitive effort.

    • Similar to Mackintosh’s attentional mechanism.

  • Looking for learning – attention that is involved in processing cues that are not yet good predictors of the US and therefore have much to be learned about.

    • Similar to Pearce & Hall’s attentional mechanism.

  • Looking for liking – attention that stimuli command because of their emotional value.

Timing Models

  • The CS-US interval (ISI) is important for strength and rate of learning and CR (think about trace conditioning).

    • When something occurs is just as important as what occurs.

  • The intertrial interval (ITI) can influence conditioning.

    • Generally, longer ITIs lead to better conditioning than massed trials (i.e., short ITIs).

  • ITI and ISI interact to determine responding.

    • Responding is determined by the length of the ISI relative to the length of the ITI, not by the absolute value of either.

Relative Waiting Time Hypothesis

  • Organisms compare how long they have to wait for the US during the CS (T) relative to how long they wait for the US during the intertrial interval (I).

High I/T Ratio
  • When the US waiting time during the CS is shorter than during the ITI, the I/T ratio is high.

    • The CS becomes an informative predictor of the next occurrence of the US.

    • Strong responding to the CS is observed.

Low I/T Ratio
  • When the US waiting time during the CS is longer than or similar to during the ITI, the I/T ratio is low.

  • The CS provides little information about the next US.

  • Weak responding to the CS is observed.

  • The idea that organisms compare the relative predictiveness/informative value of the CS and ITI at the time of testing is similar to the idea of the Comparator Hypothesis.
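The ratio itself is a one-line calculation. The sketch below uses made-up durations in seconds purely to contrast a high and a low I/T ratio; the function name and the numbers are assumptions, and no threshold for "high" or "low" is implied.

```python
def it_ratio(iti_wait, cs_wait):
    """Return I/T: US waiting time in the ITI divided by waiting time in the CS."""
    if cs_wait <= 0:
        raise ValueError("cs_wait must be positive")
    return iti_wait / cs_wait


# Hypothetical durations in seconds.
high = it_ratio(iti_wait=120, cs_wait=10)  # short wait in the CS relative to the ITI
low = it_ratio(iti_wait=20, cs_wait=20)    # similar wait in the CS and the ITI

print(f"high I/T example: {high:.1f}")
print(f"low I/T example:  {low:.1f}")
```

Doubling both durations leaves the ratio unchanged, which is the sense in which relative rather than absolute timing determines responding.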

Temporal Coding Hypothesis

  • Time is encoded as part of the association between two stimuli.

    • We know this from observation of inhibition of delay.

  • Studies suggest that organisms do acquire CS-US associations in simultaneous and backward conditioning procedures.

    • But this learning is only expressed in a predictive relationship.

    • Responding to a CS must be assessed by an anticipatory measure.

  • Second-order conditioning (SOC) and sensory preconditioning (SPC) are used to test backward conditioning.
Temporal Coding in Pavlovian Conditioning

Summary

  • Associative learning theories are simplified models of how we learn and how that learning determines behaviour.

  • The Rescorla-Wagner model assumes that learning occurs according to how surprising the outcome is, given the associative value of all stimuli present.

  • Comparator Hypothesis is a model of performance. It assumes that stimuli compete at the time of testing for behavioural control based on their relative informative value.

  • Attention models assume that the outcome of a trial will determine attention, which modulates learning on subsequent trials.

  • Timing models incorporate time into the learned association, which influences behaviour.