PSYC513/703: Language and Communication - Spoken Word Recognition

Contact Information

  • Email: Jeremy.Goslin@plymouth.ac.uk
  • Office: PSQ B221
  • Office Appointment: Tuesdays 3-4

Speech Variability

  • Variability in acoustic waveforms occurs due to:
    • Speaker rate
    • Intonation
    • Noise and distortion
    • Accent

Speech Units

  • Phones:
    • A basic unit of sound in speech.
    • A speech segment with distinct physical or perceptual properties.
    • Acoustic or articulatory distinction.
    • Acoustic phonetics (waveforms).
    • Articulatory phonetics (articulators).
    • Approximately 4000 available, over 800 in some languages.

Phonemes

  • Phones are a representation of speech production, not perception.
    • Allophones: Different phones that are perceptually equivalent in a language.
      • Example: aspirated [pʰ] in "pin" vs unaspirated [p] in "spin"; the /t/ in "night rate" vs "nitrate".
  • Phonemes: Set of phones that are cognitively equivalent.
    • Informally, a phoneme is a phone "for me": a category defined by how listeners of a language perceive it.
    • Basic unit that distinguishes words.
    • Changing a phoneme produces a different word or a nonword.
    • Minimal pairs: /p/ and /b/ distinguish “pin” and “bin”.
    • Require a perceptual distinction
    • Language specific
    • British English has 44 phonemes: 20 vowels, 24 consonants
    • Abstract unit

Phoneme Examples

  • cat: k.æ.t
  • duck: d.ʌ.k
  • tiger: t.aɪ.g.ə
  • box: b.ɒ.k.s
  • cheese: ʧ.i:.z
  • book: b.ʊ.k
  • bubble: b.ʌ.b.ə.l

Top-Down Effects on Phoneme Perception

  • Phoneme perception is not simply a passive bottom-up process.
  • A phone can be perceived as one or another phoneme depending upon context.
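  • Ganong effect (Ganong, 1980):
    • Phonemes are perceived categorically, with a sharp boundary along an acoustic continuum.
    • For /d/ vs /t/, the continuum is voice onset time (VOT).
    • The position of the boundary is modulated by lexical context.
    • Example: a sound ambiguous between /d/ and /t/ is heard as /d/ before "-ash" (DASH is a word, TASH is not) but as /t/ before "-ask" (TASK is a word, DASK is not).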
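The Ganong effect can be described (not explained) as a shift in a categorical identification curve. The sketch below assumes a logistic identification function over VOT; the function name p_voiceless, the slope and all three boundary values are hypothetical numbers chosen for illustration, not data from Ganong (1980).

```python
import math


def p_voiceless(vot_ms, boundary_ms, slope_ms=4.0):
    """Probability of reporting /t/ rather than /d/ for a given VOT (ms).

    A logistic identification function: near 0 or 1 at the continuum ends,
    changing sharply around the category boundary (categorical perception).
    """
    return 1.0 / (1.0 + math.exp(-(vot_ms - boundary_ms) / slope_ms))


# Hypothetical boundary values, chosen only for illustration.
BOUNDARY_NEUTRAL = 30.0  # DA-TA syllables: no lexical bias
BOUNDARY_ASH = 35.0      # "_ash": DASH is a word, so the /d/ region widens
BOUNDARY_ASK = 25.0      # "_ask": TASK is a word, so the /t/ region widens

for vot in (10, 30, 50):
    print(f"VOT {vot} ms: "
          f"neutral {p_voiceless(vot, BOUNDARY_NEUTRAL):.2f}, "
          f"_ash {p_voiceless(vot, BOUNDARY_ASH):.2f}, "
          f"_ask {p_voiceless(vot, BOUNDARY_ASK):.2f}")
```

At the continuum ends the three curves nearly coincide; only the ambiguous middle (around 30 ms in this sketch) is pulled towards whichever category makes a real word.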

Phoneme Restoration Effect

  • Top-down effects (Warren, 1970):
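    • Later context determines which phoneme is restored (_ marks a phoneme replaced by noise):
      • "The _eel had a broken axle" (heard as "wheel")
      • "The _eel on the orange was hard to cut" (heard as "peel")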
    • Effects are stronger:
      • In words than non-words
      • Later in words
      • In strongly biasing contexts

McGurk Effect

  • Top-down effects (McGurk & MacDonald, 1976):
    • The effects of visual speech perception on the audio stream.
    • Multimodal speech perception.
    • Interference caused by seeing one phoneme and hearing another: e.g. seeing /ga/ while hearing /ba/ is typically perceived as /da/.

Active Speech Perception

  • Top-down effects are evidence that speech perception is an active rather than a passive process.
  • Combination of top-down and bottom-up information.
  • Allows us to overcome imperfect speech input.

Speech Segmentation

  • Continuous speech signal:
    • Dynamic waveform caused by continuous movement of articulators.
    • Boundaries not evident in the speech signal.
    • Word boundaries in particular are not reliably marked by pauses.

Shillcock and Tabossi (1990)

  • Evidence of continuous attempts at word segmentation of the speech stream.
  • Cross-modal priming
    • Hear sentences
    • Respond to written words
    • Lexical decision task
  • Primes (spoken sentences):
    • The scientist made a new discovery last year
    • The scientist made a novel discovery last year
  • Target (written word): NUDIST
    • Primed by "The scientist made a new discovery last year", where "new dis-" matches the beginning of "nudist"
  • Priming effect caused by a temporary segmentation error
  • Participants did not report perceiving the word 'nudist' in the prime
  • Evidence of continuous segmentation attempts
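Continuous segmentation can be sketched as a search that treats every position in the unsegmented stream as a possible word onset, so candidates may straddle a real word boundary. This is an illustration, not Shillcock and Tabossi's procedure: the mini-lexicon, its simplified transcriptions and the function consistent_candidates are all hypothetical.

```python
# Hypothetical mini-lexicon in a simplified phonemic transcription.
LEXICON = {
    "new": ("n", "j", "uː"),
    "nudist": ("n", "j", "uː", "d", "ɪ", "s", "t"),
    "novel": ("n", "ɒ", "v", "ə", "l"),
    "disk": ("d", "ɪ", "s", "k"),
    "discovery": ("d", "ɪ", "s", "k", "ʌ", "v", "ə", "r", "i"),
}


def consistent_candidates(stream, heard):
    """(onset, word) pairs consistent with the first `heard` phonemes of `stream`.

    No word boundaries are assumed: every position is tried as a possible
    onset, and a word counts if the input heard so far from that onset
    matches its beginning (or the whole word, if it has already ended).
    """
    candidates = []
    for onset in range(heard):
        for word, phonemes in LEXICON.items():
            end = min(onset + len(phonemes), heard)
            if stream[onset:end] == phonemes[: end - onset]:
                candidates.append((onset, word))
    return candidates


NEW_DISCOVERY = LEXICON["new"] + LEXICON["discovery"]
NOVEL_DISCOVERY = LEXICON["novel"] + LEXICON["discovery"]

print(consistent_candidates(NEW_DISCOVERY, 6))  # "nudist" still consistent
print(consistent_candidates(NEW_DISCOVERY, 7))  # /k/ rules "nudist" out
```

In the "novel discovery" stream, "nudist" drops out at the second phoneme, which mirrors the control condition.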

Lexical Access

  • Simple Theory
    • Match string of letters/phonemes/syllables to a word in the lexicon
    • Search
    • Organisation of dictionary
    • [Figure: a lexical entry for "house" with acoustic, phonemic (/h/ /au/ /s/) and semantic representations, among other entries: happy, jaw, sad, table, how]

Cohort Model (Marslen-Wilson & Welsh, 1978)

  • ACCESS STAGE (perceptual representation used to activate lexical items, thus generating a candidate set of items – the cohort)
  • SELECTION STAGE (the most likely candidate is chosen from cohort)
  • INTEGRATION STAGE (in which the semantic and syntactic properties of the chosen words are utilized)

Examples of Cohort

  • S: song, story, sparrow, saunter, slow, secret, sentry, etc.
  • SP: spice, spoke, spare, spin, splendid, spelling, spread, etc.
  • SPI: spit, spigot, spill, spiffy, spinnaker, spirit, spin, etc.
  • SPIN: spin, spinach, spinster, spinnaker, spindle
  • SPINA: spinach (uniqueness point: only one candidate remains)
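The access stage and the uniqueness point can be sketched as prefix filtering over a word list. This is a minimal sketch assuming a small hypothetical lexicon built from the examples above, with letters standing in for phonemes; cohort and uniqueness_point are illustrative names, not part of the Cohort model's published description.

```python
# Hypothetical mini-lexicon taken from the examples above; letters stand in
# for phonemes, which is a simplification.
LEXICON = ["song", "slow", "spoke", "spit", "spill", "spirit",
           "spin", "spinach", "spinster", "spinnaker", "spindle"]


def cohort(heard, lexicon=LEXICON):
    """Access stage: every word whose onset matches the input heard so far."""
    return [w for w in lexicon if w.startswith(heard)]


def uniqueness_point(word, lexicon=LEXICON):
    """Number of segments after which `word` is the only candidate left.

    Returns None if the word never becomes unique, e.g. "spin", which is
    also the start of "spinach", "spindle", and so on.
    """
    for i in range(1, len(word) + 1):
        if cohort(word[:i], lexicon) == [word]:
            return i
    return None


for heard in ("s", "sp", "spi", "spin", "spina"):
    print(heard, cohort(heard))
print(uniqueness_point("spinach"))  # 5, i.e. at "spina"
```

Short words embedded at the start of longer ones, such as "spin", never reach a uniqueness point before their own offset, so the model needs the following input to decide.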

Word Recognition

  • Word recognition is fast
    • Shadowing and word-monitoring tasks: latencies of 250-275 ms
    • Intuitively immediate - words are recognized before end of word is reached
  • Words are often recognized at the uniqueness point, or even before it: evidence from gating (Grosjean, 1980)
    • Listeners hear fragments of a word of gradually increasing duration: t, tr, tre, tres, tresp, trespa, ...
    • The point at which the listener correctly guesses the whole word is called the isolation point
  • Average recognition times
    • Out of context: 300-350ms
    • In context: 200ms
  • Top-down effects
    • Ganong, Phonemic restoration, McGurk etc.
  • Speed and robustness depend on context
    • Sentence context aids word recognition (sentence-to-word context effects)
  • System actively seeks matches to input - does not wait for complete match
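The gating procedure and the isolation point can be sketched as follows, taking the isolation point as the first gate from which the listener's guess is correct and stays correct (one common way to score it). Letters stand in for fixed-duration slices of the recording, and the responses are invented for illustration, not data from Grosjean (1980); gates and isolation_point are hypothetical names.

```python
def gates(word):
    """Successively longer onset fragments of `word`.

    Letters stand in for fixed-duration slices of the recording.
    """
    return [word[:i] for i in range(1, len(word) + 1)]


def isolation_point(target, guesses):
    """1-based gate from which every guess is the target (None if never).

    `guesses[i]` is the listener's response to gate i + 1.
    """
    for i in range(len(guesses)):
        if all(g == target for g in guesses[i:]):
            return i + 1
    return None


# Hypothetical responses from one listener, one per gate.
responses = ["tea", "tree", "trek", "tress", "trespass",
             "trespass", "trespass", "trespass"]
point = isolation_point("trespass", responses)
print(point, gates("trespass")[point - 1])  # 5 tresp
```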

Lexical Decision

  • Press a button when a presented stimulus is a real word:
    • Words vs non-words
    • e.g. spinach (word) vs splinger (non-word)
      • Fast response = easy access (e.g. 400 ms)
      • Slow response = hard access (e.g. 500 ms)

Factors Affecting Lexical Decision Times

  • Word Length
  • Word frequency
    • High frequency words = common words (“cat, mother, house”)
    • Low frequency words = uncommon words (“accordion, compass”)
  • Uniqueness point
    • Early uniqueness point = strawberry (there are no other English words beginning with "strawb")
    • late uniqueness point = blackberry (not unique at /b/ of berry; blackbird, blackbeetle,…)
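  • Neighbourhood (neighbours: words that differ by a single phoneme substitution, addition or deletion)
    • yacht vs peach: both are high-frequency words, but yacht is recognized FAST and peach SLOW
      • peach has lots of high-frequency neighbours (e.g. reach, peace, beach, pea)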
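Phonological neighbours are commonly defined as words differing by exactly one phoneme substitution, addition or deletion. The sketch below checks that relation over phoneme tuples; the mini-lexicon and its simplified transcriptions are hypothetical, and no frequency counts are included.

```python
# Hypothetical mini-lexicon: each word as a tuple of phonemes.
LEXICON = {
    "peach": ("p", "iː", "tʃ"),
    "reach": ("r", "iː", "tʃ"),
    "peace": ("p", "iː", "s"),
    "beach": ("b", "iː", "tʃ"),
    "pea": ("p", "iː"),
    "speech": ("s", "p", "iː", "tʃ"),
    "pitch": ("p", "ɪ", "tʃ"),
    "chip": ("tʃ", "ɪ", "p"),
}


def is_neighbour(a, b):
    """True if phoneme sequences a and b differ by exactly one substitution,
    addition or deletion."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = sorted((a, b), key=len)
    # Deleting one phoneme from the longer sequence must give the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighbours(word, lexicon=LEXICON):
    target = lexicon[word]
    return [w for w, p in lexicon.items() if w != word and is_neighbour(target, p)]


print(neighbours("peach"))  # ['reach', 'peace', 'beach', 'pea', 'speech', 'pitch']
```

"pea" only qualifies because deletion counts, and "chip" (the same phonemes reversed) does not, since every position differs.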

Problems with Cohort

  • Not robust to distortion of initial phonemes
    • e.g. “shigarette”
    • The Ganong effect occurs for initial as well as non-initial phonemes
  • Recognition latencies depend on word frequency, not just on the uniqueness point (the original Cohort model has no frequency mechanism)
    • Marslen-Wilson: auditory lexical decision task with word pairs matched for uniqueness point (marked |)
    • e.g. DIFFIC|ULT (high frequency): 250 ms; DIFFID|ENT (low frequency): 379 ms
  • Requires segmentation (i.e., location of word onset) before word identification can begin
  • Not robust to segmentation errors
    • e.g. "The sky is falling" vs "This guy is falling"
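The first problem, sensitivity to initial distortion, can be sketched by contrasting all-or-none onset matching with a graded similarity score. This is an illustration under stated assumptions: the mini-lexicon and transcriptions are hypothetical, and the positional overlap score is a made-up measure, not the matching rule of any published model.

```python
# Hypothetical mini-lexicon in a simplified phonemic transcription.
LEXICON = {
    "cigarette": ("s", "ɪ", "ɡ", "ə", "r", "e", "t"),
    "ship": ("ʃ", "ɪ", "p"),
    "shin": ("ʃ", "ɪ", "n"),
}


def strict_cohort(heard):
    """All-or-none onset matching: any mismatch removes a word for good."""
    return [w for w, p in LEXICON.items() if p[: len(heard)] == heard]


def overlap(p, heard):
    """Hypothetical graded score: share of aligned positions that match."""
    matches = sum(a == b for a, b in zip(p, heard))
    return matches / max(len(p), len(heard))


def best_graded_match(heard):
    """Word with the highest overlap score, even if its onset mismatches."""
    return max(LEXICON, key=lambda w: overlap(LEXICON[w], heard))


SHIGARETTE = ("ʃ", "ɪ", "ɡ", "ə", "r", "e", "t")
print(strict_cohort(SHIGARETTE[:1]))  # ['ship', 'shin']
print(strict_cohort(SHIGARETTE))      # []
print(best_graded_match(SHIGARETTE))  # cigarette
```

Strict matching excludes "cigarette" at the first phoneme and never recovers, whereas any graded scheme lets the best overall match win despite the distorted onset.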

TRACE (Interactive Activation) Model (McClelland & Elman, 1986)

  • TRACE has three sets of interconnected detectors
    • Feature detectors
    • Phoneme detectors
    • Word detectors
  • Within a set (or level) connections are inhibitory
    • e.g. evidence that a certain stretch of the input is the word “tip” is evidence that it is NOT any other word
  • Between sets (levels), connections are excitatory
    • E.g. evidence that a certain stretch of the input is the sound /t/ is evidence that it might be the beginning of the word “tip”
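The connection scheme above can be sketched as a toy interactive-activation network in the spirit of TRACE, not McClelland and Elman's implementation: it has only phoneme and word levels (no feature level), one unit per phoneme per position instead of units duplicated across time slices, one character per phoneme, and made-up parameter values. WORDS, run_trace_sketch and the parameters excite, inhibit, feedback and decay are all hypothetical.

```python
# Hypothetical toy vocabulary; one character per phoneme (an assumption).
WORDS = {"lick": "lik", "lip": "lip", "lad": "lad", "fat": "fat"}
PHONEMES = sorted(set("".join(WORDS.values())))


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def run_trace_sketch(evidence, cycles=30, excite=0.1, inhibit=0.2,
                     feedback=0.05, decay=0.1):
    """Toy interactive-activation network, loosely in the spirit of TRACE.

    evidence: one dict per input position, mapping phoneme -> bottom-up input.
    Returns (phoneme activations per position, word activations).
    """
    n = len(evidence)
    phon = [{p: 0.0 for p in PHONEMES} for _ in range(n)]
    word = {w: 0.0 for w in WORDS}
    for _ in range(cycles):
        new_phon = []
        for pos in range(n):
            total = sum(phon[pos].values())
            layer = {}
            for p in PHONEMES:
                # Between levels (excitatory, top-down): words with p here.
                top_down = sum(word[w] for w, s in WORDS.items()
                               if pos < len(s) and s[pos] == p)
                # Within level (inhibitory): rival phonemes at this position.
                rivals = total - phon[pos][p]
                net = (evidence[pos].get(p, 0.0) + feedback * top_down
                       - inhibit * rivals - decay * phon[pos][p])
                layer[p] = clamp(phon[pos][p] + net)
            new_phon.append(layer)
        total_w = sum(word.values())
        new_word = {}
        for w, s in WORDS.items():
            # Between levels (excitatory, bottom-up): matching phonemes.
            support = sum(phon[pos][s[pos]] for pos in range(min(n, len(s))))
            # Within level (inhibitory): all other words.
            rivals = total_w - word[w]
            new_word[w] = clamp(word[w] + excite * support
                                - inhibit * rivals - decay * word[w])
        phon, word = new_phon, new_word
    return phon, word


_, words = run_trace_sketch([{"l": 1.0}, {"i": 1.0}, {"k": 1.0}])
print(max(words, key=words.get))  # lick: competitors like "lip" are suppressed

phon, _ = run_trace_sketch([{"l": 1.0}, {"a": 1.0}, {"d": 0.1, "t": 0.1}])
print(phon[2]["d"] > phon[2]["t"])  # True: "lad" is a word, "lat" is not
```

The second run shows how feedback from the word level could produce a Ganong-like bias: equal bottom-up evidence for /d/ and /t/ is resolved towards the phoneme that completes a word.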

TRACE Example

  • [Figure: the speech signal feeds feature detectors (+/-), which excite phoneme units (/l/, /d/, /k/, /a/, /p/, /i/), which in turn excite word units (lick, lad, lip, fat)]

TRACE - Lexical Activation

  • Stimuli: LICK, LIP
  • Stages: activation, competition, selection/recognition
    (e.g. Luce et al., 1990; Norris, 1994)

Evidence Supporting TRACE

  • TRACE is broadly compatible with lexical effects on phoneme identification, explaining them in terms of feedback from the lexical level to the phonemic level
    • Ganong effect
    • Phonemic Restoration Effect
  • TRACE recognizes words even if the initial phoneme is distorted or ambiguous
  • Can find word boundaries
  • Problems:
    • Requires massive duplication of units and connections: because each time slice of the input needs its own copy of the network, the connection patterns that determine which features activate which phonemes, and which phonemes activate which words, are copied over and over again

Lecture 2 Summary

  • Speech variability and the need for abstract units: phonemes
  • Top-down effects in speech perception
    • Ganong effect
    • Phoneme restoration
    • McGurk effect
  • Segmentation
  • Lexical access
    • Cohort
    • TRACE

Reading

  • Harley, T. A. The Psychology of Language: From Data to Theory
  • Chapter 9: Understanding Speech