PSYC513/703: Language and Communication - Spoken Word Recognition
- Email: Jeremy.Goslin@plymouth.ac.uk
- Office: PSQ B221
- Office Appointment: Tuesdays 3-4
Speech Variability
- Variability in acoustic waveforms occurs due to:
- Speaker rate
- Intonation
- Noise and distortion
- Accent
Speech Units
- Phones:
- A basic unit of sound in speech.
- A speech segment with distinct physical or perceptual properties.
- Acoustic or articulatory distinction.
- Acoustic phonetics (waveforms).
- Articulatory phonetics (articulators).
- Approximately 4000 available, over 800 in some languages.
Phonemes
- Phones are a representation of speech production, not perception.
- Allophones: Different phones that are perceptually equivalent in a language.
- Example: /p/ in "pin" and "spin", /t/ in "night rate" and "nitrate".
- Phonemes: Set of phones that are cognitively equivalent.
- Phoneme = phone for me.
- Basic unit that distinguishes words.
- A change in phoneme will produce a different word or a nonsense word.
- Minimal pairs: /p/ and /b/ distinguish “pin” and “bin”.
- Require a perceptual distinction
- Language specific
- British English has 44 phonemes: 20 vowels, 24 consonants
- Abstract unit
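The minimal-pair test above can be sketched in a few lines of Python. This is only an illustration: the function name `is_minimal_pair` and the list-of-symbols transcription format are assumptions made for the example, not part of the lecture material.

```python
def is_minimal_pair(a, b):
    """Return True if two phoneme sequences differ in exactly one segment."""
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Hypothetical transcriptions, one symbol per phoneme.
pin = ["p", "ɪ", "n"]
bin_ = ["b", "ɪ", "n"]
cat = ["k", "æ", "t"]

print(is_minimal_pair(pin, bin_))  # True: only /p/ vs /b/ differs
print(is_minimal_pair(pin, cat))   # False: every segment differs
```

The check works on phonemes, not letters, so it only makes sense once each word has been transcribed into the phoneme inventory of the language in question.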
Phoneme Examples
- cat: k.æ.t
- duck: d.ʌ.k
- tiger: t.aɪ.g.ə
- box: b.ɒ.k.s
- cheese: ʧ.iː.z
- book: b.ʊ.k
- bubble: b.ʌ.b.ə.l
Top-Down Effects on Phoneme Perception
- Phoneme perception is not simply a passive bottom-up process.
- A phone can be perceived as one or another phoneme depending upon context.
- Ganong effect (Ganong, 1980):
- Categorical boundaries.
- Voice onset time.
- Modulated by context.
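- Example: on a DA-TA continuum, an ambiguous sound is more often heard as /d/ before "-ash" (DASH is a word, TASH is not) and as /t/ before "-ask" (TASK is a word, DASK is not)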
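One way to picture a context-modulated categorical boundary is as a logistic identification curve over voice onset time (VOT) whose boundary shifts with lexical context. The sketch below is a toy illustration only: the function `p_voiceless`, the 30 ms boundary, the slope and the 5 ms lexical shift are all assumed values chosen for the example, not estimates from Ganong (1980).

```python
import math


def p_voiceless(vot_ms, boundary_ms=30.0, slope_ms=3.0, lexical_shift_ms=0.0):
    """Probability of reporting /t/ rather than /d/ at a given VOT.

    A positive lexical_shift_ms moves the boundary to longer VOTs
    (more /d/ responses); a negative one moves it to shorter VOTs
    (more /t/ responses). All parameter values are illustrative.
    """
    effective_boundary = boundary_ms + lexical_shift_ms
    return 1.0 / (1.0 + math.exp(-(vot_ms - effective_boundary) / slope_ms))


ambiguous_vot = 30.0  # a token sitting on the neutral boundary

neutral = p_voiceless(ambiguous_vot)                              # DA vs TA
dash_context = p_voiceless(ambiguous_vot, lexical_shift_ms=5.0)   # "-ash": DASH is a word
task_context = p_voiceless(ambiguous_vot, lexical_shift_ms=-5.0)  # "-ask": TASK is a word

print(neutral, dash_context, task_context)
```

With these assumed values the same ambiguous token is reported as /t/ half the time in a neutral context, less often in the DASH context and more often in the TASK context, which is the qualitative pattern of the Ganong effect.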
Phoneme Restoration Effect
- Top-down effects (Warren, 1970):
- Ambiguity?
- "The _eel had a broken axle"
- "The _eel on the orange was hard to cut"
- Effects are stronger:
- In words than non-words
- Later in words
- In strongly biasing contexts
McGurk Effect
- Top-down effects (McGurk & MacDonald, 1976):
- The effects of visual speech perception on the audio stream.
- Multimodal speech perception.
- Interference caused by seeing one phoneme and hearing another.
Active Speech Perception
- Top-down effects are evidence that speech perception is an active rather than a passive process.
- Combination of top-down and bottom-up information.
- Allows us to overcome imperfect speech input.
Speech Segmentation
- Continuous speech signal:
- Dynamic waveform caused by continuous movement of articulators.
- Boundaries not evident in the speech signal.
- Word boundaries.
Shillcock and Tabossi (1990)
- Evidence of continuous attempts at word segmentation of the speech stream.
- Cross-modal priming
- Hear sentences
- Respond to written words
- Lexical decision task
- Primes:
- The scientist made a new discovery last year
- The scientist made a novel discovery last year
- Target
- Nudist primed by: The scientist made a new discovery last year
- Priming effect caused by temporary segmentation error
- No report of the perception of the word ‘Nudist’ in the prime
- Evidence of continuous segmentation attempts
Lexical Access
- Simple Theory
- Match string of letters/phonemes/syllables to a word in the lexicon
- Search
- Organisation of dictionary
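- [Figure: organisation of the lexicon. The entry "house" linked to acoustic, phonemic (/h/ /au/ /s/) and semantic information, alongside other entries: happy, jaw, sad, table, how]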
Cohort Model (Marslen-Wilson & Welsh, 1978)
- ACCESS STAGE (perceptual representation used to activate lexical items, thus generating a candidate set of items – the cohort)
- SELECTION STAGE (the most likely candidate is chosen from cohort)
- INTEGRATION STAGE (in which the semantic and syntactic properties of the chosen words are utilized)
Examples of Cohort
- S: song, story, sparrow, saunter, slow, secret, sentry, etc.
- SP: spice, spoke, spare, spin, splendid, spelling, spread, etc.
- SPI: spit, spigot, spill, spiffy, spinnaker, spirit, spin, etc.
- SPIN: spin, spinach, spinster, spinnaker, spindle
- SPINA: spinach (word uniqueness point)
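The narrowing of the cohort can be sketched as repeated prefix filtering over a lexicon. This is a minimal illustration, assuming a tiny hand-picked lexicon and using letters as a stand-in for phonemes; the names `LEXICON`, `cohort` and `uniqueness_point` are hypothetical.

```python
# Toy lexicon; letters stand in for phonemes (an assumption of this sketch).
LEXICON = ["song", "story", "spice", "spoke", "spit", "spill", "spin",
           "spinach", "spinster", "spinnaker", "spindle"]


def cohort(prefix, lexicon=LEXICON):
    """Return all lexicon entries that are still consistent with the input so far."""
    return [word for word in lexicon if word.startswith(prefix)]


def uniqueness_point(word, lexicon=LEXICON):
    """Return the shortest prefix of `word` that matches no other entry, or None."""
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if cohort(prefix, lexicon) == [word]:
            return prefix
    return None


# Show the cohort shrinking as more of "spinach" is heard.
for i in range(1, 6):
    prefix = "spinach"[:i]
    print(prefix, cohort(prefix))

print(uniqueness_point("spinach"))  # spina
```

In this toy lexicon "spin" has no uniqueness point, because it is a prefix of other entries; the function returns None in that case.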
Word Recognition
- Word recognition is fast
- Shadowing and word-monitoring tasks: latencies of 250-275 msec
- Intuitively immediate - words are recognized before end of word is reached
- Recognized at the uniqueness point, or even before
- Evidence from gating (Grosjean, 1980)
- Listeners are presented with fragments of a word of gradually increasing duration: t - tr - tre - tres - tresp - trespa
- The point at which the person guesses the whole word is called the isolation point
- Average recognition times
- Out of context: 300-350ms
- In context: 200ms
- Top-down effects
- Ganong, Phonemic restoration, McGurk etc.
- Speed and robustness depend on words in context
- sentence --> word context effects
- System actively seeks matches to input - does not wait for complete match
Lexical Decision
- Press a button when a presented stimulus is a real word:
- Words vs non-words
- e.g. "spinach" (word) vs "splinger" (non-word)
- Fast response = easy access (e.g. 400 ms)
- Slow response = hard access (e.g. 500 ms)
Factors Affecting Lexical Decision Times
- Word Length
- Word frequency
- High frequency words = common words (“cat, mother, house”)
- Low frequency words = uncommon words (“accordion, compass”)
- Uniqueness point
- early uniqueness point = strawberry (there are no other English words beginning with "strawb")
- late uniqueness point = blackberry (not unique at /b/ of berry; blackbird, blackbeetle,…)
- Neighbourhood
- yacht vs peach: both are high-frequency words, but responses to yacht are FAST and responses to peach are SLOW
- peach has lots of high-frequency neighbours (e.g. reach, peace, beach, pea)
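A neighbourhood can be computed by treating two words as neighbours when their phoneme sequences differ by a single substitution, addition or deletion. The sketch below assumes that definition, a very small toy lexicon and hypothetical transcriptions; the names `is_neighbour`, `LEXICON` and `neighbours` are invented for the example.

```python
def is_neighbour(a, b):
    """True if sequences a and b differ by one substitution, addition or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Removing exactly one segment from the longer sequence must give the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


# Toy lexicon with assumed transcriptions (one tuple element per phoneme).
LEXICON = {
    "peach": ("p", "iː", "tʃ"),
    "reach": ("r", "iː", "tʃ"),
    "beach": ("b", "iː", "tʃ"),
    "peace": ("p", "iː", "s"),
    "pea": ("p", "iː"),
    "yacht": ("j", "ɒ", "t"),
    "cat": ("k", "æ", "t"),
}


def neighbours(word):
    """Return the lexicon entries that are neighbours of `word`, sorted alphabetically."""
    target = LEXICON[word]
    return sorted(w for w, phonemes in LEXICON.items() if is_neighbour(target, phonemes))


print(neighbours("peach"))  # ['beach', 'pea', 'peace', 'reach']
print(neighbours("yacht"))  # []
```

The empty result for yacht reflects only this seven-word lexicon. A frequency-weighted version would sum the frequencies of the neighbours instead of counting them, which needs frequency norms not given here.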
Problems with Cohort
- Not robust to distortion of initial phonemes
- e.g. “shigarette”
- Ganong effect for initial, as well as non-initial phonemes
- Lexical decision latencies are proportional to frequency-weighted neighbourhood size, not merely to cohort size.
- Marslen-Wilson: auditory lexical decision task with word pairs with matched uniqueness points
- e.g. DIFFIC | ULT: high frequency (250ms)
- e.g. DIFFID | ENT: low frequency (379ms)
- Requires segmentation (i.e., location of word onset) before word identification can begin
- Not robust to segmentation errors
- e.g. "The sky is falling" vs "This guy is falling"
TRACE (Interactive Activation) Model (McClelland & Elman, 1986)
- TRACE has three sets of interconnected detectors
- Feature detectors
- Phoneme detectors
- Word detectors
- Within a set (or level) connections are inhibitory
- e.g. evidence that a certain stretch of the input is the word “tip” is evidence that it is NOT any other word
- Between sets (or levels), connections are excitatory
- e.g. evidence that a certain stretch of the input is the sound /t/ is evidence that it might be the beginning of the word “tip”
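The combination of between-level excitation and within-level inhibition can be illustrated with a very small simulation. This is a toy sketch and not the McClelland and Elman implementation: it has only a word level driven directly by an already-identified phoneme sequence, it has no feedback from words to phonemes, and the names `WORDS` and `run_word_level` and all parameter values are assumptions.

```python
# Toy word level; transcriptions are assumed for the example.
WORDS = {
    "lick": ["l", "ɪ", "k"],
    "lip": ["l", "ɪ", "p"],
    "lad": ["l", "æ", "d"],
    "fat": ["f", "æ", "t"],
}


def run_word_level(input_phonemes, words=WORDS, excite=0.2, inhibit=0.1,
                   cycles_per_phoneme=3):
    """Return final word activations after presenting the phonemes in order."""
    act = {w: 0.0 for w in words}
    for pos, phoneme in enumerate(input_phonemes):
        for _ in range(cycles_per_phoneme):
            total = sum(act.values())
            new_act = {}
            for w, phonemes in words.items():
                # Between levels: a matching phoneme excites the word.
                matches = pos < len(phonemes) and phonemes[pos] == phoneme
                bottom_up = excite if matches else 0.0
                # Within the word level: every other active word inhibits this one.
                lateral = inhibit * (total - act[w])
                # Keep activation in the range 0 to 1.
                new_act[w] = min(1.0, max(0.0, act[w] + bottom_up - lateral))
            act = new_act
    return act


activations = run_word_level(["l", "ɪ", "k"])
for word, value in sorted(activations.items(), key=lambda item: -item[1]):
    print(word, round(value, 3))
```

With the input /l ɪ k/, "lick" ends with the highest activation, "lip" stays partly active because it shares the first two phonemes, and "lad" and "fat" are suppressed by the competition.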
TRACE Example
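- [Figure: TRACE network for a speech signal, with feature, phoneme (/l/, /d/, /k/, /a/, /p/, /i/) and word (lick, lad, lip, fat) levels linked by excitatory (+) and inhibitory (-) connections]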
TRACE - Lexical Activation
- Stimulus: LICK LIP
- Activation, competition, selection/recognition (e.g. Luce et al. 1990, Norris 1994)
Evidence Supporting TRACE
- TRACE is broadly compatible with lexical effects on phoneme identification, explaining them in terms of feedback from the lexical level to the phonemic level
- Ganong effect
- Phonemic Restoration Effect
- TRACE recognizes words even if the initial phoneme is distorted or ambiguous
- Can find word boundaries
- Problems…
- Requires massive duplication of units and connections: the connection patterns that determine which features activate which phonemes, and which phonemes activate which words, must be copied over and over again
Lecture 2 Summary
- Speech variability and the need for abstract units: phonemes
- Top-down effects in speech perception
- Ganong effect
- Phoneme restoration
- McGurk Effect
- Segmentation
- Lexical access
- Cohort
- TRACE
Reading
- Harley, T. A. The Psychology of Language: From Data to Theory
- Chapter 9: Understanding Speech