Speech Perception

Challenges in Speech Perception

 

  • Continuous Nature of Speech: Unlike written text, speech has no clear word boundaries.

  • Variability in Speech Sounds:

    • Co-articulation: Sounds are influenced by surrounding phonemes (e.g., "the" sounds different depending on context).

    • Speaker differences: Variations due to accent, gender, and speaking rate.

  • Time Constraints:

    • Humans process up to 200 words per minute.

    • The fleeting nature of sound creates a "now-or-never bottleneck" for processing (Christiansen & Chater, 2016).

 

What Are Speech Sounds?

 

  • Phonemes:

    • Smallest units of speech that carry meaning (e.g., /p/ in "pin" vs. /b/ in "bin").

    • Not the same as letters (e.g., the phoneme /k/ applies to both "cat" and "kite").

  • Phonemes are central to language comprehension and production.

 

Source-Filter Theory of Speech Production

 

  • Source:

    • Sound originates in the larynx as vibrations of the vocal cords.

    • Determines pitch and intonation.

  • Filter:

    • Shapes the sound through the supralaryngeal vocal tract (e.g., lips, tongue, teeth, oral/nasal cavities).

    • Produces distinct speech sounds.
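The source-filter idea can be sketched numerically: a pulse train stands in for vocal-cord vibration (the source) and a cascade of resonators stands in for vocal-tract formants (the filter). This is an illustrative sketch only; the sample rate, pitch, and formant frequencies/bandwidths below are assumed values, not from the notes.

```python
import math
import numpy as np

def resonator(x, freq, bw, fs):
    """Second-order IIR resonance at `freq` Hz with bandwidth `bw` Hz:
    a crude stand-in for one vocal-tract (formant) resonance."""
    r = math.exp(-math.pi * bw / fs)             # pole radius from bandwidth
    a1 = -2 * r * math.cos(2 * math.pi * freq / fs)
    a2 = r * r
    y = np.zeros_like(x)
    for i in range(len(x)):
        y[i] = x[i]
        if i >= 1:
            y[i] -= a1 * y[i - 1]
        if i >= 2:
            y[i] -= a2 * y[i - 2]
    return y

fs, f0 = 16000, 120                              # sample rate, pitch (assumed)
source = np.zeros(fs // 2)                       # 0.5 s of signal
source[::fs // f0] = 1.0                         # pulse train: the "source"

signal = source
for freq, bw in [(700, 130), (1220, 70), (2600, 160)]:  # /a/-like formants
    signal = resonator(signal, freq, bw, fs)     # the "filter"
signal /= np.abs(signal).max()                   # normalize amplitude
```

Changing only the formant list changes the vowel while the pitch (set by the source) stays the same, which is exactly the source/filter separation the theory describes.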

 

Analysing Speech Sounds

 

  • Spectrograms:

    • Visual representation of sound amplitude over time and frequency.

    • Formants: Bands of energy at specific frequencies shaped by the vocal tract.

    • The lowest three formants (F1, F2, F3) are most important for intelligibility.
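A spectrogram is computed as a short-time Fourier transform: the waveform is cut into overlapping windowed frames and each frame's frequency content is measured. A minimal numpy sketch (the window and hop sizes are assumed, illustrative values):

```python
import numpy as np

def spectrogram(x, win=256, hop=128):
    """Magnitude spectrogram: Hann-windowed overlapping frames -> |FFT|.
    Rows are frequency bins (0 .. fs/2), columns are time frames."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window
              for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Sanity check: a 1 kHz tone should put its energy near the 1 kHz bin.
fs = 8000
t = np.arange(fs) / fs                            # one second of samples
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
peak_hz = spec.mean(axis=1).argmax() * fs / 256   # bin spacing = fs / win
```

In a real speech spectrogram the formants show up as the horizontal bands of high magnitude in exactly this time-frequency grid.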

 

Vowels and Consonants

 

  • Vowels:

    • F1 increases from high to low vowels (e.g., "heed" to "hod").

    • F2 decreases from front to back vowels (e.g., "heed" to "had").

  • Consonants:

    • F2 and F3 are critical for identifying different consonants.
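The F1/F2 trends above can be checked against classic vowel measurements. The numbers below are approximate averages in the spirit of Peterson & Barney's (1952) data; they are illustrative and not taken from the notes.

```python
# Approximate average formant frequencies (Hz); illustrative values,
# roughly following Peterson & Barney (1952).
formants = {
    "heed": (270, 2290),   # /i/: high front vowel
    "had":  (660, 1720),   # low front vowel
    "hod":  (730, 1090),   # low back vowel
}

# F1 increases from high to low vowels ("heed" -> "hod")...
assert formants["heed"][0] < formants["hod"][0]
# ...and F2 decreases toward back vowels ("heed" -> "hod").
assert formants["heed"][1] > formants["hod"][1]
```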

 

Speech Perception

 

Categorical Perception

  • Gradual acoustic changes are perceived in discrete categories (e.g., /ba/ vs. /da/).

  • Key Features:

    1. Abrupt identification changes at phoneme boundaries.

    2. Discrimination peaks at phoneme boundaries.

    3. Discrimination depends on phoneme categorization.
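The three features can be illustrated with a toy identification function: a steep logistic over a /ba/–/da/ continuum, with discrimination of adjacent pairs predicted from the identification curve. The boundary location and slope below are assumed values, not experimental data.

```python
import numpy as np

# Stimulus continuum, e.g. /ba/ morphing into /da/ (9 steps, illustrative).
steps = np.arange(1, 10)
boundary, slope = 5.0, 3.0           # assumed boundary location and sharpness

# Feature 1: identification changes abruptly -- a steep logistic.
p_da = 1 / (1 + np.exp(-slope * (steps - boundary)))

# Features 2-3: a pair is discriminable only to the extent its members
# fall in different phoneme categories, so discrimination is predicted
# from the identification difference and peaks at the boundary.
discrim = np.abs(np.diff(p_da))
peak_pair = int(discrim.argmax())    # index of the most discriminable pair
```

Pairs well inside a category (e.g. steps 1 vs 2) get nearly identical labels and are poorly discriminated, even though their acoustic difference is the same size as a cross-boundary pair's.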

 

Examples of Categorical Perception

 

  • The "Yanny vs. Laurel" debate highlights how individuals perceive speech differently based on categorical boundaries.

 

Contextual Influences on Speech Perception

 

  1. Visual Context:

    • McGurk Effect (McGurk & MacDonald, 1976): Visual input alters auditory perception (e.g., seeing lips form /ga/ while hearing /ba/ results in perceiving /da/).

  2. Lexical Context:

    • Ganong Effect (Ganong, 1980): Phoneme perception is biased by surrounding lexical information.

      • Example: Ambiguous sounds on a "gift" to "kift" continuum are interpreted based on familiar words.

 

Key Points

 

  1. Speech Production:

    • The source-filter theory explains how vocal cords and the vocal tract shape speech sounds.

  2. Categorical Perception:

    • Speech is perceived categorically despite graded acoustic inputs.

  3. Contextual Influences:

    • Speech perception is shaped by visual and lexical cues, demonstrating the brain's reliance on multi-sensory and contextual information.

Speech Production

 

  • Source-Filter Theory

    • Source: Vocal cords produce vibrations (important for pitch/intonation).

    • Filter: Supralaryngeal vocal tract shapes sounds into distinguishable phonemes (involves lips, tongue, teeth, oral/nasal cavities).

    • Evidence from MRI scans highlights these vocal movements.

  • Spectrogram Analysis:

    • Visualizes speech amplitude over time and frequency.

    • Formants: Energy bands shaped by vocal tract; critical for intelligibility (e.g., F1, F2, F3).

  • Vowels:

    • High-to-low vowels (e.g., “heed” → “hod”): F1 increases.

    • Front-to-back vowels (e.g., “heed” → “had”): F2 decreases.

  • Consonants:

    • Recognition involves F2 and F3 (e.g., “pem,” “ten,” “keng”).

 

Speech Perception

 

  • Phonemes:

    • Smallest meaningful units of sound; distinguished by changes in sound pressure from vocal movements.

    • Examples: /p/ in "pin" vs. /b/ in "bin".

  • Categorical Perception:

    • Sensory changes perceived discretely rather than continuously.

    • Key features:

      1. Abrupt identification change at phoneme boundaries.

      2. Discrimination peaks at boundaries.

      3. Discrimination aligns with identification (e.g., only "different" if phonemes differ).

  • Contextual Effects:

    • Auditory context: Stimuli like "Yanny vs. Laurel."

    • Visual context: The McGurk effect – mismatched audio-visual cues altering perception (e.g., hearing “ba” while seeing “ga”).

    • Lexical context: The Ganong effect – lexical knowledge biases phoneme interpretation (e.g., "gift" vs. "kift").

 

Key Takeaways

 

  1. Speech production involves a distinct source and filter system, each contributing unique perceptual elements.

  2. Speech perception is inherently categorical despite continuous acoustic signals.

  3. Context (visual, lexical, auditory) plays a significant role in interpreting speech sounds.

 

 

 

Motor Theory of Speech Perception

 

  • Proposed by: Alvin Liberman.

  • Core Idea:

    • A specialized speech module processes speech sounds separately from non-speech sounds.

    • Speech perception relies on identifying intended vocal gestures rather than acoustic signals.

    • Example: The phoneme /p/ in "pin" and "spin" differs acoustically but involves the same lip gesture.

  • Evidence Supporting the Theory:

    • fMRI studies: Listening to meaningless syllables activates motor and premotor areas (Wilson et al., 2004).

    • Transcranial Magnetic Stimulation (TMS): Disruption of premotor areas impairs phoneme discrimination in noise (Meister et al., 2007).

  • Criticism:

    • Non-speech sounds (e.g., musical intervals) also show categorical perception (Burns & Ward, 1978).

    • Animals like chinchillas exhibit similar phoneme boundaries, suggesting speech perception isn’t uniquely human (Kuhl & Miller, 1978).

 

Neural Basis of Speech Perception

 

  • Classic Model:

    • 19th-century neurologists (Broca, Wernicke, Lichtheim) linked specific brain areas to speech functions:

      • Wernicke’s area: Speech perception.

      • Broca’s area: Speech production.

    • Left-hemisphere dominance.

  • Modern Dual Streams Model (Hickok & Poeppel, 2007):

    • Ventral Stream:

      • Processes word recognition ("What does this word mean?").

      • Bilateral involvement (both hemispheres).

      • Damage in anterior/inferior temporal regions impairs semantic understanding.

    • Dorsal Stream:

      • Links speech perception with production.

      • Key for phoneme discrimination ("Is it /ba/ or /da/?").

      • Critical in learning new languages and continued speech processing in adulthood.

      • Primarily left-hemisphere dominant.

 

Word Recognition

 

Cohort Model (Marslen-Wilson & Tyler, 1981)

  • Key Idea:

    • Words are recognized progressively as they unfold over time.

    • Initial speech sounds activate a cohort of potential words.

    • Recognition occurs at the uniqueness point (UP) when only one word matches the input.

  • Features:

    • Words are activated immediately upon minimal input.

    • Competing words create "lexical competition."

  • Evidence:

    • Shadowing task: Average response latency (~250 ms) suggests words are recognized before the full word has been heard.

  • Limitations:

    • Verbal nature of the model makes quantitative evaluation challenging.

    • Difficult to simulate computationally.
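The uniqueness point is easy to make concrete: scan the word's initial segments and find the first point at which only one lexicon entry still matches. The toy lexicon below uses letters to stand in for phonemes and is purely illustrative.

```python
def uniqueness_point(word, lexicon):
    """Return the number of initial segments after which `word` is the
    only remaining cohort member, or None if it never becomes unique."""
    for i in range(1, len(word) + 1):
        cohort = [w for w in lexicon if w.startswith(word[:i])]
        if cohort == [word]:
            return i
    return None

# Toy lexicon (letters stand in for phonemes; illustrative only).
lexicon = ["elephant", "elegant", "elevator", "eel"]
up = uniqueness_point("elephant", lexicon)   # "elep" rules out the rest
```

Note that some words never reach a uniqueness point before their offset (e.g. "car" when "card" is also in the lexicon), which is one reason recognition must also draw on following context.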

 

TRACE Model (McClelland & Elman, 1986)

 

  • Core Concept:

    • Speech perception involves a connectionist framework with interactive levels:

      • Phonemes, syllables, and words.

    • Feedback loops allow higher-level word recognition to influence lower-level phoneme perception.

  • Strengths:

    • Explains context effects like the Ganong effect (lexical knowledge biases phoneme perception).

    • Supported by eye-tracking studies showing real-time processing adjustments.

  • Criticism:

    • Top-down feedback explanations have been contested (e.g., Norris et al., 2000).
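The contested top-down feedback can be made concrete with a toy interactive-activation loop. This is not the full TRACE implementation; the connection weights and feedback gain are assumed. Lexical support for "gift" (a real word, unlike "kift") feeds back to the ambiguous initial phoneme, producing a Ganong-style bias toward /g/.

```python
import numpy as np

# Bottom-up input: the first sound is fully ambiguous between /g/ and /k/.
phoneme = np.array([0.5, 0.5])           # activations for [/g/, /k/]

# Lexicon: "gift" is a word, "kift" is not, so only a "gift" unit exists.
w_gift = np.array([1.0, 0.0])            # /g/ connects to "gift"

feedback = 0.1                           # assumed top-down gain
for _ in range(20):
    gift = float(w_gift @ phoneme)                 # bottom-up: phonemes -> word
    phoneme = phoneme + feedback * gift * w_gift   # top-down: word -> phonemes
    phoneme = phoneme / phoneme.sum()              # keep activations normalized

# After iterating, lexical feedback has biased perception toward /g/.
```

Critics such as Norris et al. (2000) argue the same bias can arise without feedback, with lexical knowledge influencing only a later decision stage; the toy loop above shows only that feedback is one sufficient mechanism.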

 

Key Takeaways

 

  1. The motor theory's claim that motor representations underlie speech perception is partially supported but not fully validated.

  2. Dual streams in the brain specialize in word recognition (ventral stream) and linking perception with production (dorsal stream).

  3. Word recognition involves competition and activation processes, as explained by Cohort and TRACE models.

  4. Context effects in speech perception remain a hotly debated topic, particularly in TRACE's framework.

 
