
In-Depth Notes on Sound Spectrography and Speech Processing

  • Sound Spectrograph

    • A tool that represents sound waves graphically, showing how the energy at each frequency changes over time, much as an EKG traces heart activity.

    • Reveals common characteristics across different speakers.

    • These invariant characteristics allow effective communication despite variations in vocal timbre.
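
To make the idea of a spectrograph concrete, here is a minimal sketch of how a spectrogram can be computed: the signal is cut into short overlapping frames, and the energy at each frequency is measured within each frame. The sample rate, frame length, and two-tone test signal (standing in for two formants) are all assumptions chosen for illustration, not values from these notes.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    # A Hann window reduces leakage between neighbouring frequency bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len=256, hop=128):
    """One magnitude spectrum per overlapping frame (the time axis)."""
    return [frame_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


# Hypothetical test signal: two steady tones standing in for two formants.
SAMPLE_RATE = 8000  # Hz, assumed for illustration
FRAME_LEN = 256
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
        for t in range(1024)]

spec = spectrogram(tone, frame_len=FRAME_LEN)
bin_hz = SAMPLE_RATE / FRAME_LEN  # frequency width of each bin
strongest = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(f"{len(spec)} frames; strongest band near {strongest * bin_hz:.0f} Hz")
```

Each row of `spec` is one time slice and each column one frequency band, so a steady vowel-like sound shows up as horizontal bands of high energy. Real speech tools use a fast Fourier transform rather than this naive loop; the slow version is used here only to keep the sketch dependency-free.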

  • Invariant Characteristics

    • These features appear across different speakers pronouncing the same word (e.g., "cat").

    • They ensure understanding despite differences in individual vocal sounds.

  • Formants and Formant Transitions

    • Formants are bands of concentrated energy in the spectrogram, closely associated with vowels.

    • Example: The phrase “Tiny Tim tiptoes with the tulips” illustrates vowel sounds characterized by formants.

    • Formant Counts:

      • The vowel sound 'a' has three formants.

      • The vowel sound 'u' has two formants.

    • Consonants are often distinguished by the transitions between formants rather than by the formants themselves.

  • Consonant Characteristics

    • Consonants can be characterized by their formant transitions, visible in the spectrogram as differences between sounds such as "sh" and the spikes that accompany them.

    • Formants are numbered from the lowest frequency upward, so the lowest-frequency formant is the first one identified for a consonant.

  • Speech Processing in the Brain

    • The brain processes speech signals differently from other sounds, a claim supported by observations of infant babbling across many languages.

    • The presence of invariant characteristics (formants and transitions) suggests specialized processing in the brain for language comprehension and production.

  • Categorical Perception

    • Introduces the concept of Voice Onset Time (VOT): the delay between the release of a consonant and the onset of vocal cord vibration.

    • Example Measurements:

      • For "ta," VOT is about 91 milliseconds.

      • For "da," VOT is around 17 milliseconds.

    • In each trial, subjects hear stimuli with varying VOTs in random order and report which sound they perceive.

    • A critical boundary exists at approximately 40 milliseconds: stimuli with shorter VOTs are heard as "da" and those with longer VOTs as "ta," with an abrupt rather than gradual shift between the two.
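
The boundary effect can be illustrated with a small sketch. It idealizes the listener as a hard threshold at the 40 ms boundary from the notes; real identification data would follow a steep but not perfectly sharp curve. The function names, the 10 ms stimulus continuum, and the choice to label a VOT of exactly 40 ms as "ta" are assumptions made for illustration.

```python
BOUNDARY_MS = 40  # approximate category boundary given in the notes


def perceived_syllable(vot_ms, boundary_ms=BOUNDARY_MS):
    """Idealized listener: label a stimulus by its side of the boundary."""
    # Assumption: a VOT exactly at the boundary is labelled "ta".
    return "ta" if vot_ms >= boundary_ms else "da"


def same_category(vot_a, vot_b):
    """True when two stimuli receive the same label."""
    return perceived_syllable(vot_a) == perceived_syllable(vot_b)


# Hypothetical continuum of stimuli in 10 ms steps.
for vot in range(0, 100, 10):
    print(f"VOT {vot:2d} ms -> heard as '{perceived_syllable(vot)}'")

# Two pairs with the same 20 ms physical difference:
print(same_category(10, 30))  # both fall below the boundary
print(same_category(30, 50))  # the pair straddles the boundary
```

The two pairs at the end differ by the same 20 ms, yet only the pair that crosses the boundary is heard as two different sounds. That asymmetry is what makes the perception categorical rather than continuous.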

  • Phoneme Processing

    • Phonemes are perceived categorically: each range of acoustic values is heard as a single phoneme, whereas other types of sounds do not follow this pattern.

    • This suggests unique cognitive processing characteristics associated with speech sounds.

  • Neurophysiology and Language

    • Studies show that specialized areas of the brain (e.g., Broca's and Wernicke's areas) are involved in language processing: Broca's area affects speech production, while Wernicke's area affects comprehension.

    • Although these lines of investigation differ (infant babbling, categorical perception, invariant characteristics), all support the conclusion that the brain has specialized mechanisms for processing speech.