chung final studying

Part I: Acoustic Theory of Vowel Production (Chapter 8)

The foundation of speech acoustics is the Source-Filter Theory, which posits that the vocal folds provide the raw sound and the vocal tract shapes it.

A. The Source (Glottal Signal)

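  • Glottal Flow (Vg): The volume velocity of air pulsing through the glottis; this is the source signal created by vocal fold vibration.

  • The Harmonic Spectrum: A frequency-domain representation where F0 is the first harmonic (H1) and the higher harmonics fall at integer multiples of F0.

  • Spectral Tilt: Harmonic amplitude drops by roughly 12 dB per octave as frequency rises.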

    • Strained Voice: Shallower tilt (more high-frequency energy).

    • Breathy Voice: Steeper tilt (less high-frequency energy).
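
The tilt figure above can be turned into a quick calculation. The following is a minimal sketch, assuming an idealized source whose harmonics sit at integer multiples of F0 and fall off at exactly -12 dB per octave. The function name `harmonic_levels` and the 100 Hz example F0 are illustrative assumptions, not values from the chapter.

```python
import math

def harmonic_levels(f0_hz, n_harmonics, tilt_db_per_octave=-12.0):
    """Return (frequency in Hz, level in dB relative to H1) per harmonic.

    H1 (the fundamental) is the 0 dB reference. Each octave above H1
    changes the level by tilt_db_per_octave (idealized assumption).
    """
    levels = []
    for n in range(1, n_harmonics + 1):
        octaves_above_h1 = math.log2(n)  # H2 is 1 octave up, H4 is 2 octaves up
        levels.append((n * f0_hz, tilt_db_per_octave * octaves_above_h1))
    return levels

# Example with an assumed F0 of 100 Hz: H2 is 12 dB below H1, H4 is 24 dB below.
levels = harmonic_levels(100.0, 4)
```

Passing a steeper (more negative) tilt would mimic a breathier source and a shallower one a strained source; the specific numbers to use are not given in these notes.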

B. The Filter (Vocal Tract Resonances)

  • The Tube Model: The vocal tract acts like a tube closed at the glottis and open at the lips.

  • Formants (F1,F2,F3): Resonance peaks where energy transfer is most efficient.

  • Perturbation Theory:

    • Pmax (Pressure Max): Constricting at a resonance's pressure maximum raises that resonance's frequency.

    • Vmax (Velocity Max): Constricting at a resonance's velocity maximum lowers that resonance's frequency.

    • Lip Rounding: Lowers all formants because the lips are a Vmax for all resonances.
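
For a uniform tube closed at one end and open at the other, the resonances follow the quarter-wavelength rule F_n = (2n - 1) * c / (4 * L). The sketch below assumes a tract length of 17.5 cm and a speed of sound of 35,000 cm/s; both are common textbook values rather than figures stated in these notes, and the function name is illustrative.

```python
def tube_resonances(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wavelength rule: F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]

# Assumed 17.5 cm neutral (schwa-like) vocal tract.
print(tube_resonances(17.5))  # [500.0, 1500.0, 2500.0]
```

A longer tube gives lower values for every resonance, which is consistent with the idea that some articulations lower all formants at once.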

Part II: The Acoustics of Consonants (Chapter 9)

Consonants differ from vowels by using constricted or completely blocked airflow, often creating aperiodic noise sources.

A. Stop Consonants (Plosives)

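  • Phase 1: Closure (Gap): A period of near-total silence that appears as a blank gap on a spectrogram; voiced stops may show a faint low-frequency voice bar.

  • Phase 2: Release Burst: A brief (10–30 ms) transient noise caused by the sudden release of air pressure built up behind the closure.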

  • Phase 3: Frication/Aspiration: Turbulent noise as the constriction widens.

B. Fricatives

  • Aperiodic Source: Turbulence created by forcing air through a narrow channel (e.g., /s, f/). Voiced fricatives such as /z/ add periodic voicing on top of the noise.

  • Sibilants vs. Non-sibilants: Sibilants (/s, ʃ/) have higher intensity and better-defined spectral peaks compared to non-sibilants (/f, θ/).

C. Nasals & Laterals (Antiformants)

  • Coupled Resonators: With the velum lowered, sound travels through the pharyngeal and nasal cavities together, while the blocked oral cavity acts as a side branch.

  • Antiformants (Zeros): Dips in the spectrum where energy is "trapped" and cancelled out in side branches.

  • Nasal Murmur: A dominant low-frequency resonance (~250–300 Hz).
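
One common simplification treats the side branch as a tube closed at its far end, so its lowest antiformant falls near c / (4 * L) for a branch of length L. The sketch below works under that assumption. The 8 cm branch length (a rough figure for the blocked mouth cavity in /m/) and the 35,000 cm/s speed of sound are illustrative assumptions, not values from these notes.

```python
def side_branch_zeros(branch_length_cm, n_zeros=2, speed_of_sound_cm_s=35000.0):
    """Approximate antiformant frequencies (Hz) of a closed side branch.

    Assumes the branch behaves like a uniform quarter-wavelength tube,
    so zeros fall at (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * branch_length_cm)
        for n in range(1, n_zeros + 1)
    ]

# Assumed 8 cm oral side branch: lowest zero lands a little above 1 kHz.
zeros = side_branch_zeros(8.0)
```

Under this model a shorter side branch pushes the antiformant higher, which is one way to think about why the dip moves with place of articulation.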

Part III: Dynamic Cues & Transitions (Chapter 10)

Speech is rarely steady-state. We perceive phonemes based on how the acoustic signal changes over time.

A. Formant Transitions

  • Definition: The movement of formants as the articulators travel from a consonant to a vowel.

  • F1 Transition: Rises when moving from a stop or fricative constriction into a vowel, because the mouth is opening.

  • F2 Transition: The primary cue for place of articulation.

    • Bilabial (/b/): Usually shows a rising F2 into the vowel.

    • Alveolar (/d/): Points toward a "locus" of ~1800 Hz.

    • Velar (/g/): Often shows a "Velar Pinch" where F2 and F3 come close together.
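
The locus idea can be sketched as a simple comparison: the F2 transition heads from the consonant's locus toward the vowel's F2 target. In the sketch below the 1800 Hz alveolar locus comes from the notes above, while the vowel F2 targets (2300 Hz and 900 Hz) and the function name are illustrative assumptions.

```python
def f2_transition_direction(locus_hz, vowel_f2_hz):
    """Direction of the F2 transition from consonant locus into the vowel."""
    if vowel_f2_hz > locus_hz:
        return "rising"
    if vowel_f2_hz < locus_hz:
        return "falling"
    return "flat"

ALVEOLAR_LOCUS_HZ = 1800.0  # from the notes above

# Assumed vowel targets: a front vowel with high F2, a back vowel with low F2.
print(f2_transition_direction(ALVEOLAR_LOCUS_HZ, 2300.0))  # rising
print(f2_transition_direction(ALVEOLAR_LOCUS_HZ, 900.0))   # falling
```

The same consonant can therefore show a rising or falling F2 depending on the following vowel; what stays constant is the point the transition appears to start from.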

B. Voice Onset Time (VOT)

  • Definition: The interval between the release of a stop and the onset of voicing.

  • Short VOT: Voiced stops (/b, d, g/) in English.

  • Long VOT: Voiceless stops (/p, t, k/) in English, where the delay is filled with aspiration.
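
VOT is a subtraction of two time stamps, which the sketch below makes explicit. The 30 ms category boundary is an assumed illustrative threshold, not a value from these notes, and the function names and example times are hypothetical.

```python
def voice_onset_time_ms(release_ms, voicing_onset_ms):
    """VOT = time of voicing onset minus time of stop release (ms).

    A negative result means voicing began before the release.
    """
    return voicing_onset_ms - release_ms

def classify_vot(vot_ms, boundary_ms=30.0):
    """Label a VOT as short or long using an assumed boundary."""
    return "long" if vot_ms >= boundary_ms else "short"

# Hypothetical measurements read off a waveform or spectrogram.
vot = voice_onset_time_ms(release_ms=100.0, voicing_onset_ms=160.0)
print(vot, classify_vot(vot))  # 60.0 long
```

Any real boundary depends on the language and the place of articulation, so the default here should be treated as a placeholder.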

Part IV: Advanced Acoustic Effects (Chapter 11)

This chapter accounts for the non-ideal physical properties of the human body that affect the sound.

A. Radiation at the Lips

  • The mouth acts as a small opening in a large sphere (the head).

  • High-Pass Effect: Radiation at the lips adds a boost of about 6 dB per octave, favoring high frequencies.

  • Net Result: When the -12 dB per octave glottal tilt is combined with the +6 dB per octave radiation boost, the speech leaving the lips has a net tilt of -6 dB per octave.
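
Because both effects are expressed in dB per octave, they simply add. The sketch below shows that arithmetic for idealized slopes; the function name and the 100 Hz reference frequency are illustrative assumptions.

```python
import math

def net_level_db(freq_hz, ref_hz,
                 source_tilt_db_per_octave=-12.0,
                 radiation_db_per_octave=6.0):
    """Level (dB re: ref_hz) after combining source tilt and lip radiation.

    Slopes in dB per octave add, so the net slope is -12 + 6 = -6.
    """
    octaves = math.log2(freq_hz / ref_hz)
    net_slope = source_tilt_db_per_octave + radiation_db_per_octave
    return net_slope * octaves

print(net_level_db(200.0, 100.0))  # -6.0 (one octave up)
print(net_level_db(400.0, 100.0))  # -12.0 (two octaves up)
```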

B. Energy Loss Mechanisms (Damping)

  1. Friction: Air molecules rubbing against the walls of the vocal tract.

  2. Heat Exchange: Energy lost to the warm, moist tissues of the tract.

  3. Wall Vibration: The cheeks and throat are not rigid; they vibrate, absorbing low-frequency energy.

  • Acoustic Result: These losses increase the bandwidth of formants, making the peaks wider and less "sharp."
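
One standard way to put a number on how "sharp" a resonance is uses the quality factor Q = center frequency / bandwidth. The sketch below applies that definition; the 500 Hz formant and the two bandwidths are illustrative assumptions, not measurements from the chapter.

```python
def quality_factor(center_hz, bandwidth_hz):
    """Q of a resonance: higher Q means a narrower, sharper peak."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return center_hz / bandwidth_hz

# Same assumed formant frequency, with low and high damping.
print(quality_factor(500.0, 50.0))   # 10.0
print(quality_factor(500.0, 100.0))  # 5.0
```

Doubling the bandwidth halves Q, which matches the description of losses making the peak wider and less sharp.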

C. Subglottal Coupling

  • When the glottis is partially open, the vocal tract connects to the trachea and lungs.

  • This introduces subglottal resonances, which can cause "holes" or extra peaks in the vowel spectrum, often seen in breathy speech or during specific consonants.