chung final studying
Part I: Acoustic Theory of Vowel Production (Chapter 8)
The foundation of speech acoustics is the Source-Filter Theory, which posits that the vocal folds provide the raw sound and the vocal tract shapes it.
A. The Source (Glottal Signal)
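Glottal Flow (Vg): The pulsing airflow through the glottis; this is the acoustic source signal created by vocal fold vibration.
The Harmonic Spectrum: A frequency-domain representation of the source in which F0 is the first harmonic (H1) and the higher harmonics fall at integer multiples of F0.
Spectral Tilt: Harmonic amplitude drops by roughly 12 dB per octave.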
Strained Voice: Shallower tilt (more high-frequency energy).
Breathy Voice: Steeper tilt (less high-frequency energy).
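Worked example: a rough sketch of what a 12 dB per octave tilt means for individual harmonics. The function name harmonic_levels and the 100 Hz F0 are made up for illustration; the only thing taken from the notes is the -12 dB/octave slope. Passing a different slope would mimic a shallower (strained) or steeper (breathy) tilt, but the notes do not give specific slopes for those voices.

```python
import math

def harmonic_levels(f0_hz, n_harmonics, tilt_db_per_octave=-12.0):
    """Return (frequency_hz, level_db) per harmonic, with level relative to H1."""
    levels = []
    for n in range(1, n_harmonics + 1):
        freq_hz = n * f0_hz                # harmonics sit at integer multiples of F0
        octaves_above_h1 = math.log2(n)    # H2 is 1 octave up, H4 is 2 octaves up
        levels.append((freq_hz, tilt_db_per_octave * octaves_above_h1))
    return levels

# Illustrative F0 of 100 Hz (an assumption, not a value from the notes).
for freq_hz, level_db in harmonic_levels(100.0, 8):
    print(f"{freq_hz:6.0f} Hz  {level_db:7.1f} dB re H1")
```

Each doubling of frequency (H1 to H2, H2 to H4, H4 to H8) costs another 12 dB, which is why the upper harmonics of the source are so weak.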
B. The Filter (Vocal Tract Resonances)
The Tube Model: The vocal tract acts like a tube closed at the glottis and open at the lips.
Formants (F1, F2, F3): Resonance peaks of the vocal tract, i.e., the frequencies at which energy transfer is most efficient.
Perturbation Theory:
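Pmax (Pressure Max): Constricting the tube near a pressure maximum of a resonance raises that resonance's frequency.
Vmax (Velocity Max): Constricting the tube near a velocity maximum of a resonance lowers that resonance's frequency.
Lip Rounding: Lowers all formants because the lips (the open end of the tube) are a Vmax for all resonances.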
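Worked example: a tube closed at one end and open at the other resonates at odd multiples of a quarter wavelength, F_n = (2n - 1) * c / (4L). The sketch below assumes a 17.5 cm tract and a speed of sound of 35,000 cm/s; both are common textbook values rather than numbers from these notes, and tube_formants is a made-up name.

```python
def tube_formants(length_cm=17.5, speed_of_sound_cm_s=35000.0, n_formants=3):
    """Resonances (Hz) of a uniform tube closed at the glottis, open at the lips."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]

# Assumed neutral (schwa-like) tract of 17.5 cm.
print(tube_formants())

# A shorter tube (assumed 14 cm) shifts every resonance upward.
print(tube_formants(length_cm=14.0))
```

With the assumed values the resonances land near 500, 1500, and 2500 Hz. Perturbation theory then describes how constrictions move each formant away from these neutral-tube values.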
Part II: The Acoustics of Consonants (Chapter 9)
Consonants differ from vowels by using constricted or completely blocked airflow, often creating aperiodic noise sources.
A. Stop Consonants (Plosives)
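Phase 1: Closure (Gap): A period of near-total silence on a spectrogram (voiced stops may still show a faint low-frequency voice bar).
Phase 2: Release Burst: A brief (10–30 ms) transient noise caused by the sudden release of the air pressure built up behind the closure.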
Phase 3: Frication/Aspiration: Turbulent noise as the constriction widens.
B. Fricatives
Aperiodic Source: Created by forcing air through a narrow channel (e.g., /s, f, z/).
Sibilants vs. Non-sibilants: Sibilants (/s, ʃ/) have higher intensity and better-defined spectral peaks compared to non-sibilants (/f, θ/).
C. Nasals & Laterals (Antiformants)
Coupled Resonators: In a nasal, sound travels through the pharyngeal and nasal cavities while the closed-off oral cavity acts as a side branch.
Antiformants (Zeros): Dips in the spectrum at frequencies where energy is "trapped" in a side branch and cancelled out instead of being radiated.
Nasal Murmur: A dominant low-frequency resonance (~250–300 Hz).
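Worked example: a side branch that is closed at its far end absorbs energy at its own quarter-wavelength resonances, so its zeros fall at odd multiples of c / (4L). The branch lengths below (8 cm for /m/, 5.5 cm for /n/) and the speed of sound are illustrative assumptions, not values from the notes, and side_branch_zeros is a made-up name.

```python
def side_branch_zeros(branch_length_cm, speed_of_sound_cm_s=35000.0, n_zeros=2):
    """Antiformant frequencies (Hz) for a side branch closed at its far end."""
    return [
        (2 * k - 1) * speed_of_sound_cm_s / (4.0 * branch_length_cm)
        for k in range(1, n_zeros + 1)
    ]

# Assumed oral side-branch lengths: closure at the lips vs. at the alveolar ridge.
for label, length_cm in [("/m/", 8.0), ("/n/", 5.5)]:
    first_zero_hz = side_branch_zeros(length_cm)[0]
    print(f"{label}: first antiformant near {first_zero_hz:.0f} Hz")
```

The point to remember is the direction of the effect: the farther back the oral closure, the shorter the side branch and the higher the first antiformant.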
Part III: Dynamic Cues & Transitions (Chapter 10)
Speech is rarely steady-state. We perceive phonemes based on how the acoustic signal changes over time.
A. Formant Transitions
Definition: The movement of formants as the articulators travel from a consonant to a vowel.
F1 Transition: Rises when moving from a stop/fricative into a vowel, because F1 is low during the constriction and climbs as the mouth opens.
F2 Transition: The primary cue for place of articulation.
Bilabial (/b/): Usually shows a rising F2 into the vowel.
Alveolar (/d/): F2 points toward a "locus" of about 1800 Hz.
Velar (/g/): Often shows a "Velar Pinch" where F2 and F3 come close together.
B. Voice Onset Time (VOT)
The time between the release of a stop and the onset of voicing.
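Short VOT: Voiced stops (/b, d, g/); voicing starts at or shortly after the release.
Long VOT: Voiceless stops (/p, t, k/); voicing is delayed and the gap is filled with aspiration.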
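Worked example: VOT is just a subtraction of two time stamps. The sketch below assumes the burst and voicing-onset times have already been measured by hand from a waveform or spectrogram. The 30 ms boundary is an assumed, roughly English-like cutoff, not a value from the notes, and the "prevoiced" label (voicing before the release, giving a negative VOT) is an extra case the notes do not list.

```python
def voice_onset_time_ms(release_time_s, voicing_onset_s):
    """VOT in milliseconds: voicing onset minus stop release."""
    return (voicing_onset_s - release_time_s) * 1000.0

def classify_vot(vot_ms, long_lag_threshold_ms=30.0):
    """Label a VOT value; the threshold is an illustrative assumption."""
    if vot_ms < 0:
        return "prevoiced"   # voicing began before the release
    if vot_ms < long_lag_threshold_ms:
        return "short lag"   # typical of /b, d, g/
    return "long lag"        # typical of /p, t, k/

# Hypothetical measurements (seconds): (release, voicing onset).
for release_s, voicing_s in [(0.100, 0.110), (0.100, 0.170)]:
    vot = voice_onset_time_ms(release_s, voicing_s)
    print(f"VOT = {vot:.0f} ms -> {classify_vot(vot)}")
```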
Part IV: Advanced Acoustic Effects (Chapter 11)
This chapter accounts for the non-ideal physical properties of the human body that affect the sound.
A. Radiation at the Lips
The mouth acts as a small opening in a large sphere (the head).
High-Pass Effect: Radiation boosts high frequencies by about 6 dB per octave.
Net Result: When the -12 dB per octave glottal tilt is combined with the +6 dB per octave radiation boost, the speech leaving the lips has a net tilt of -6 dB per octave.
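Worked example: because both effects are slopes in dB per octave, they simply add. The sketch below uses the two slopes from the notes; the 100 Hz reference frequency and the function name net_level_db are assumptions for illustration, and the vocal tract filter (the formants) is deliberately left out.

```python
import math

SOURCE_TILT_DB_PER_OCTAVE = -12.0   # glottal source
RADIATION_DB_PER_OCTAVE = 6.0       # lip radiation

def net_level_db(freq_hz, ref_hz=100.0):
    """Level at freq_hz relative to ref_hz, from source tilt plus radiation only."""
    octaves = math.log2(freq_hz / ref_hz)
    return (SOURCE_TILT_DB_PER_OCTAVE + RADIATION_DB_PER_OCTAVE) * octaves

for freq_hz in (100.0, 200.0, 400.0, 800.0):
    print(f"{freq_hz:5.0f} Hz  {net_level_db(freq_hz):6.1f} dB")
```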
B. Energy Loss Mechanisms (Damping)
Friction: Energy lost to viscous friction as air moves along the walls of the vocal tract.
Heat Exchange: Energy lost to the warm, moist tissues of the tract.
Wall Vibration: The cheeks and throat are not rigid; they vibrate, absorbing low-frequency energy.
Acoustic Result: These losses increase the bandwidth of formants, making the peaks wider and less "sharp."
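Worked example: one way to see the bandwidth effect is to model a single formant as a two-pole digital resonator and widen its bandwidth. This is a swapped-in illustration (a resonator of the kind used in formant synthesizers), not a method from the notes. The 500 Hz center frequency, the 10 kHz sample rate, and the bandwidth values are all assumptions.

```python
import cmath
import math

def resonator_gain_db(freq_hz, center_hz, bandwidth_hz, sample_rate_hz=10000.0):
    """Gain (dB) at freq_hz of a two-pole resonator scaled to unity gain at 0 Hz."""
    t = 1.0 / sample_rate_hz
    # Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * center_hz * t)
    a = 1.0 - b - c
    z_inv = cmath.exp(-2j * math.pi * freq_hz * t)
    h = a / (1.0 - b * z_inv - c * z_inv ** 2)
    return 20.0 * math.log10(abs(h))

# More damping (wider bandwidth) gives a lower peak at the formant frequency.
for bandwidth_hz in (50.0, 100.0, 200.0):
    peak_db = resonator_gain_db(500.0, 500.0, bandwidth_hz)
    print(f"bandwidth {bandwidth_hz:5.0f} Hz -> gain at 500 Hz: {peak_db:5.1f} dB")
```

The gain at the formant frequency falls as the bandwidth grows, which is the "wider and less sharp" peak described above.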
C. Subglottal Coupling
When the glottis is partially open, the vocal tract connects to the trachea and lungs.
This introduces subglottal resonances, which can cause "holes" or extra peaks in the vowel spectrum, often seen in breathy speech or during specific consonants.