Vowels and Formants

The speech signal is the result of a sound source being shaped by the vocal tract, which functions as a frequency-selective filter (Textbook p. 197-8).
Both source and filter vary continuously during speech.

Source: Sound generated by the vibrating vocal folds at the glottis, a complex waveform that can be broken down into frequency x amplitude components (harmonics).
Filter: Vocal tract filters this sound, promoting some frequencies and suppressing others.
Output: Acoustic output with peaks corresponding to formants (peaks of resonance).
Spectrum = Frequency x Amplitude display.
Spacing between harmonics = $f0$.
Speech synthesis involves creating an artificial speech waveform by combining the source and the filter.
Frequencies are generated by the source, and the filter response is generated by the vocal tract.
Amplitude = darkness of the pattern in a spectrogram. Change in $f0$.
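
A minimal sketch of the source + filter = output idea, in Python. Everything numeric here is an assumption for illustration: the -12 dB/octave source slope, the 100 Hz bandwidth, the toy resonance curve, and the formant values 500/1500/2500 Hz. It is not how a real synthesiser is built; it only shows that harmonics are spaced $f0$ apart and that the output peaks fall at the filter's resonances.

```python
import math

def source_harmonics(f0, max_freq, rolloff_db_per_octave=-12.0):
    """Source: harmonics at whole-number multiples of f0, as (Hz, dB) pairs."""
    harmonics = []
    k = 1
    while k * f0 <= max_freq:
        freq = k * f0
        # Assumed source slope: level falls steadily as frequency rises
        level_db = rolloff_db_per_octave * math.log2(freq / f0)
        harmonics.append((freq, level_db))
        k += 1
    return harmonics

def filter_gain_db(freq, formants, bandwidth=100.0):
    """Filter: toy resonance curve with one peak per formant (gain in dB)."""
    total = sum(
        1.0 / math.sqrt(1.0 + ((freq - fc) / (bandwidth / 2.0)) ** 2)
        for fc in formants
    )
    return 20.0 * math.log10(total)

def output_spectrum(f0, formants, max_freq=3000.0):
    """Output: source level plus filter gain (both in dB) at each harmonic."""
    return [(freq, level + filter_gain_db(freq, formants))
            for freq, level in source_harmonics(f0, max_freq)]

def spectral_peaks(spectrum):
    """Harmonics that are stronger than both of their neighbours."""
    return [spectrum[i][0] for i in range(1, len(spectrum) - 1)
            if spectrum[i - 1][1] < spectrum[i][1] > spectrum[i + 1][1]]

# Hypothetical speaker: f0 = 100 Hz, neutral-vowel-like formants
spectrum = output_spectrum(f0=100.0, formants=[500.0, 1500.0, 2500.0])
peaks = spectral_peaks(spectrum)
```

Changing `f0` moves the harmonics closer together or further apart but leaves the peak locations tied to `formants`, which is the sense in which source and filter vary independently.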

Acoustics is the analysis of sound waves and the physical properties of speech.
All sound results from vibration, which depends on a source of energy to generate it (CYF 2007, p.205).

Vowel space shows where vowels are located in acoustic space relative to one another, often labeled in Hz.
The space is continuous, and vowel production can vary greatly, meaning there are no fixed F1/F2 values.
Praat is software used for acoustic phonetics to load sound files (typically .wav) and view waveforms and spectrograms.

Formants are resonating frequencies of the air in the vocal tract; peaks of resonance (L&J Ch.8, p.197).
A vowel sound contains a number of different pitches simultaneously.
The quality of a vowel depends on its overtone structure, i.e., formants (usually F1 and F2).
Vowels are distinguished from one another by differences in overtones (formant differences in F1 and F2, also F3).

The vocal tract can be modeled as a series of tubes open at one end.
Vocal fold vibration sets air in vibration.
Different vowels have different shapes of the vocal tract and different places of constriction.
- e.g., at the hard palate [i], back/pharyngeal region [ɑ]
- A = Front cavity, C = Rear Cavity, B = area of maximum constriction
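
For the simplest version of the tube model (one uniform tube, closed at the glottis and open at the lips), the resonances follow the quarter-wavelength rule F_n = (2n - 1) x c / 4L. A small sketch, assuming a speed of sound of about 35,000 cm/s and a 17.5 cm vocal tract (a commonly used adult value; both are assumptions here, not measurements):

```python
def tube_resonances(length_cm, count=3, speed_of_sound_cm_s=35000):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Uses the quarter-wavelength rule: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, count + 1)]

# Assumed 17.5 cm tract: roughly the formants of a neutral, schwa-like vowel
neutral = tube_resonances(17.5)

# A shorter tube (hypothetical 12 cm tract) has higher resonances throughout
shorter = tube_resonances(12.0)
```

Real vowels depart from these values because constrictions change the tube's shape, which is what the front cavity / rear cavity description above captures.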

F1 frequency is closely related to the area of the lower portion of the pharyngeal cavity (C).
F1 is also related to the degree of mouth opening at the lips (M).
F2 is closely related to the length of the front cavity (A).
The first resonant frequency (F1) is low for close vowels and higher for open vowels.
The second resonant frequency (F2) is high for close front vowels and lower for open or back vowels.
There isn’t an easy way to demonstrate F3 and higher formants.

Resonances (formants) are numbered from low to high frequency, e.g., F1, F2, F3.
Vowels are primarily classified in terms of the first two formants, i.e., F1 and F2, although F3 can be used to determine rounding.
Changes in the relative formant values give vowels their quality.

F1 equates to opening:
- Close vowel = low F1
- Open vowel = high F1 (inverse relationship to aperture/height)
- F1 of /i/ is lower than F1 of /æ/
F2 equates to backing:
- The further back the vowel, the lower the F2
- F2 of AusEng /ʉ/ would be higher than F2 of Spanish /u/
Rounding lowers both F1 and F2, so [y] has a lower F1 and F2 than [i].
F1 value INCREASES as vowel aperture opens from close to open & pharyngeal cavity decreases in volume
F2 LOWERS as you move from front to back – lengthening the front cavity by backing the tongue.
ROUNDING lowers F1 & F2 - elongates oral cavity (works well with backing!)
F3 is thought to be important for distinguishing front unrounded [i] from front rounded [y].
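
The F1/F2 relationships above can be checked with a short sketch. The formant values below are made up for illustration (hypothetical round numbers for one imagined speaker, not measurements from any study); only the orderings matter.

```python
# Hypothetical (F1, F2) values in Hz, invented for illustration only
ILLUSTRATIVE_FORMANTS = {
    "i": (280, 2300),   # close front unrounded
    "y": (270, 1900),   # close front rounded: F1 and F2 lower than [i]
    "æ": (700, 1700),   # open front
    "ɑ": (750, 1100),   # open back
    "u": (300, 850),    # close back rounded
}

def order_by_height(formants):
    """Vowels from closest (lowest F1) to most open (highest F1)."""
    return sorted(formants, key=lambda v: formants[v][0])

def order_by_backness(formants):
    """Vowels from frontest (highest F2) to backest (lowest F2)."""
    return sorted(formants, key=lambda v: formants[v][1], reverse=True)

height = order_by_height(ILLUSTRATIVE_FORMANTS)
backness = order_by_backness(ILLUSTRATIVE_FORMANTS)
```

Sorting by F1 recovers the close-to-open dimension and sorting by F2 recovers front-to-back, which is why an F1 x F2 plot looks like a vowel chart.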

Speaker physiology (larger vocal tract = formants typically lower).
Language or Dialect (e.g., American English versus Standard Australian English).
Number of contrastive vowels (Aus Indigenous languages vs. AusE).
Stress & Accent:
- Unstressed vowels might show formant “undershoot.”
- Vowels do not achieve their F1/F2 targets.
- Reduced vowel quality (vowels that are unstressed tend to head towards schwa [ə]).
Casual speech or rapidly spoken vowels also show undershoot.
Consonant environment – coarticulation.
CLEAR SPEECH – more peripheral vowels.
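
Undershoot can be pictured as a vowel stopping part-way between its F1/F2 target and a central, schwa-like position. A toy sketch, assuming simple straight-line movement toward a schwa at (500, 1500) Hz; the `amount` parameter, the schwa values and the target values are all hypothetical, and real undershoot is not claimed to be linear:

```python
SCHWA = (500.0, 1500.0)  # assumed neutral-vowel F1, F2 in Hz

def undershoot(target, amount, centre=SCHWA):
    """Move a (F1, F2) target part of the way toward a central vowel.

    amount = 0.0 means the target is fully reached;
    amount = 1.0 means the vowel is fully reduced to the centre.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    return tuple(t + amount * (c - t) for t, c in zip(target, centre))

# Hypothetical stressed [i] target and a half-reduced, unstressed version
stressed_i = (280.0, 2300.0)
unstressed_i = undershoot(stressed_i, 0.5)
```

In this picture, clear speech corresponds to a small `amount` (vowels stay peripheral), while unstressed, casual or rapid speech corresponds to a larger one.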

The length and thickness of the vocal folds affect the fundamental frequency ($f0$).
- Thicker, bigger vocal folds = lower $f0$ or pitch.
The length of the vocal tract changes the resonant frequency of a voice.
- Longer vocal tracts = lower formants in general.
Children have shorter, smaller vocal tracts, on average, than adults.
The relative positions of F1 and F2 are the key, not their absolute values.
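
One way to make "relative, not absolute" concrete is speaker normalisation. The sketch below uses z-scores within each speaker (often called Lobanov normalisation; this technique is not named in these notes and is brought in only as an example). The adult values are invented, and the child is modelled as the same vowels scaled up by 1.3, which is a deliberate simplification.

```python
import statistics

def lobanov_normalise(vowels):
    """Z-score F1 and F2 across ONE speaker's vowels.

    vowels: dict mapping vowel label -> (F1, F2) in Hz.
    Returns a dict with the same labels and unitless (z1, z2) values.
    """
    f1_values = [f1 for f1, _ in vowels.values()]
    f2_values = [f2 for _, f2 in vowels.values()]
    mean1, sd1 = statistics.mean(f1_values), statistics.stdev(f1_values)
    mean2, sd2 = statistics.mean(f2_values), statistics.stdev(f2_values)
    return {label: ((f1 - mean1) / sd1, (f2 - mean2) / sd2)
            for label, (f1, f2) in vowels.items()}

# Hypothetical adult values (Hz), invented for illustration
adult = {"i": (280.0, 2300.0), "a": (750.0, 1300.0), "u": (300.0, 850.0)}
# Hypothetical child: shorter vocal tract, so every formant is higher
child = {label: (f1 * 1.3, f2 * 1.3) for label, (f1, f2) in adult.items()}

adult_norm = lobanov_normalise(adult)
child_norm = lobanov_normalise(child)
```

After normalisation the two speakers' vowels land in the same places, even though every raw Hz value differs.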

Same shape, different values.
German lax vowels transcribed in MRPA (Machine Readable Phonetic Alphabet), with IPA symbols shown in red (from Harrington 2010).

Video: Praat demonstration showing how to plot vowels (plotting practice).
Download sound files and save them in an easily accessible location.
Use a printed vowel space or annotate a PDF on screen (or draw in your textbook).