Vowels and Formants

Source Filter Model

  • The speech signal combines a sound source with the vocal tract, which functions as a frequency-selective filter.
  • The variations in air pressure that make up the signal are shaped by this frequency-selective filter.
  • Certain frequencies get through the filter, while others are suppressed.
  • This filtering effect varies continuously during speech, making it complex.

Recap of Source Filter Theory

  • Airstream (typically pulmonic) passes through the vibrating vocal folds.
  • This results in a complex wave that can be broken down into individual frequencies.
  • The spectrum displays frequency (x-axis) and amplitude/loudness/intensity (y-axis).
  • The vibrating glottis generates the source sound, a complex wave rich in harmonics (the source).
  • The vocal tract filters this sound, allowing some harmonics to pass while suppressing others (the filter).
  • The resulting output spectrum shows peaks corresponding to formants, which are peaks of resonance.
  • Vocal tract resonates at particular frequencies, which change with the configuration of vocal tract organs.
  • Spacing of harmonics relates to fundamental frequency.
  • The lowest harmonic in the signal corresponds to the fundamental frequency.
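
As a small illustration of how harmonic spacing relates to the fundamental frequency: for an idealized periodic source, the harmonics fall at whole-number multiples of the fundamental. The sketch below assumes that idealization; the function name `harmonics` and the 120 Hz value are illustrative and not taken from the lecture.

```python
def harmonics(f0_hz, count):
    """Return the first `count` harmonic frequencies (Hz) of an idealized periodic source.

    Harmonic k lies at k * f0, so neighbouring harmonics are always f0 apart.
    """
    return [k * f0_hz for k in range(1, count + 1)]


# Hypothetical example: a voice with a 120 Hz fundamental.
example = harmonics(120, 5)
print(example)  # [120, 240, 360, 480, 600]
```

Raising the fundamental spreads the harmonics further apart; lowering it packs them closer together.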

Difference between Harmonics and Formants

  • Harmonics: Generated by the vibrating glottis (the sound source).
  • Formants: Bundles of frequencies corresponding to the resonating points of the vocal tract.
  • Harmonics depend on the fundamental frequency.
  • Formants depend on the resonating properties of the vocal tract.

Source Filter Illustration

  • Laryngeal source (frequency domain: frequencies on x-axis, amplitude on y-axis).

  • Laryngeal source (time domain: airflow/air pressure fluctuations from vibrating glottis).

  • The filter response has its own pattern of resonance peaks.

  • Harmonics coinciding with these peaks are emphasized; the resulting peaks in the output are the formants.

  • Different vocal tract postures or settings for different vowels produce varying filters.
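
The interaction of source and filter can be sketched numerically: each source harmonic's amplitude is multiplied by the filter's gain at that frequency. This is a toy model under stated assumptions, not the lecture's own illustration. The bell-shaped gain function, the 1/k fall-off of the source harmonics, and every numeric value (100 Hz fundamental, peaks at 500 and 1500 Hz, 100 Hz bandwidth) are assumptions chosen for readability.

```python
def resonance_gain(freq_hz, peaks_hz, bandwidth_hz=100.0):
    """Toy filter response: gain is largest near each resonance peak.

    Each peak contributes a simple bell-shaped bump. This is an
    illustrative stand-in, not a physical vocal tract model.
    """
    return sum(1.0 / (1.0 + ((freq_hz - p) / bandwidth_hz) ** 2) for p in peaks_hz)


def output_spectrum(f0_hz, peaks_hz, max_hz=3000):
    """Multiply each source harmonic's amplitude by the filter gain at its frequency."""
    spectrum = {}
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        source_amp = 1.0 / k  # assumed: source harmonics weaken as frequency rises
        spectrum[freq] = source_amp * resonance_gain(freq, peaks_hz)
        k += 1
    return spectrum


# Hypothetical values: 100 Hz fundamental, filter peaks at 500 and 1500 Hz.
spec = output_spectrum(100, [500, 1500])
for freq in (400, 500, 600, 1400, 1500, 1600):
    print(freq, round(spec[freq], 3))
```

In the printed values, the harmonics at 500 Hz and 1500 Hz stand out above their neighbours, which is the formant pattern described above. Changing `peaks_hz` while keeping the same fundamental mimics a different vowel spoken at the same pitch.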

Formant Synthesis

  • Artificial speech is created by combining a source with a filter.

  • Variations illustrate the changing pitch of the voice, resulting from changing vibrations of the vocal folds.

  • Generated noise component with variations illustrating pitch changes.

  • Filter response.

Key Technology in Acoustics

  • Analysis of speech waves or waveforms.
  • Analysis of physical properties of speech.
  • Requires a recorded signal.
  • All sounds result from vibration (resonance) driven by a source of energy.

Vowel Space

  • Interested in locating vowels in acoustic space relative to one another.
  • Hertz (Hz) is the basic parameter for measuring frequencies.
  • Vowel space is continuous; vowels can vary continuously.
  • No fixed F1 and F2 values due to variability.
  • Variability depends on individual vocal tract size, context, language, stress, preceding/following consonants.

PRAAT Software

  • Software used for acoustic phonetic analysis.
  • Need to load pre-recorded sound files.
  • Can then look at waveforms and spectrograms.

Formants

  • Resonating frequencies of air in the vocal tract.
  • Peaks of resonance (bundles of frequencies).
  • Location depends on vocal tract configuration (tongue body position, lip rounding, etc.).
  • Vowel qualities are associated with articulatory properties.
  • Vowel sound contains multiple pitches or frequencies simultaneously.
  • Quality depends on overtone structure.
  • Formant differences occur in F1 and F2.

Tube Models of Vocal Tract

  • Vocal tract can be modeled as a series of tubes open at one end (mouth).
  • A: Front cavity, C: Rear cavity, B: Area of maximum constriction.
  • Vocal fold vibration sets air in motion.
  • Formants for different vowels result from different vocal tract shapes and constrictions.
  • Example: /i/ (ee) has a shorter front cavity than /ɑ/ (ah).
  • Formant 1 (F1) is related to the lower portion of the pharyngeal cavity (C).
  • Formant 2 (F2) is related to the length of the front cavity (A).
  • A short front cavity leads to a higher F2.
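
The simplest version of a tube model treats the whole vocal tract as one uniform tube, closed at the glottis and open at the mouth. Its resonances follow the standard quarter-wavelength relation F_n = (2n - 1) * c / (4 * L). The sketch below assumes that single-tube simplification rather than the multi-cavity model in the notes, and the tract length (17.5 cm) and speed of sound (350 m/s) are conventional round values, not figures from the lecture.

```python
def uniform_tube_resonances(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4 * length_m) for n in range(1, count + 1)]


# Assumed values: a 17.5 cm tract with c = 350 m/s.
formants = uniform_tube_resonances(0.175)
print([round(f) for f in formants])  # [500, 1500, 2500]
```

Because length sits in the denominator, a shorter tube gives higher resonances. That matches the point above that a short front cavity leads to a higher F2.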

Perception of Formants

  • Can feel formants by tapping the throat while voicelessly making different vowels.
  • Resonance is low for close vowels and higher for open vowels.
  • Whispering vowels can also reveal major resonating frequency of the vocal tract.
  • High resonating frequency for close front vowels; lower for open/back vowels.
  • F3 is important for languages with many front/close vowels, helping to distinguish rounding.

Formant Numbering and Classification

  • Resonances/overtones/formants are numbered from low to high: F1, F2, F3.
  • Do not confuse with harmonics.
  • Vowels are classified by their first two formants (F1 and F2).

Spectrograms

  • Spectrogram displays a series of spectra lined up in time.
  • Formant peaks appear as dark bands of energy (collections of frequencies).
  • Darkness indicates amplitude.
  • Vowels stand out from surrounding signals.
  • Narrowing of the signal in a waveform indicates a consonantal articulation (greater constriction).
  • Clear bands of energy in a spectrogram indicate vowel articulation.
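
The idea of a spectrogram as a series of spectra lined up in time can be sketched with a plain discrete Fourier transform applied frame by frame. This is a minimal sketch, not how PRAAT computes its display: real spectrograms typically use overlapping, windowed frames, and the test signal (a 200 Hz tone followed by a 400 Hz tone), the 8000 Hz sampling rate, and all function names here are assumptions made for illustration.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum of one frame via a plain discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len):
    """Cut the signal into consecutive frames and return one spectrum per frame."""
    return [
        frame_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def tone(freq_hz, n_samples, rate_hz):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n_samples)]


rate = 8000
signal = tone(200, 160, rate) + tone(400, 160, rate)

frames = spectrogram(signal, frame_len=80)  # 80 samples = 10 ms per frame
bin_hz = rate / 80                          # each frequency bin spans 100 Hz
peaks = [spectrum.index(max(spectrum)) * bin_hz for spectrum in frames]
print(peaks)  # [200.0, 200.0, 400.0, 400.0]
```

Each entry in `frames` is one spectrum. Drawing them side by side, with larger magnitudes shown darker, gives the familiar spectrogram picture in which the strongest frequency moves from 200 Hz to 400 Hz over time.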

Measuring Formants on a Spectrogram

  • The textbook labels here indicate the main acoustic properties of these vowels, including F1, F2, and F3.
  • A blue line indicates an approximate measurement point, typically around the midpoint (50%) of the vowel, to minimize coarticulation effects with neighboring speech sounds.
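
Taking a measurement at the vowel midpoint can be sketched as follows. The formant track is assumed to be a list of (time in seconds, frequency in Hz) pairs; that layout, the function names, and the example numbers are all hypothetical and do not reflect PRAAT's own data format.

```python
def midpoint_time(start_s, end_s):
    """Time halfway between the start and end of the vowel."""
    return start_s + 0.5 * (end_s - start_s)


def value_at_midpoint(track, start_s, end_s):
    """Return the track value whose time stamp lies closest to the vowel midpoint.

    `track` is a list of (time_s, frequency_hz) pairs.
    """
    mid = midpoint_time(start_s, end_s)
    return min(track, key=lambda point: abs(point[0] - mid))[1]


# Made-up F1 track for a vowel lasting from 0.10 s to 0.30 s.
f1_track = [(0.10, 400), (0.15, 600), (0.20, 700), (0.25, 650), (0.30, 450)]
print(value_at_midpoint(f1_track, 0.10, 0.30))  # 700
```

The values at the edges of the made-up track differ from the one in the middle, which is the kind of transition effect that measuring at the midpoint is meant to avoid.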

Vowel Plotting

  • The scales on the y (vertical, F1) and x (horizontal, F2) axes are inverted, so values increase downward and to the left.
  • /i/ (ee) has a low F1, indicating a close vowel.
  • /ɑ/ (ah) has a high F1, indicating an open vowel.
  • High F1 means a more open vowel; low F1 means a closer vowel.
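
The effect of inverting both axes can be sketched by mapping each (F1, F2) pair to a position in a unit square, with low F1 at the top and high F2 at the left. The formant values below are illustrative round numbers in Hz, assumed for this sketch rather than measured in the lecture, and the axis ranges and the function name `chart_position` are likewise hypothetical.

```python
# Illustrative round-number formant values in Hz (assumed, not lecture data).
vowels = {"i": (300, 2300), "ɑ": (750, 1100), "u": (300, 900)}


def chart_position(f1_hz, f2_hz, f1_range=(200, 900), f2_range=(600, 2600)):
    """Map (F1, F2) to (x, y) in a unit square with both axes reversed.

    x = 0 is the left edge (highest F2); y = 0 is the top edge (lowest F1).
    """
    x = (f2_range[1] - f2_hz) / (f2_range[1] - f2_range[0])
    y = (f1_hz - f1_range[0]) / (f1_range[1] - f1_range[0])
    return x, y


positions = {symbol: chart_position(f1, f2) for symbol, (f1, f2) in vowels.items()}
for symbol, (x, y) in positions.items():
    print(symbol, round(x, 2), round(y, 2))
```

With these assumed values, /i/ lands near the top left, /u/ near the top right, and /ɑ/ near the bottom, which is why the inverted plot resembles the familiar articulatory vowel chart.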

Factors Affecting Formant Values

  • Speaker physiology: Larger vocal tracts produce lower frequencies; smaller vocal tracts produce higher frequencies.
  • Language/dialect (e.g., American English vs. Australian English vowels).
  • Number of contrasting vowels in a language.
  • Stress: Unstressed vowels show formant undershoot, centralizing towards schwa.
  • Casual speech: Vowel spaces shrink.
  • Consonant environment: Vowel-consonant and consonant-vowel coarticulation.

Source vs. Filter Effects

  • Source (vocal folds): Thicker vocal folds produce lower pitch (lower harmonics), thinner vocal folds produce higher pitch (higher harmonics).
  • Filter (vocal tract length): Longer vocal tracts have different peaks of resonance, resulting in lower formants.
  • Children have shorter vocal tract lengths and thinner vocal folds, resulting in higher pitch and higher frequency values for F1 and F2.

Example: Gunwingu Language

  • Gunwingu is an indigenous language with a five-vowel system.
  • Shows a relatively compressed vowel space.

Vowel Space across Speakers

  • The circle shows the vowel space for the children.
  • The oval shows the same for the biological males, so the two can be compared.
  • The children's vowels have higher frequencies, with a compressed vowel space.

German Lax Vowels

  • Compare biological males and biological females for German lax vowels.
  • The dimensions of the female vowel space differ quite a bit from those of the male vowel space: frequencies are higher in general, and there is a lot of variation along one formant dimension in particular.

Australian English Vowels

  • Example utterance: "Have these good soft shoes."
  • The red lines are automatically derived formant tracks.
  • PRAAT does a lot of the work for you: it identifies, hopefully correctly, where the vowels are.
  • Shows that F1 is relatively high compared with the next vowel, /i/ (ee). Why is that? Because the vowel is lower (more open).

Formant Measurements

  • The measurements give an average, typical F1 of 860 Hz, together with an F2 value; these are the values shown on the screen.

Connected Speech Vowel Spaces

  • The male speaker actually exaggerated these vowels but the results were quite nice.