measuring vowel formant frequencies
1.Spectrogram shows measurement of formant frequencies at temporal middle of vowel
2.LPC spectra show formant peaks in spectrum, computed with a 20 ms window centered around the temporal midpoint (shown in the spectrograms)
3.Note correspondence of formant frequencies in the two display types
F1-F2 plot
•Each phonetic symbol within an ellipse is an F1/F2 coordinate for an individual talker, produced in an /hVd/ frame where V = vowel
•Variability of vowel formant frequencies for a given vowel is explained by age, sex, and other factors that vary within any category (such as age)
•Within any ellipse, points in the lower left are likely from men, points in the middle are likely from women, and points in the upper right are likely from children
corner vowel formant frequencies
for men, women, and children
•Effect of sex and age-related differences in vocal tract length on formant frequencies
•Influence of vocal tract length on formant frequencies is not the same for each of the corner vowels (compare points for /u/ to points for /i/)
lax vowel formant frequencies
for men, women, and children
•Effect of vocal tract length on lax vowel formant frequencies is generally the same as the effect on corner vowel formant frequencies
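The vocal tract length effect in the cards above can be illustrated with the standard uniform-tube approximation (closed at the glottis, open at the lips), whose resonances are F_n = (2n - 1)c / 4L. This is only a minimal sketch: the speed of sound (35,000 cm/s) and the three tract lengths are illustrative assumptions, not values from the cards, and a uniform tube cannot reproduce the vowel-specific differences noted for /u/ versus /i/.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value for warm, moist air

def tube_formants(length_cm, n_formants=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]

# Hypothetical vocal tract lengths, chosen only for illustration.
for label, length_cm in [("adult male", 17.5), ("adult female", 14.5), ("child", 11.0)]:
    formants = [round(f) for f in tube_formants(length_cm)]
    print(label, length_cm, "cm ->", formants)
```

Under these assumptions a 17.5 cm tube gives 500, 1500, and 2500 Hz, and shorter tubes shift every resonance upward, which is the general direction of the sex and age differences described above.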
vowel reduction
•Note axis reversal in the two graphs; left-hand graph shows F1/F2 relations in “articulatory format” by placing F2 on x-axis, F1 on y-axis
•Vowel reduction is defined as the movement of F1-F2 coordinates in the direction of the “neutral” vowel (F1 ~ 500 Hz, F2 ~ 1500 Hz for adult males; F1 ~ 600 Hz, F2 ~ 1800 Hz for adult females). The neutral vowel is the vocal tract configuration with no constrictions (like /ə/)
•Relative to “null context” (vowels in /hVd/ context), speaking rate, syllable stress, type of speech material (citation form (null context) versus connected speech as in reading), and speech style (clear versus casual) may result in vowel reduction (separately and in combination); dialect and language may also affect patterns of vowel reduction
•A shared factor among all these potential influences on vowel reduction is variation in speaking rate; shorter vowels (faster rate) are often accompanied by vowel reduction
•Study of vowels and their variability is important because 1) vowels contribute in a significant way to speech intelligibility, and 2) delayed vowel development is thought to be one diagnostic marker of developmental apraxia of speech
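The definition of vowel reduction above (movement of an F1-F2 point toward the neutral vowel) can be sketched as a distance comparison. This is a hedged illustration, not a standard measurement procedure: using raw Euclidean distance in Hz is an assumption (a perceptual frequency scale could be substituted), the function names are hypothetical, and the /i/ tokens are invented.

```python
import math

# Neutral-vowel targets (F1, F2) in Hz, taken from the vowel reduction card.
NEUTRAL = {"male": (500.0, 1500.0), "female": (600.0, 1800.0)}

def distance_from_neutral(f1, f2, talker="male"):
    """Euclidean distance (Hz) of an F1-F2 point from the neutral vowel."""
    n1, n2 = NEUTRAL[talker]
    return math.hypot(f1 - n1, f2 - n2)

def is_reduced(citation, connected, talker="male"):
    """True if the connected-speech token lies closer to the neutral vowel
    than the citation-form (/hVd/) token does."""
    d_connected = distance_from_neutral(*connected, talker=talker)
    d_citation = distance_from_neutral(*citation, talker=talker)
    return d_connected < d_citation

# Hypothetical /i/ tokens from one adult male (values invented for illustration).
citation_i = (270.0, 2300.0)
connected_i = (350.0, 2000.0)
print(is_reduced(citation_i, connected_i))
```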
intrinsic vowel duration
•is a property of the vowel, as shown in either the case of a following voiceless obstruent (lower curve) or voiced obstruent (upper curve); note the variation in vowel duration across vowels even though the CVC frame is constant. Low vowels are generally longer than high vowels, and tense vowels are longer than their lax counterparts.
extrinsic influences on vowel duration
are variables that affect a vowel's duration in a constant context (in this case, the CVC frame shown in the figure). They include speaking rate, syllable stress, phonetic context, speaking style, and position of the vowel in a multisyllabic word or in an utterance.
diphthongs
•Well-defined formant structure
•Defined by prominent formant transitions, especially in F2 (see box superimposed on F2 transition of /ɑɪ/)
•Distinguishing features of diphthongs are extensive and rapid formant transitions. Extensive means covering a large range of frequencies; rapid means the transition has a steep slope
•Separate phonemic category compared with vowels: not “two vowels connected by movement”
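The two terms used for diphthong transitions above, extent (frequency range covered) and rapidity (slope), can be computed from a transition's start frequency, end frequency, and duration. A minimal sketch follows; the function name is hypothetical and the F2 values for /ɑɪ/ are invented for illustration, not measurements from the figure.

```python
def transition_extent_and_slope(f_start_hz, f_end_hz, duration_ms):
    """Return (extent in Hz, slope in Hz/ms) for one formant transition."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    extent = abs(f_end_hz - f_start_hz)
    return extent, extent / duration_ms

# Hypothetical F2 measurements for /ɑɪ/ (invented for illustration).
extent, slope = transition_extent_and_slope(1100.0, 2100.0, 125.0)
print(extent, "Hz over 125 ms =", slope, "Hz/ms")
```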
diphthongs in F1-F2 space
•Diphthongs typically do not begin or end at the F1-F2 coordinate for the vowels (example, /ɔɪ/, compare red-circled onsets and offset to the black-circled vowels; red arrows show distance between diphthong onset and /ɔ/, and diphthong offset and /ɪ/)
nasal murmurs
•Murmurs at all places of articulation have low-frequency F1n originating in the pharyngo-nasal cavity, and higher formants from the same cavity
•Murmurs have an antiresonance originating in the closed oral cavity, and antiresonances originating in the closed sinus cavities connected to the nasal passageways
•Effect of antiresonances is to reduce energy at and around the frequency of the antiresonance
•Nasal murmurs are much less intense than surrounding vowels (as shown in spectrographic comparisons of /i/ vs. /m/ and /ɑ/ vs. /m/)
semivowels
•like vowels, nasals, and diphthongs, have a well-defined formant structure
•have a brief constriction interval (marked by the black bars) in which formant frequencies are relatively stable
•have extensive F2 and F3 transitions into and out of the constriction interval
•The acoustics of semivowels have complex, underlying articulatory causes, which may explain (in part) why they are mastered relatively late in children’s sound development
fricatives
•Spectrogram shows difference between aperiodicity of fricative and periodicity of surrounding vowels
•LPC spectra show differences between sibilants and non-sibilants (compare dark to light spectra in each pair of spectra)
•LPC spectra show frequency difference in energy concentration for alveolar (/s/) vs. palato-alveolar /ʃ/ fricatives (higher frequency concentration for /s/, lower frequency concentration for /ʃ/). These differences reflect the size of the front cavity in the two fricatives
/h/ acoustics
•/h/ is a segment usually showing both aperiodic (resulting from turbulent flow at narrowed glottis and/or at edges of ventricular folds or epiglottis) and periodic energy (resulting from weak vibration of the vocal folds)
•Energy in the /h/ interval is usually concentrated at formant locations of surrounding vowels (see circles where /h/ formants are continuous with vowel formants)
acoustics of stop consonants
•Closure intervals marked by horizontal lines; voiceless closures typically have no energy, voiced closures have voicing energy at the bottom of the spectrogram
•Location of bursts shown by vertical lines
•Voiceless stops have relatively long frication and aspiration intervals (~ 40-70 ms); voiced stops have short frication intervals (< 20 ms).
•Stops preceding stressed vowels, compared with stops following stressed vowels, have longer closure intervals and more intense burst and frication intervals
•Voiced stop burst and frication intervals are less intense than voiceless stop burst and frication intervals
summary of voice onset time (VOT) data
•In English, the VOT boundary for voiced vs. voiceless stops is 20-25 ms (vertical dashed line). A notable exception is when voiceless stops are the second segment of an s+stop cluster (ˈsCV), where VOT is short and falls in the voiced range
•VOT for voiced stops in the utterance-initial position can have negative values, meaning that glottal pulsing begins before the burst (during the closure interval)
•Variables such as position in stress (pre- vs post-stressed), rate, phonetic context, and speaking style primarily affect voiceless VOTs
•VOT varies somewhat by place of articulation, with VOT increasing in the order bilabial, lingua-alveolar, dorsal (also called velar)
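The VOT summary above can be sketched as a small calculation: VOT is the voicing onset time minus the burst time, and the 20-25 ms region separates the English voiced and voiceless categories. This is an illustration under stated assumptions only: the function names and landmark times are hypothetical, the boundary region is treated as ambiguous rather than forced into a category, and the s+stop exception and place-of-articulation differences are not modeled.

```python
def vot_ms(burst_time_ms, voicing_onset_ms):
    """VOT = voicing onset minus burst release; negative means prevoicing."""
    return voicing_onset_ms - burst_time_ms

def classify_vot(vot, low_ms=20.0, high_ms=25.0):
    """Label a VOT using the 20-25 ms English boundary region from the card."""
    if vot < 0:
        return "voiced (prevoiced)"
    if vot < low_ms:
        return "voiced"
    if vot > high_ms:
        return "voiceless"
    return "ambiguous (boundary region)"

# Hypothetical landmark times in ms, invented for illustration.
for burst, voicing in [(100.0, 160.0), (100.0, 110.0), (100.0, 30.0), (100.0, 122.0)]:
    v = vot_ms(burst, voicing)
    print(v, classify_vot(v))
```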
burst spectra for stop consonant place of articulation
•FFT spectra computed from 20 ms “window” extending from the burst
•/p/ burst spectrum: “diffuse falling”
•/t/ burst spectrum: “diffuse rising”
•/k/ burst spectrum: “compact”
•Burst spectra for voiced stops are essentially the same as voiceless stops, but with lesser intensity and more energy in the low frequencies due to voicing
Blumstein and Stevens stop-burst spectral templates
•Templates allow for some variability in burst spectrum, but the gross spectral shape is retained; an essential cause of this variability is coarticulation (see inset, both spectra are diffuse rising even with local differences due to different vowels)
•Templates were successful, but not perfectly so, in matching each of the three places of stop articulation in American English
•The success of template matches to each of the three stop places suggests there is sufficient acoustic stability for reliable human identification of place from the speech acoustic signal
rise time in acoustic terms
1. Cues in the amplitude envelope: how long the envelope takes to go from 0 to full volume
rise time and distinguishing between at least 2 manners of articulation
Compare how the envelope of one sound rises or falls with how the envelope of a sound with a different manner rises or falls
modulation depth and acoustic representation
The range from 0 to full volume; greater depth means greater amplitude and power, and depth is greater when the constriction is stronger
greater modulation depth and vocal tract constriction
Vocal tract is completely constricted and together
stop consonant and acoustic feature
silent stop gap
if silent stop gap is removed
Wouldn’t see the rise; the release is flaccid because there is not enough power to build up pressure behind the constriction
rapid vertical line in waveform
Burst release of consonant sound
burst release and physiological occurrence
Pressure build up and then release
2 acoustic landmarks and voice onset time
Release of stop consonant, onset of voice
VOT important for distinguishing consonants
Measuring that precise timing in the waveform lets you tell consonants apart
amplitude envelope difference in stop and fricative
Different amounts of airflow are used, and the airflow is shaped in different ways
fricative: long, smooth amplitude rise time
stop: silent period, then an immediate vertical line
airflow in fricative and stop
Type of constriction the airflow passes through
2 acoustic features make affricate a combination sound
Rapid onset and then long period of noise
duration help distinguish affricate from fricative
The affricate's noise is shorter in duration and has a more rapid onset (shorter rise time) than a fricative's
nasal sound key characteristics
Sudden decrease in volume, lack of upper formant energy in spectrogram, low-frequency energy, presence of antiresonance
frequency energy and nasals
high-frequency energy is reduced because some energy is trapped in the mouth
acoustic cue for place of articulation identification
Frequency cues such as formant transitions; different vocal tract shapes produce different patterns in frequency
waveform and place of articulation
Place of articulation is hard to tell from the waveform alone; it shows up as changes in resonant frequencies
/s/ and /ʃ/ differ acoustically in spectrogram
/ʃ/ has energy at lower frequencies
/s/ has energy at higher frequencies; the two have different spectral peaks
/s/ /ʃ/ and place of articulation
Different points of constriction in the mouth
2 acoustic cues in a voiced consonant
voicing bar, periodic wave form – doesn’t have noisy representation
acoustic cues and voiceless sounds
Noisy (aperiodic) representation, no periodic waveform, no voicing bar
key acoustic signature of /r/
low third formant (F3), with variation depending on the vowel that follows it
use a spectrogram to identify /r/
Because formants are only shown on a spectrogram
multiple acoustic cues to identify consonants
no single acoustic cue is fully reliable
consonants are unique acoustically
involve obstruction or constriction of airflow
less energy, more complex patterns than vowels
acoustic signal reflects
noise, silence, transitions
3 key dimensions
manner of articulation
place of articulation
voicing
stops acoustic feature
silence (stop gap) + burst
fricatives acoustic features
aperiodic noise- a sound or signal that does not repeat its wave pattern at regular intervals, lacking a consistent, predictable, or periodic structure. It is characterized by random vibrations, a broad range of frequencies, and irregular changes in intensity over time.
affricates acoustic features
stop + fricative combo
VOT (voice onset time)
time between release of stop and voicing onset
distinguishes /p/ vs. /b/, /t/ vs. /d/
clinical note: critical for intelligibility- understanding
stop gap and burst
stop gap- silence before release
burst- brief noise at release
fricatives and noise
high-frequency energy (eg /s/)
lower-frequency noise (eg /f/)
formant transitions
resonant frequencies of vocal tract (F1, F2, F3)
rapid changes into vowels
provide place of articulation info
GPS directions = tell your brain where in the mouth the sound is coming from
rise time and modulation depth
rise time- speed of amplitude increase
modulation depth- variation in amplitude
important for perception and clarity
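The two envelope measures on this card can be sketched on a sampled amplitude envelope. The conventions here are assumptions rather than anything the cards specify: rise time is measured between 10% and 90% of the peak (the cards describe it loosely as "0 to full volume"), and modulation depth uses the common (max - min) / (max + min) ratio. The envelopes are synthetic.

```python
def rise_time_ms(envelope, sample_rate_hz, low=0.1, high=0.9):
    """Time (ms) for the envelope to climb from low*peak to high*peak."""
    peak = max(envelope)
    first_low = next(i for i, a in enumerate(envelope) if a >= low * peak)
    first_high = next(i for i, a in enumerate(envelope) if a >= high * peak)
    return (first_high - first_low) * 1000.0 / sample_rate_hz

def modulation_depth(envelope):
    """(max - min) / (max + min): 0 = flat envelope, 1 = dips to silence."""
    hi, lo = max(envelope), min(envelope)
    return (hi - lo) / (hi + lo) if (hi + lo) > 0 else 0.0

SAMPLE_RATE_HZ = 1000  # one envelope sample per millisecond

# Synthetic envelopes: a stop-like abrupt onset after a silent gap,
# and a fricative-like gradual ramp.
stop_like = [0.0] * 50 + [1.0] * 50
fricative_like = [i / 50.0 for i in range(50)] + [1.0] * 50

print(rise_time_ms(stop_like, SAMPLE_RATE_HZ))
print(rise_time_ms(fricative_like, SAMPLE_RATE_HZ))
print(modulation_depth(stop_like))
```

The stop-like envelope jumps to full amplitude in a single sample, while the ramp takes tens of milliseconds, which mirrors the stop versus fricative contrast described in the rise-time cards.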
silent→ burst→ delayed voicing (what type of sound)
voiceless stop (/p/, /t/, /k/)
continuous high-frequency noise (what type of sound)
fricative (/s/)
low-frequency energy + nasal resonance (what type of sound)
nasal (/m/, /n/)
key cues
VOT
stop gap
noise
transitions
these cues→ speech perception + clinical diagnosis
frequency
the measurement of how rapidly sound waves oscillate—or vibrate—between high and low pressure
Hz (hertz)
which represents cycles per second.
formants
specific frequency bands that are amplified by the resonance of the vocal tract (throat, mouth, and nasal cavities) during speech or singing
promine
harmonics
a series of horizontal lines or bands that represent the integer multiples of a sound's fundamental frequency
They appear as horizontal, often parallel, lines. If the pitch (fundamental frequency) is constant, the lines are straight; if the pitch changes (e.g., singing), the lines follow that movement
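Because harmonics are integer multiples of the fundamental frequency, their positions can be listed directly. A minimal sketch, with a hypothetical function name and example pitches chosen only for illustration:

```python
def harmonics(f0_hz, max_freq_hz):
    """Harmonic frequencies (integer multiples of f0) up to max_freq_hz."""
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    count = int(max_freq_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]

print(harmonics(100, 500))
print(harmonics(220, 1000))
```

A higher fundamental spaces the harmonics farther apart, which is why the horizontal lines on a narrowband spectrogram spread out as pitch rises.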
periodicity vs noise
Periodicity- signals with predictable, repeating patterns (harmonics) over time or space, such as a musical note or a clock
Noise- unpredictable, random, and lacks a consistent structure, often obscuring the signal.
generation of acoustic signal
when a vibrating object creates pressure variations (compressions and rarefactions) in a medium—such as air, water, or solids—propagating as sound waves.
time (x- axis) (spectrogram)
Moves from left to right, showing how the signal changes over time
duration of signal
frequency (y-axis) (spectrogram)
Represents the pitch or rate of vibration, with lower frequencies at the bottom and higher frequencies at the top.
amplitude/intensity (spectrogram)
Indicates the loudness or energy of a particular frequency.
Darker or "warmer" colors (red/yellow) represent higher energy, while lighter or "cooler" colors (blue/green) represent lower energy.
wideband spectrogram
A type of spectrogram that provides good time resolution (sharp vertical lines for timing), ideal for seeing formants and speech timing.
narrowband spectrogram
A type of spectrogram that provides high frequency resolution (sharp horizontal lines), ideal for seeing individual harmonics.
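The wideband versus narrowband trade-off follows from the analysis window: as a rule of thumb, frequency resolution is roughly the reciprocal of the window duration, so a short window blurs harmonics together and a long window separates them. The sketch below uses that approximation only; the exact constant depends on the window shape, and the window lengths and pitch are assumed values.

```python
def analysis_bandwidth_hz(window_ms):
    """Approximate frequency resolution as the reciprocal of window duration."""
    return 1000.0 / window_ms

def resolves_harmonics(window_ms, f0_hz):
    """Harmonics are separated only if the bandwidth is narrower than f0."""
    return analysis_bandwidth_hz(window_ms) < f0_hz

F0_HZ = 120.0  # hypothetical speaking pitch
for window_ms in (3.0, 30.0):  # assumed short vs. long analysis windows
    kind = "narrowband" if resolves_harmonics(window_ms, F0_HZ) else "wideband"
    print(window_ms, "ms ->", round(analysis_bandwidth_hz(window_ms)), "Hz,", kind)
```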
transitions (spectrogram)
Rapid changes in formant frequencies, usually indicating movement from a consonant to a vowel.
silence (spectrogram)
Indicated by a white or blank space on the spectrogram, often seen during the gap of a plosive consonant.
fricative noise (spectrogram)
Appears as random, chaotic static, representing chaotic airflow (e.g., s, f).
voicing (spectrogram)
Dark, often vertical striations at the bottom of the spectrogram, indicating vocal fold vibration.
source-filter theory
consists of two main components:
Source: The sound is generated by the vibration of the vocal folds (vocal cords) when air from the lungs is expelled. This vibration creates a series of pulses that produce sound waves, with the rate of these pulses determining the pitch of the sound.
Filter: The vocal tract acts as a filter, altering the sound produced by the vocal folds. The shape of the vocal tract changes during speech, affecting the sound's characteristics and creating different vowel sounds.
This theory is fundamental in understanding voice production and is widely used in speech synthesis and analysis.
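The source and filter components described above can be illustrated in the frequency domain: the source fixes where the harmonics fall (multiples of f0), and the filter sets how much each harmonic is boosted. This is a simplified sketch under several assumptions not in the cards: the source is treated as a flat impulse train (a real glottal source rolls off with frequency), each formant is modeled as a two-pole digital resonator, and the formant values, 80 Hz bandwidth, and 10 kHz sampling rate are illustrative.

```python
import cmath
import math

def resonator_gain(freq_hz, formant_hz, bandwidth_hz, sample_rate_hz):
    """Magnitude response of a two-pole digital resonator at freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2.0 * math.pi * formant_hz / sample_rate_hz
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    denom = 1.0 - 2.0 * r * math.cos(theta) * z_inv + (r * r) * z_inv * z_inv
    return 1.0 / abs(denom)

def output_spectrum(f0_hz, formants_hz, bandwidth_hz=80.0,
                    sample_rate_hz=10000.0, max_freq_hz=4000.0):
    """Filter gain at each harmonic of f0, for a flat (impulse-train) source."""
    spectrum = {}
    k = 1
    while k * f0_hz <= max_freq_hz:
        gain = 1.0
        for formant in formants_hz:  # cascade: multiply the resonator gains
            gain *= resonator_gain(k * f0_hz, formant, bandwidth_hz, sample_rate_hz)
        spectrum[k * f0_hz] = gain
        k += 1
    return spectrum

# Hypothetical neutral-vowel-like filter excited at a 100 Hz fundamental.
spectrum = output_spectrum(100.0, [500.0, 1500.0, 2500.0])
for freq in (400.0, 500.0, 600.0, 1400.0, 1500.0, 1600.0):
    print(freq, round(20.0 * math.log10(spectrum[freq]), 1), "dB")
```

Changing `f0_hz` moves the harmonics without moving the peaks, while changing `formants_hz` moves the peaks without moving the harmonics, which is the independence of source and filter that the theory describes.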
Glottal Flow Function
Describes the variations in airflow through the glottis during phonation.
Inverse Filtering
A technique used to recover the glottal source signal from the speech signal.
Vocal Tract Filter
The resonating structure that shapes the speech sounds produced by the glottal source.
Formant Frequencies
The resonant frequencies of the vocal tract that define vowel sounds.
Resonant Frequencies
Frequencies at which a tube or cavity naturally vibrates.
Articulatory Configurations
The positions and movements of the articulators (like the tongue and lips) that shape speech sounds.
Pressure Distribution
The variation of air pressure within the vocal tract during sound production.
Flow Distribution
The variation of airflow within the vocal tract during sound production.
Acoustic Theory of Vowel Production
A framework for understanding how vowel sounds are produced acoustically in the human vocal tract.
Vowel Spectra
The unique frequency patterns produced by different vowel sounds as represented in a spectrum.