Sharp vs broad tuning
Sharp tuning
responds to small range of freqs
vibration persists for long time (light damping)
glass, tuning fork
Broad tuning
responds to large range of freqs
vibration dies out quickly (heavy damping)
sound in air, phone earpiece
Landmarks of the vocal tract
Lungs: provide airflow for speech
Larynx (voice box): contains vocal folds → vibrate to produce voiced sounds, remain open for voiceless, epiglottis: flap that prevents food from entering the airway
Pharynx (throat): nasopharynx → connects to nose, oropharynx → connects to mouth, shapes sound resonance
Oral cavity: lips, tongue, teeth, alveolar ridge, hard/soft palate
Nasal cavity: adds resonance when velum lowered, blocked for non-nasal sounds
Source and filter
Production of vowel is the product of
the excitation (source) spectrum generated by the larynx
the frequency response (filter) of the vocal tract configuration.
Any change in vocal tract configuration alters the frequencies at which the cavities resonate
Size and length of the vocal tract also alter the frequencies at which the cavities resonate
Any vowel sound produced is a product of vocal fold vibration (the source) and the resonances of a particular vocal tract shape and length (the filter)
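The source-filter product described above can be sketched numerically. This is only an illustration under stated assumptions: the 100 Hz fundamental, the -12 dB per octave source roll-off (a commonly cited approximation), the rounded resonances of 500, 1500, and 2500 Hz, and the resonance sharpness `q` are all values chosen for the example, not values from these notes.

```python
import math

def source_level_db(freq_hz, f0_hz, slope_db_per_octave=-12.0):
    """Source: level of a harmonic relative to the fundamental (assumed roll-off)."""
    return slope_db_per_octave * math.log2(freq_hz / f0_hz)

def filter_gain_db(freq_hz, resonances_hz, q=5.0):
    """Filter: product of simple resonance curves, one per vocal-tract resonance."""
    gain = 1.0
    for fc in resonances_hz:
        ratio = freq_hz / fc
        gain *= 1.0 / math.sqrt((1.0 - ratio ** 2) ** 2 + (ratio / q) ** 2)
    return 20.0 * math.log10(gain)

def output_level_db(freq_hz, f0_hz, resonances_hz):
    """Multiplying source and filter amplitudes equals adding their dB levels."""
    return source_level_db(freq_hz, f0_hz) + filter_gain_db(freq_hz, resonances_hz)

F0_HZ = 100.0                            # assumed fundamental frequency
RESONANCES_HZ = [500.0, 1500.0, 2500.0]  # assumed, rounded resonances

# The source only supplies energy at harmonics of F0; the filter shapes them.
for harmonic in range(1, 31):
    freq = harmonic * F0_HZ
    level = output_level_db(freq, F0_HZ, RESONANCES_HZ)
    print(f"{freq:6.0f} Hz  {level:7.1f} dB")
```

Changing `RESONANCES_HZ` while keeping `F0_HZ` fixed models a new vocal tract shape with the same laryngeal source; changing `F0_HZ` alone moves the harmonics but leaves the resonance peaks where they are.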
Formants
Concentration of acoustic energy around a particular frequency in the speech wave
F1: vowel height
F2: vowel advancement
F3: overall vocal tract length and lip rounding
Acoustic resonators relative to speech and hearing
Vocal tract
Both air filled and closed at one end
Vocal tract closed end = vocal folds for voiced sounds
Ear canal
Ear canal closed end = eardrum
The larger the resonating cavity (vocal tract), the lower the frequencies to which it will respond; the smaller the resonating cavity (vocal tract), the higher the frequencies to which it will respond
Factors related to resonance of air filled tubes
Air filled tubes resonate at certain frequencies depending on:
(1) whether it is open at one or both ends
(2) its length
(3) its shape
(4) the size of its openings
Length → Shorter tube = higher pitch (whistle). Longer tube = lower pitch (didgeridoo).
Shape → Narrow parts boost high notes; wide parts boost low notes.
Open/Closed Ends → Open at both ends (flute): Normal musical notes. Closed at one end (soda bottle): Deeper, hollow notes.
Speed of Sound → Warmer air = faster sound = slightly higher pitch.
Softness Inside → Padded walls (like your throat) muffle the sound.
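A rough sketch of how length, end condition, and air temperature change the lowest resonance of an idealized uniform tube. The half-wavelength rule for a tube open at both ends, the quarter-wavelength rule for a tube closed at one end, and the linear temperature approximation for the speed of sound are standard textbook relations. The function names and the example lengths are made up for illustration, and shape, opening size, and wall softness are not modeled.

```python
def speed_of_sound_cm_per_s(temp_celsius):
    """Approximate speed of sound in air (common linear approximation)."""
    return (331.4 + 0.6 * temp_celsius) * 100.0

def lowest_resonance_hz(length_cm, closed_at_one_end, temp_celsius=20.0):
    """Lowest resonance of an idealized uniform tube.

    Closed at one end: the tube holds a quarter wavelength (wavelength = 4L).
    Open at both ends: the tube holds a half wavelength (wavelength = 2L).
    """
    c = speed_of_sound_cm_per_s(temp_celsius)
    wavelength_cm = (4.0 if closed_at_one_end else 2.0) * length_cm
    return c / wavelength_cm

# Hypothetical example tubes
short_tube = lowest_resonance_hz(10.0, closed_at_one_end=True)
long_tube = lowest_resonance_hz(100.0, closed_at_one_end=True)
open_tube = lowest_resonance_hz(30.0, closed_at_one_end=False)
closed_tube = lowest_resonance_hz(30.0, closed_at_one_end=True)
cool_air = lowest_resonance_hz(30.0, closed_at_one_end=True, temp_celsius=0.0)
warm_air = lowest_resonance_hz(30.0, closed_at_one_end=True, temp_celsius=35.0)

print(f"short vs long tube: {short_tube:.0f} Hz vs {long_tube:.0f} Hz")
print(f"open vs closed tube: {open_tube:.0f} Hz vs {closed_tube:.0f} Hz")
print(f"cool vs warm air: {cool_air:.0f} Hz vs {warm_air:.0f} Hz")
```

For the same length, the tube closed at one end resonates an octave below the tube open at both ends, which matches the "deeper" soda-bottle note.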
Solving for R1 of male vocal tract
Given: average vocal-tract length (L) of 17 cm
Wavelength = 4L (a tube closed at one end resonates at a quarter wavelength)
Wavelength = 4(17) = 68 cm
Wavelength and frequency are related: f = c/wavelength
c = 34,400 cm/s (speed of sound)
f = 34,400 / 68 ≈ 506 Hz
R1 = 506 Hz for a 17-cm vocal tract
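The calculation above can be checked in a few lines. The function name is hypothetical. The higher resonances rely on the standard result that a uniform tube closed at one end resonates at odd multiples of its lowest resonance, which these notes do not state explicitly.

```python
def closed_open_resonances_hz(length_cm, speed_cm_per_s=34400.0, count=3):
    """Resonances of a uniform tube closed at one end and open at the other."""
    resonances = []
    for n in range(1, count + 1):
        # The tube holds 1/4, 3/4, 5/4, ... of a wavelength.
        wavelength_cm = 4.0 * length_cm / (2 * n - 1)
        resonances.append(speed_cm_per_s / wavelength_cm)
    return resonances

r1, r2, r3 = closed_open_resonances_hz(17.0)
print(round(r1), round(r2), round(r3))  # 506 1518 2529
```

Passing a shorter length (for example a child's vocal tract) raises every resonance by the same ratio, since each one is inversely proportional to L.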
Relationship between pressure and velocity in vocal tract
Closed end (glottis)
Air pressure is at a maximum.
Air particle velocity must approach zero.
Open end (lips)
Air pressure is at a minimum.
Air particle velocity must be at maximum.
Bernoulli effect
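The pressure and velocity pattern at the closed and open ends can be sketched for the lowest (quarter-wavelength) resonance of an idealized uniform tube. The cosine and sine shapes are the standard standing-wave solution for such a tube; the function name and the 17 cm default length are assumptions for illustration.

```python
import math

def standing_wave_at(position_cm, length_cm=17.0):
    """Relative pressure and particle-velocity amplitudes at the lowest resonance.

    position_cm = 0 is the closed end (glottis);
    position_cm = length_cm is the open end (lips).
    """
    phase = (math.pi / 2.0) * (position_cm / length_cm)
    pressure = math.cos(phase)   # maximum at the closed end, zero at the open end
    velocity = math.sin(phase)   # zero at the closed end, maximum at the open end
    return pressure, velocity

for x in (0.0, 4.25, 8.5, 12.75, 17.0):
    p, v = standing_wave_at(x)
    print(f"x = {x:5.2f} cm  pressure = {p:.2f}  velocity = {v:.2f}")
```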
[i] - “see”
High vowel: tongue body is elevated into the oral cavity, leaving pharynx open
Front vowel: high point of the tongue is anterior, behind the alveolar ridge
Genioglossus muscle is active to draw tongue up and forward
Cavity shapes: large pharynx, small oral cavity
F1 (back or pharyngeal cavity resonance) is low
F2 (front or oral cavity resonance) is high
[a] - “spa”
Low vowel: jaw & tongue are lowered
Back vowel: tongue is retracted into pharynx
Anterior belly of digastric muscle is active to lower jaw
Hyoglossus muscle is active to draw tongue down & back
Cavity shapes: small pharyngeal cavity, large oral cavity
F1 (back cavity resonance) is high
F2 (front cavity resonance) is relatively low
[u] - “you”
High vowel: tongue is raised out of pharynx
Back vowel: tongue dorsum is raised and retracted toward velum
Rounded vowel: lips are rounded and protruded
Styloglossus muscle is active to raise and back tongue
Orbicularis oris muscle is active to round lips
Cavity shapes: large pharynx, large oral cavity, overall vocal tract lengthened by lip protrusion
F1 is relatively low
F2 is relatively low
Tense vowels: e.g. [i e o u]:
Involve more extreme articulations
Have longer durations
Can occur in open syllables (e.g., CV)
May be diphthongized (e.g. [eɪ oʊ])
Lax vowels: e.g. [ɪ ɛ ʌ ʊ]
Have less extreme articulatory postures
Are shorter in duration
Occur only in closed syllables (e.g., CVC)
Vowels across speakers
Relative patterns of formant values are consistent across speakers; for example, [i] “eat” has a low F1 and a high F2
Absolute (actual) formant values vary across speakers
Speakers differ in overall vocal-tract length.
Parts of the vocal tract may differ in size: The pharynx is proportionally smaller in women than men.
Speakers of the same language vary in dialect and idiolect (unique to individual).
Vowels in clinical populations
Congenitally deaf speakers often have misarticulated vowel productions
Q: Why would this population have deviant articulation?
A: lack of auditory input from others and inability to self-monitor productions
Impaired vowel production may be evident in apraxia of speech, dysarthria, and cerebral palsy
Foreign dialects may involve errors in vowel production
Visual feedback (e.g., via spectrograms) may help speakers improve vowel production
Major differences between vowel and consonant production
Differences in the source and filter:
Constrictions used to produce consonants are usually more extreme than those for vowels
Various configurations of the vocal tract generate different combinations of resonant frequencies (formants) for each sound
Differences in the ways the sources of sound are used in the production of consonants
Vowels are usually produced with only a periodic sound source; consonants may use an aperiodic source or a combination of both
Sound sources of consonants
Voiced consonants (includes all sonorants - nasals, liquids, glides): periodic laryngeal source
Voiceless consonants: aperiodic sources only, either supraglottal noise or an aperiodic laryngeal source ([h] noise, aspiration)
Obstruents (stops, fricatives, affricates): supraglottal noise sources
Stop bursts: release built-up pressure; transient noise
Frication: air forced through a narrow channel becomes turbulent; sustained noise
Voiced obstruents combine periodic and aperiodic sources
Sonorants (nasals, liquids, glides) similar to vowels (consonant class)
Free airflow; articulation shapes vocal-tract cavities
Characterized mainly by formant frequencies
Have a periodic laryngeal source (all voiced)
Obstruents (stops, fricatives, affricates) (consonant class)
Blocked or restricted airflow
Have aperiodic sound sources in upper vocal tract
May be voiced or voiceless
Obstruents (stops, fricatives, affricates) (characteristics)
Stop bursts: release built-up pressure; transient noise
Frication: air forced through a narrow channel becomes turbulent; sustained noise
Voiced obstruents combine periodic and aperiodic source
Approximants (liquids, glides) (characteristics)
Have limited articulatory constrictions that alter resonant frequencies (similar to vowels)
Classification as consonants based on syllable position
Consonants occur on periphery
Vowels form the nucleus
Assimilation
A sound becomes like its neighbor; one articulator is involved – a shortcut for the articulator:
Partial assimilation: no change in phonemic category – allophonic changes
Example: Dentalization of /t/ before /ð/ in “eat the cake”
[t̪] is not a new phoneme
Complete assimilation: Phonemic class changes:
Example: Velarization of /n/ before /k/ in “ten cards” and “bank”
“anger”: /n/ becomes /ŋ/ (before /g/)
[ŋ] contrasts phonemically with [n]
Coarticulation
Two articulators active at the same time for two sounds
Example: lip rounding and tongue tip raising during the production of [t] in “too” or [s] in “stoop”
In the tongue (tip, blade, dorsum), coarticulation may affect only part of the structure (e.g., the tongue body may move toward the next vowel during a [t] closure).
Velum and nasal/oral speech sounds
Most speech sounds are oral (non-nasal):
Soft palate elevated against posterior pharyngeal wall
Velopharyngeal (VP) port closed
Levator palatini muscle active
Degree of VP closure varies with phonetic context
Tighter - for oral obstruents (require airtight seal) / “p t k”
Moderate - for high vowels
Looser - for low vowels
Nasals require open VP port (lowered velum):
Levator palatini muscle is relaxed
Palatoglossus muscle may actively lower velum
Nasal cavities form a resonant chamber
In nasal stops, the oral cavity is blocked at the same places of articulation as for the stops:
At the lips [m]
At the alveolar ridge [n]
At the soft palate [ŋ]
Suprasegmentals
Suprasegmental (or prosodic) features span units larger than a phoneme
Stress: applies to the syllable
Intonation: applies to phrases & sentences
Duration: varies over many units in speech
Juncture: the way adjacent sounds are joined to or separated from each other.
Perception vs. Hearing
Hearing is the physiological response to sound waves, whereas perception is the ability to interpret the sounds in a linguistically meaningful way
Outer ear component
pinna/auricle (external cartilaginous flap)
external auditory meatus (EAM) - canal to tympanic membrane
Outer ear function
pinna
funnels sound into EAM
protects entrance to EAM
assists in sound localization
EAM
protects middle & inner ear
cerumen & cilia filter foreign objects
air-filled cavity about 2.5 cm long, open at one end
Middle ear components
tympanic membrane: border outer & middle ear
ossicles: malleus, incus, stapes
muscles: tensor tympani, stapedius
oval window: entry to inner ear
eustachian tube: path to nasopharynx
Middle ear function
overcome impedance mismatch
possibly attenuate loud sounds via acoustic reflex - middle ear muscles
equalizes internal & external air pressure variations via eustachian tube
Inner ear components
Vestibular system: sense of motion and position
semicircular canals
vestibule
Cochlea: sense of hearing
Basilar membrane: membrane that runs the length of the cochlea and holds the Organ of Corti
Organ of Corti: situated on the basilar membrane; auditory receptor; contains hair cells
Tectorial membrane: connective tissue that covers the cilia
Inner ear function
hearing & balance
cochlea converts vibrations to neural signals via hair cells on basilar membrane (freq coding: place & timing theories)
Top down process
Listener hears some of the message, makes a rough analysis, and synthesizes it into something meaningful, while simultaneously analyzing its phonetic, phonemic, morphemic, and syntactic components
Bottom up process
Listener takes auditory information then makes phonetic, then phonemic, then morphemic, then finally syntactic interpretations to derive meaning
Vowels acoustic cues
Formant Frequencies (F1, F2, F3):
F1: Inverse correlate of vowel height (↑F1 = lower vowel, e.g., /æ/).
F2: Correlate of frontness/backness (↑F2 = fronter vowel, e.g., /i/).
F3: Lowered in rhotic vowels (e.g., /ɝ/ in "bird").
Duration: Tense vowels (/i, u, e, o/) are longer than lax vowels (/ɪ, ʊ, ɛ, ʌ/).
Spectral Tilt: Steeper tilt (less high-frequency energy) in back vowels.
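The F1 and F2 relationships above can be turned into a toy labeling rule. The cutoff values (500 Hz for F1, 1500 Hz for F2) and the function name are hypothetical and chosen only for illustration. Absolute formant values vary across speakers, so fixed cutoffs like these would not work as a general classifier.

```python
def describe_vowel(f1_hz, f2_hz, f1_cutoff_hz=500.0, f2_cutoff_hz=1500.0):
    """Label a vowel from its first two formants using assumed cutoffs.

    Higher F1 means a lower vowel; higher F2 means a fronter vowel.
    """
    height = "low" if f1_hz > f1_cutoff_hz else "high"
    advancement = "front" if f2_hz > f2_cutoff_hz else "back"
    return f"{height} {advancement}"

# Approximate [i]-like and [u]-like values, plus a made-up low back vowel
print(describe_vowel(300.0, 2300.0))  # high front
print(describe_vowel(300.0, 800.0))   # high back
print(describe_vowel(750.0, 1200.0))  # low back
```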
Semivowels acoustic cues
Formant Structure: Resemble vowels but shorter (~50–100 ms).
/w/: Low F1 (~300 Hz), low F2 (~800 Hz) (like /u/).
/j/: Low F1 (~300 Hz), high F2 (~2300 Hz) (like /i/).
Smooth Transitions: No turbulence (unlike fricatives).
Diphthongs acoustic cues
Formant Movement: Dynamic shift in F1/F2 (e.g., /aɪ/ in "ride": F1 starts high and falls while F2 rises).
Duration: Longer than monophthongs.
Rate of Change: Faster transitions distinguish diphthongs from vowel sequences.
Nasals acoustic cues
Nasal Murmur: Extra formants at ~250 Hz and ~2500 Hz.
Antiformants: Spectral dips (due to nasal cavity damping).
Formant Transitions: Similar to stops but weaker (e.g., /m/ has bilabial-like F2 transitions).
Low-Energy: Weak high-frequency energy (above 3000 Hz)
Stops acoustic cues
Voicing:
VOT: Voiced = short/negative VOT; voiceless = long VOT.
F0 & F1 Onset: Higher F0 and higher F1 onset after voiceless stops; lower F0 and lower F1 onset after voiced stops.
Place:
Burst Spectrum: Bilabials (low), alveolars (mid-high), velars (compact mid).
F2/F3 Transitions: Velars show "pinch" (F2/F3 convergence).
Manner:
Silent Gap: Closure period (50–100 ms).
Abrupt Formant Onset: After release.
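The VOT cue for voicing can be sketched as a simple threshold rule. The 25 ms boundary, the function name, and the example VOT values are assumptions for illustration only. The actual perceptual boundary depends on language, place of articulation, and speaking rate, and listeners also use the F0 and F1 onset cues.

```python
def classify_stop_voicing(vot_ms, boundary_ms=25.0):
    """Classify a stop from voice onset time (VOT) using an assumed boundary.

    Negative VOT: voicing starts before the release.
    Short positive VOT: voicing starts shortly after the release.
    Long VOT: voicing starts well after the release (aspiration interval).
    """
    return "voiced" if vot_ms < boundary_ms else "voiceless"

# Hypothetical VOT measurements in milliseconds
for vot in (-80.0, 10.0, 60.0):
    print(f"VOT {vot:6.1f} ms -> {classify_stop_voicing(vot)}")
```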
Fricatives acoustic cues
Voicing:
Voiced fricatives have weaker noise + voicing bar.
Place:
Spectral Peak: /s/ = high (~4–8 kHz); /ʃ/ = lower (~2–6 kHz); /f/ = diffuse.
Manner:
Noise Duration: ~100–200 ms.
Turbulent Spectrum: Aperiodic energy.
Affricates acoustic cues
Combination of Stop + Fricative:
Stop Phase: Silent gap + burst.
Fricative Phase: Noise with postalveolar spectrum (like /ʃ/).
Voicing:
/dʒ/ has voicing during closure; /tʃ/ is voiceless.
Factors that impact vowel formants as discussed in chapter 10
Affected by connected speech
Continuous movement of the articulators causes changes in vocal tract shape which affects resonant peaks
Affected by phonemic context and rate of articulation
E.g., juncture, duration, speaking rate, stress
Increased speaking rate often produces a neutralized vowel (schwa)
Differing vocal tract sizes produce variation in resonating cavities
E.g., men, women, children, age
Acoustic cues of supra-segmentals
Intonation:
Changes in fundamental frequency
Pitch changes over the course of an utterance
Stress:
Cued by perceived pitch (most effective cue)
Cued by syllable duration (less effective cue)
Cued by loudness (least effective cue)
Juncture:
the way adjacent sounds are joined to or separated from each other
Cued by a variety of acoustic features: silence, vowel and/or consonant length, presence/absence of voicing & aspiration
Clinical implications – why it is important for the SLP to know the acoustic cues
Precisely diagnose speech-hearing disorders.
Tailor evidence-based interventions (e.g., biofeedback, minimal pairs).
Enhance outcomes for clients with hearing loss, motor speech disorders, and accent differences.
Electromyography (EMG)
Measures electrical activity of neural signals to muscles
Electrodes: hooked-wire (inserted directly into the muscle) or surface
Spirometer
measures airflow during nonspeech tasks
apparatus for measuring the volume of air inspired and expired by the lungs
Pneumotachograph or Rothenberg Mask
Measures airflow during speech; flow is usually collected via a face mask
Plethysmograph
Provides a measure of respiratory volume changes during speech
in sealed environment
Pneumography
Records thoracic and abdominal movement associated with speech breathing using body coils
Laryngoscope
a mirror in the oropharynx; gives view of VF during phonation
Stroboscope
Strobe light tuned to the speaker's f0 creates a “slow-motion” view of the VFs during phonation
Fiberoptic endoscope (fiberscope)
Light source and camera are introduced through nose into laryngopharynx
Vocal-fold abduction & adduction can be viewed
Also can be used to monitor velar movement
Transillumination (photoglottography, PGG)
Uses a light source to indicate changes in glottal area during phonation; measures the degree of VF separation
Electroglottography (EGG)
Measures the degree of vocal fold contact during adduction
Paired electrodes on either side of thyroid cartilage pass a small current across the larynx
Ultrasound
Used for viewing articulatory movements
Useful for imaging tongue contours
Palatography
Measures contact between tongue and palate
Requires an artificial palate with embedded transducers (prosthesis)
Magnetic resonance imaging (MRI)
Permits 3D image of entire vocal tract
Person is placed in a magnetic field
Clinical implications for using instrumentation in practice
Physiological recording can provide immediate feedback on articulatory behavior
Some methods may be difficult to use in clinical settings:
Endoscopy (invasive)
Magnetometry, MRI (expensive; requires technical support)
Methods more easily incorporated into clinical use:
Ultrasound
Pneumography
Palatography
Small resonating cavities of the human vocal tract are
air spaces between larynx + trachea, teeth + cheeks, lips
(not the answer choices with nostrils or thyroid)
Rounded vowel articulated with the dorsum of the tongue raised towards the roof of the mouth by contraction of the styloglossus, raising tongue dorsum…
[u] - “soup”
3 factors that affect vowel formants in otherwise healthy individuals
affected by connected speech
Affected by phonemic context
rate of articulation