Speech Production: Key Vocabulary from the Lecture Notes

Velopharyngeal mechanism and vocal tract anatomy

Vocal tract as a system of passageways from sound source (larynx) to the lips and nose
Three main pharyngeal sections: oropharynx, nasopharynx, laryngopharynx; each located at different anatomical levels
Sound streams and cavities
- Oral cavity: primary passageway for most speech sounds; can close off to nasal cavity via velopharyngeal mechanism
- Nasal cavity: passageway opened for nasal sounds; when open, speech stream can enter nasal cavity (nasal resonance)
- Velopharyngeal coupling between oropharynx and nasopharynx controls whether the nasal cavity is accessible during exhaled speech
Velopharyngeal closure
- Typical exhaled speech is produced with velopharyngeal closure, keeping the nasal passage closed
- Closure mechanism prevents nasal leak and shapes acoustic traits of oral sounds
- Velopharyngeal port is the opening between oropharynx and nasopharynx; when opened, nasal sounds or nasal resonance can occur
Nasal sounds and nasal resonance
- Nasal sounds (e.g., /m, n/) involve moments when the speech stream can access the nasal cavity
- Nasal resonance gives these sounds the distinctive acoustic quality perceived as “nasal”
Velum (soft palate) and its anatomy
- Velum is a two-part structure with muscles (soft palate) and a landmark: the uvula
- Attachments: palatine bone via the palatine aponeurosis, a tendinous sheet that anchors the muscles of the velum
- Movement for closure: pulled up and back against the wall of the nasopharynx
- Muscles involved: levator veli palatini; superior pharyngeal constrictor (which constricts nasopharyngeal walls around the velum to close the velopharyngeal passage)
- Palatoglossus lowers the velum, opening the velopharyngeal port and allowing nasal resonance for certain sounds (e.g., /m/)
Velopharyngeal mechanism significance
- Essential role as a sound shaper and for speech intelligibility
- Important exceptions: nasal sounds (/m, n/), which temporarily open the velopharyngeal port to access the nasal cavity
Nasal cavity and its functions
- Biological functions: warms, filters, and humidifies inhaled air; sense of smell
- Structural components: middle nasal concha (part of the ethmoid bone), nasal septum (cartilage, vomer, ethmoid), other conchae and spongy bone
- Speech-related function: when the nasal cavity is integrated with the oral cavity, nasal resonance shapes the overall quality of speech; nasal cavity contributes to sound characteristics beyond mere biology
Hard palate
- A bony plate separating the oral and nasal cavities
- Critical for good speech; integrity of the hard palate supports proper velopharyngeal closure and prevents nasal leakage of exhaled air
- Landmarks and attachments: maxilla and palatine bones; palatine process; velum attaches near this region
- Hard palate integrity is important both for speech quality and as a structural boundary between oral and nasal spaces
Oral cavity
- Major role in shaping acoustic events (perceptual events) and in biological functions
- Lip function: lip closure affects articulation, facial expressions
- Tongue function: rapid changes in shape drive vowel and consonant production; supports oral constrictions and shaping of the oral cavity
- Overall role: acts as a primary shaping resonator for most speech sounds and contributes to facial expression and other biological tasks (chewing, swallowing)

Swallowing (deglutition) and related oropharyngeal actions

Oral transport phase
- Bolus moves from the oral cavity toward the pharynx
- Tongue pushes the bolus into the pharynx
- Lip compression and mandible movement close the oral opening to propel bolus posteriorly
Pharyngeal transport phase
- Velopharyngeal port closes; larynx moves upward and forward; epiglottis moves to protect the airway; glottis closes to seal the airway
- Pharynx dilates to guide the bolus downward toward the esophagus
- Stylopharyngeus muscle contributes to pharyngeal dilation and movement
Esophageal transport phase
- Bolus moves through the esophagus to the stomach, initiating digestion
Integrated action
- Swallowing is a complex, coordinated interplay among many oropharyngeal and articulatory muscles
- It involves both voluntary and reflexive control and is studied to understand normal and atypical swallowing patterns

Source-Filter theory: the vocal tract as a dynamic filter

Core idea
- The vocal tract acts as a filter (resonator) that shapes the spectral content of a glottal or sound source
- The source provides energy or a basic spectrum; the vocal tract selectively emphasizes certain frequencies (formants) and de-emphasizes others
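A minimal numerical sketch of the core idea, assuming a harmonic source whose amplitude falls off with harmonic number and a filter built from three simple resonances. The function names, the 1/k^2 roll-off, the formant values (500, 1500, 2500 Hz), and the 80 Hz bandwidth are illustrative assumptions, not values from the lecture.

```python
import math

def resonance_gain(f, center, bandwidth):
    """Gain of one idealized second-order resonance at frequency f (Hz)."""
    return center**2 / math.sqrt((center**2 - f**2) ** 2 + (bandwidth * f) ** 2)

def filter_gain(f, formants=(500.0, 1500.0, 2500.0), bandwidth=80.0):
    """Filter: product of the gains of all resonances (assumed formant values)."""
    gain = 1.0
    for center in formants:
        gain *= resonance_gain(f, center, bandwidth)
    return gain

def output_spectrum(f0=100.0, n_harmonics=30):
    """Source harmonics (amplitude 1/k^2, an assumed roll-off) shaped by the filter."""
    spectrum = []
    for k in range(1, n_harmonics + 1):
        f = k * f0
        source_amp = 1.0 / k**2
        spectrum.append((f, source_amp * filter_gain(f)))
    return spectrum

spectrum = dict(output_spectrum())
# Harmonics that fall on a resonance stand out from their neighbors.
print(spectrum[400.0], spectrum[500.0], spectrum[600.0])
```

The source alone decreases smoothly with frequency; the peaks in the output come entirely from the filter, which is the point of the source-filter split.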
The filter as a changing cavity
- The vocal tract is a highly variable, malleable filter that continually changes shape during speech
- It behaves like a flexible, double-barreled tube in which the length and diameter of the pharyngeal, oral, and nasal cavities can be adjusted in real time
Components of the filter
- Pharyngeal cavity length and diameter influence resonance
- Oral cavity shape and its opening alter resonant properties
- Nasal cavity influences resonance when it is coupled to the tract by a lowered velum; coupling it in or out effectively lengthens or shortens the resonating system
Mechanisms for changing resonance
- Lengthening the pharyngeal cavity (e.g., lowering the larynx with the infrahyoid muscles, and related adjustments) emphasizes the lower frequencies of a voiced sound source
- Shortening the pharyngeal and oral cavities (e.g., raising the larynx with the suprahyoid muscles) highlights higher frequencies
- Adjusting pharyngeal walls via constriction (pharyngeal constrictors) changes resonance peaks
- Relaxing the constrictors and engaging the stylopharyngeus can shift resonances, affecting the overall spectral output
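One common idealization (an assumption here, not something stated in the notes) treats a neutral vocal tract as a uniform tube closed at the glottis and open at the lips, with resonances F_n = (2n - 1)c / (4L). The sketch below uses that formula to show why a longer tract lowers the resonances and a shorter one raises them; `tube_formants` and the speed-of-sound value are illustrative choices.

```python
def tube_formants(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of an idealized uniform tube closed at one end."""
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]

neutral = tube_formants(17.5)     # → [500.0, 1500.0, 2500.0]
lengthened = tube_formants(20.0)  # longer tube: every resonance drops
shortened = tube_formants(15.0)   # shorter tube: every resonance rises

print(neutral, lengthened, shortened)
```

A real vocal tract is not uniform, so these numbers only indicate the direction of the change, not the formants of any particular vowel.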
Formants and vowel identity
- Formants are resonance peaks in the speech spectrum produced by the vocal tract; each vowel has a distinct formant pattern (e.g., F1, F2, F3, etc.)
- Different shapes of the vocal tract highlight different formants, producing different vowel sounds
- Nasal coupling modifies the formant pattern and introduces nasal formants and anti-formants
Nasals and nasalization
- Adding nasal cavity (via lowered velum) introduces nasal resonance characteristics
- Nasals typically dampen higher formants and introduce a nasal formant pattern; anti-formants appear due to resonance suppression caused by nasal coupling
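The difference between a formant (peak) and an anti-formant (dip) can be sketched with one idealized resonance and one idealized anti-resonance. The formulas, the function names, and the values used (nasal formant near 250 Hz, anti-formant near 1000 Hz, 100 Hz bandwidths) are illustrative assumptions rather than measurements from the lecture.

```python
import math

def peak(f, center, bandwidth):
    """Resonance: gain rises near the center frequency."""
    return center**2 / math.sqrt((center**2 - f**2) ** 2 + (bandwidth * f) ** 2)

def dip(f, center, bandwidth):
    """Anti-resonance: the inverse shape, so gain falls near the center frequency."""
    return math.sqrt((center**2 - f**2) ** 2 + (bandwidth * f) ** 2) / center**2

def nasal_gain(f, nasal_formant=250.0, anti_formant=1000.0, bandwidth=100.0):
    """Assumed nasal-like filter: one low peak combined with one dip."""
    return peak(f, nasal_formant, bandwidth) * dip(f, anti_formant, bandwidth)

for f in (250.0, 800.0, 1000.0, 1200.0):
    print(f, round(nasal_gain(f), 4))
```

Energy is boosted around the low nasal formant and suppressed around the anti-formant, which is the pattern described for nasals above.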
Practical implications
- The same anatomical structures (oral, pharyngeal, nasal cavities) act together to emphasize some formants and dampen others depending on phonemic context
- Variation in vocal tract shape produces different vowels through different formant configurations

Vowels, formants, and nasality in speech

Vowel production and formants
- Vowels are characterized by prominent formant peaks in the spectrum; formant patterns distinguish one vowel from another
- Changes in oral cavity shape alter F1, F2, and higher formants; jaw position, tongue height, tongue backness, lip rounding, and mouth opening influence formant values
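As a rough illustration of how formant patterns separate vowels, the sketch below labels a measured (F1, F2) pair with the nearest of three reference vowels. The reference values are approximate, textbook-style figures for an adult male voice and are included only as assumptions; `REFERENCE_VOWELS` and `nearest_vowel` are hypothetical names.

```python
import math

# Approximate (F1, F2) in Hz; illustrative values, not data from the lecture.
REFERENCE_VOWELS = {
    "i": (270.0, 2290.0),  # high front vowel, as in "heed"
    "a": (730.0, 1090.0),  # low back vowel, as in "father"
    "u": (300.0, 870.0),   # high back rounded vowel, as in "who'd"
}

def nearest_vowel(f1, f2):
    """Return the reference vowel closest to (f1, f2) in the F1-F2 plane."""
    return min(
        REFERENCE_VOWELS,
        key=lambda v: math.hypot(f1 - REFERENCE_VOWELS[v][0],
                                 f2 - REFERENCE_VOWELS[v][1]),
    )

print(nearest_vowel(280.0, 2250.0))  # → i
print(nearest_vowel(700.0, 1100.0))  # → a
```

Low F1 goes with a high tongue position and high F2 with a front tongue position, so the three reference points sit at the corners of the vowel space.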
Nasalization and nasals
- Nasalization occurs when the velum is lowered, allowing nasal resonance to join the oral stream
- Nasal sounds (/m, n, ŋ/) introduce nasal resonance and dampen higher formants; anti-formants appear due to the nasal cavity's interaction with the oral cavity
Articulatory adjustments for vowels
- Vertical jaw/tongue placement, horizontal tongue placement, mouth opening/size, and lip configuration (rounding, spreading, openness) collectively shape formants and vowel quality
Summary for vowels
- Variation in vocal tract shape results in filtered speech output corresponding to different vowel sounds
- Nasal coupling adds nasal quality and anti-formants to the spectrum, shaping overall vowel perception in nasal contexts

Consonant acoustics: source and filter interactions

Broader view: consonants require a source (glottal or other) and a filter (vocal tract constriction)
Fricatives
- Produced by turbulence in the vocal tract; the size/space in front of the constriction influences the spectral hiss
- Front constrictions (e.g., with the tongue tip and teeth) produce characteristic frication sounds; the place and manner of constriction shape the spectrum
Stops (plosives)
- Produced by momentary occlusion of the vocal tract followed by a sudden release
- Notation examples: /p/, /t/, /k/ (voiceless), and /b/, /d/, /g/ (voiced)
- Stop characteristics depend on the glottal configuration (adducted vs. abducted vocal folds) and the place of occlusion (labial, dental, alveolar, velar, etc.)
- Key features include occlusion duration, burst release, and voice onset time (VOT)
Voice Onset Time (VOT)
- Definition: the interval from the release burst of a stop to the onset of voicing for the following vowel
- Measurable trait related to stops and phonation
- Typical values: VOT ranges depend on voicing; voiced stops have shorter VOT and voiceless stops have longer VOT
- For voiceless stops, typical reported ranges are approximately 10–60 ms in many contexts (exact values depend on language and phonetic context)
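The definition above reduces to a subtraction of two time points read off a waveform or spectrogram. A minimal sketch follows; the function names and the 30 ms boundary between short and long lag are illustrative assumptions, since the real boundary depends on language and context.

```python
def vot_ms(burst_time_s, voicing_onset_s):
    """VOT in milliseconds: onset of voicing minus the release burst."""
    return (voicing_onset_s - burst_time_s) * 1000.0

def describe_vot(vot, long_lag_threshold_ms=30.0):
    """Coarse label for a VOT value; the threshold is an assumed example."""
    if vot < 0:
        return "voicing lead (voicing starts before the release)"
    if vot < long_lag_threshold_ms:
        return "short lag"
    return "long lag"

# Burst at 0.120 s, voicing begins at 0.175 s: about 55 ms.
vot = vot_ms(0.120, 0.175)
print(round(vot, 1), describe_vot(vot))
```

A value near 55 ms falls within the range given above for voiceless stops; a negative value means voicing was already present during the closure.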
Stop consonants and nasal coupling
- If the nasal cavity is not coupled (i.e., the velopharyngeal port is closed), the stop is shaped by the oral constriction with limited nasal leakage
- Glottal source can be voiced or voiceless, contributing to whether the stop is accompanied by voicing during closure and/or release
Auxiliary details
- Voiceless stops often show aspiration after the burst in many languages
- The source-filter interaction for stops is influenced by the surrounding vowels, the exact place of occlusion, and the neighboring phonemes

Consonants: nasals, fricatives, and anti-formants in spectrography

Turbulent noise and spectrographic cues
- Fricatives show noise energy across a band of frequencies; voiceless fricatives generally have stronger high-frequency noise than voiced fricatives
Nasals and anti-formants in spectrograms
- Nasals exhibit a low-frequency nasal formant and weakened higher formants; anti-formants appear as dips in energy caused by nasal coupling
Stops in spectrograms
- Occlusion appears as near-white or white spaces (silence) along the time axis during the closure interval
- Burst release shows a short, high-energy event; voicing during the occlusion appears as a dark low-frequency band (voice bar) running along the time axis through the otherwise silent interval
Diphthongs and semi-vowels
- Diphthongs show changing formant trajectories over time in spectrograms
- Semi-vowels (glides) show characteristics between vowels and consonants, with distinctive formant movement

Coarticulation: anticipatory and carryover effects

Definition and scope
- Coarticulation is the mutual influence of neighboring sounds on each other during rapid speech
- It reflects the fact that articulators move slowly relative to the rapid pace of speech, leading to overlapping gestures
Anticipatory coarticulation
- A feature of a sound is prepared in anticipation of the following sound
- Example: lips rounding in anticipation of a rounded vowel that follows
- Upcoming nasals or other sounds can influence current articulation (e.g., a vowel becoming nasalized before a following nasal consonant)
Carryover (perseverative) coarticulation
- A trait of a sound continues to influence the production even after it has been articulated
- Example: lingering lip rounding from a previous vowel continuing into the next segment
Summary
- Coarticulation arises from rapid speech and comparatively slow-moving articulators, and it accounts for the influence of neighboring sounds on each other

Observation and measurement of speech production and acoustics

Two broad categories of measures
- Kinematic assessment: movements and positions of articulators (articulators in motion)
- Acoustic measurements: time- and frequency-domain representations of speech sounds (spectrographs/spectrograms)
Instrumentation and techniques
- Spectrographs/Spectrograms
- Visualize duration (time) on the x-axis, frequency on the y-axis, and intensity by darkness of the trace
- Formants and transitions appear as dark regions; formants are especially visible for vowels
- Diphthongs and semi-vowels show formant movement over time
- Nasal formants and anti-formants manifest as characteristic bands and dips
- Stops show occlusion (white spaces) and bursts; voicing appears as dark bands during occlusion or following the release
- Measurements and interpretation basics
- Voicing produces a dark low-frequency band (voice bar) along the time axis, including during the occlusion of voiced stops
- Nasals appear with nasal formants and dampened higher formants; anti-formants appear as dips
- Fricatives produce energy spread across a spectrum with distinctive spectral slopes
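To make the time/frequency/intensity layout concrete, here is a minimal sketch of how a spectrogram can be computed: the signal is cut into short overlapping frames, each frame is windowed, and the magnitude of its discrete Fourier transform gives one column of the picture. The frame length, hop size, and test tone are illustrative assumptions, and the plain DFT is chosen for readability rather than speed.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per time frame."""
    # Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))  # larger magnitude = darker trace
        frames.append(mags)
    return frames

fs = 8000  # sampling rate in Hz (assumed)
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
spec = spectrogram(tone)

# Frequency of the strongest bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64
print(len(spec), peak_hz)  # → 15 1000.0
```

Each inner list is one vertical slice (frequency axis) and the outer list runs along the time axis; a steady tone therefore shows up as a horizontal band at its frequency.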
Kinematic and imaging modalities (for articulation and swallowing)
- X-ray radiography (cineradiography): visualizes articulators in motion; useful for understanding normal and atypical swallowing; raises concerns about radiation exposure
- Magnetic Resonance Imaging (MRI): no ionizing radiation; static images or sequences; high anatomical detail; newer protocols sample fast enough for movement studies; useful for tongue/pharyngeal wall movement analysis
- Ultrasound imaging: real-time information about tongue movement; safe and accessible; limited to tongue dorsum and surface structures
- Electropalatography (EPG): senses contact between the tongue and an artificial palate custom-fitted to the individual; provides real-time data on tongue-palate contact patterns
- Lateral shadow-cast radiography and other radiographic methods provide a static or quasi-static view of vocal tract configuration at a moment in time
Practical note
- Each technology offers different trade-offs in terms of temporal/spatial resolution, invasiveness, safety, and the type of information (kinematic vs spectral/acoustic)

Applications and implications

The Source-Filter view helps explain how the vocal tract shapes a broad range of speech sounds from a relatively simple or complex glottal source
Understanding velopharyngeal function is important in clinical contexts (e.g., velopharyngeal insufficiency) to ensure proper speech intelligibility and nasal resonance control
The articulation system’s dynamic nature (coarticulation and rapid gestural overlap) underpins fluent, rapid speech and informs approaches to speech therapy, language learning, and speech synthesis
Spectral cues (formants, nasal formants, anti-formants, burst spectra, and frication) are essential for recognizing vowels, nasals, stops, fricatives, and affricates in noisy or real-world settings

Notes on terminology and relationships

Formant: resonance peak in the vocal tract spectrum; each vowel has a characteristic pattern of formant frequencies (F1, F2, F3, …)
Anti-formant: spectral dips caused by resonance interactions, notably with nasal coupling
Nasal resonance: acoustic trait added when nasal cavity is coupled with the oral cavity
VOT (Voice Onset Time): interval between stop release and onset of voicing; a key acoustic measure for stops
Coarticulation: overlapping articulatory gestures among neighboring sounds; anticipatory conveys readiness for upcoming sounds, carryover reflects lingering influence of prior sounds
Velopharyngeal closure: contact between the velum and the pharyngeal walls that prevents the exhaled airstream from entering the nasal cavity during most speech
Nasal cavity as a filter: part of the vocal tract filter that adds nasal energy and dampens certain frequencies, altering the overall spectral output

Ethical and practical implications

Imaging and measurement techniques must balance research/clinical benefits with safety considerations (e.g., radiation exposure in X-rays)
Accurate articulation modeling has implications for speech therapy, language teaching, and assistive technologies (e.g., speech synthesis and recognition)
Understanding normal variability (coarticulation and formant shifts) improves assessment of speech disorders and tailoring of intervention strategies

Key numerical highlights (examples to remember)

Voice Onset Time (VOT): a measure related to stops and phonation; longer VOT is typical for voiceless stops; typical ranges depend on language and context
- A common illustrative range for some contexts: VOT for voiceless stops often exceeds 10 ms and can be up to 60 ms
- The specific values can vary by language and phonetic context
Formants and nasal effects are spectral features
- Nasal coupling introduces nasal formants and anti-formants that dampen certain higher frequencies, altering the vowel and nasal sound spectrum
Stops, fricatives, and affricates each have distinct spectrographic signatures on spectrograms, including occlusion gaps, bursts, and noise bands that differentiate categories