Speech Production: Key Vocabulary from the Lecture Notes
Velopharyngeal mechanism and vocal tract anatomy
- Vocal tract as a system of passageways from sound source (larynx) to the lips and nose
- Three main pharyngeal sections: oropharynx, nasopharynx, laryngopharynx; each located at different anatomical levels
- Sound streams and cavities
- Oral cavity: primary passageway for most speech sounds; can close off to nasal cavity via velopharyngeal mechanism
- Nasal cavity: passageway opened for nasal sounds; when open, speech stream can enter nasal cavity (nasal resonance)
- Velopharyngeal coupling between oropharynx and nasopharynx controls whether the nasal cavity is accessible during exhaled speech
- Velopharyngeal closure
- Typical exhaled speech is produced with velopharyngeal closure, keeping the nasal passage closed
- Closure mechanism prevents nasal leak and shapes acoustic traits of oral sounds
- Velopharyngeal port is the opening between oropharynx and nasopharynx; when opened, nasal sounds or nasal resonance can occur
- Nasal sounds and nasal resonance
- Nasal sounds (e.g., /m, n/) involve moments when the speech stream can access the nasal cavity
- Nasal resonance gives these sounds the distinctive acoustic quality perceived as “nasal”
- Velum (soft palate) and its anatomy
- Velum is a two-part structure with muscles (soft palate) and a landmark: the uvula
- Attachments: palatine bone and palatine aponeurosis; a tendinous sheet surrounds the velum
- Movement for closure: pulled up and back against the wall of the nasopharynx
- Muscles involved: levator veli palatini; superior pharyngeal constrictor (which constricts nasopharyngeal walls around the velum to close the velopharyngeal passage)
- Palatoglossus lowers the velum, opening the port so that nasal resonance can occur for nasal sounds (e.g., /m/)
- Velopharyngeal mechanism significance
- Essential role as a sound shaper and for speech intelligibility
- Important exceptions: nasal sounds (m, n) which temporarily open the velopharyngeal port to access the nasal cavity
- Nasal cavity and its functions
- Biological functions: warms, filters, and humidifies inhaled air; sense of smell
- Structural components: middle nasal concha (part of the ethmoid bone), nasal septum (cartilage, vomer, ethmoid), other conchae and spongy bone
- Speech-related function: when coupled with the oral cavity, the nasal cavity adds nasal resonance that shapes the overall quality of speech
- Hard palate
- A bony plate separating the oral and nasal cavities
- Critical for good speech; integrity of the hard palate supports proper velopharyngeal closure and prevents nasal leakage of exhaled air
- Landmarks and attachments: maxilla and palatine bones; palatine process; velum attaches near this region
- Hard palate integrity is important both for speech quality and as a structural boundary between oral and nasal spaces
- Oral cavity
- Major role in shaping acoustic events (perceptual events) and in biological functions
- Lip function: lip closure affects articulation, facial expressions
- Tongue function: rapid changes in shape drive vowel and consonant production; supports oral constrictions and shaping of the oral cavity
- Overall role: acts as a primary shaping resonator for most speech sounds and contributes to facial expression and other biological tasks (chewing, swallowing)
Swallowing (deglutition) and related oropharyngeal actions
- Oral transport phase
- Bolus moves from the oral cavity toward the pharynx
- Tongue pushes the bolus into the pharynx
- Lip compression and mandible movement close the oral opening to propel bolus posteriorly
- Pharyngeal transport phase
- Velopharyngeal port closes; larynx moves upward and forward; epiglottis moves to protect the airway; glottis closes to seal the airway
- Pharynx dilates to guide the bolus downward toward the esophagus
- Stylopharyngeus muscle contributes to pharyngeal dilation and movement
- Esophageal transport phase
- Bolus moves through the esophagus to the stomach, initiating digestion
- Integrated action
- Swallowing is a complex, coordinated interplay among many oropharyngeal and articulatory muscles
- It involves both voluntary and reflexive control and is studied to understand normal and atypical swallowing patterns
Source-Filter theory: the vocal tract as a dynamic filter
- Core idea
- The vocal tract acts as a filter (resonator) that shapes the spectral content of a glottal or sound source
- The source provides energy or a basic spectrum; the vocal tract selectively emphasizes certain frequencies (formants) and de-emphasizes others
- The filter as a changing cavity
- The vocal tract is a highly variable, malleable filter that continually changes shape during speech
- It behaves like a flexible, double-barreled tube in which the length and diameter of the pharyngeal, oral, and nasal cavities can be adjusted in real time
- Components of the filter
- Pharyngeal cavity length and diameter influence resonance
- Oral cavity shape and its opening alter resonant properties
- Nasal cavity: its own shape is largely fixed, but velum position determines whether and how strongly it is coupled into the tract, which changes resonance
- Mechanisms for changing resonance
- Lengthening the pharyngeal cavity, for example by lowering the larynx with the infrahyoid muscles, lowers the resonances and emphasizes lower frequencies
- Shortening the pharyngeal and oral cavities, for example by raising the larynx with the suprahyoid muscles, raises the resonances and highlights higher frequencies
- Narrowing the pharynx with the pharyngeal constrictors changes the resonance peaks
- Relaxing the constrictors and engaging stylopharyngeus can shift resonances, affecting the overall spectral output
- Formants and vowel identity
- Formants are resonance peaks in the speech spectrum produced by the vocal tract; each vowel has a distinct formant pattern (e.g., F1, F2, F3, etc.)
- Different shapes of the vocal tract highlight different formants, producing different vowel sounds
- Nasal coupling modifies the formant pattern and introduces nasal formants and anti-formants
- Nasals and nasalization
- Adding nasal cavity (via lowered velum) introduces nasal resonance characteristics
- Nasals typically dampen higher formants and introduce a nasal formant pattern; anti-formants appear due to resonance suppression caused by nasal coupling
- Practical implications
- The same anatomical structures (oral, pharyngeal, nasal cavities) act together to emphasize some formants and dampen others depending on phonemic context
- Variation in vocal tract shape produces different vowels through different formant configurations
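The link between tract length and resonance described above can be sketched with the idealized quarter-wavelength model of a uniform tube closed at the glottis and open at the lips. This model, the speed-of-sound constant, the 17.5 cm and 19.5 cm lengths, and the function name `tube_formants` are all illustrative assumptions that do not come from the notes; a real vocal tract is not a uniform tube.

```python
# Idealized model (an assumption, not from the notes): a uniform tube closed
# at one end (glottis) and open at the other (lips) resonates at odd
# multiples of the quarter-wavelength frequency, F_n = (2n - 1) * c / (4 * L).

SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed round value for warm, moist air


def tube_formants(length_cm, count=3):
    """Return the first `count` resonances (Hz) of a uniform tube."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


neutral = tube_formants(17.5)     # assumed adult vocal tract length
lengthened = tube_formants(19.5)  # hypothetical: larynx lowered, tract longer

print("neutral:   ", [round(f) for f in neutral])
print("lengthened:", [round(f) for f in lengthened])
```

Under these assumptions the 17.5 cm tube resonates at 500, 1500, and 2500 Hz, and every resonance of the longer tube is lower, which matches the claim that lengthening the tract emphasizes lower frequencies.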
Vowels, formants, and nasality in speech
- Vowel production and formants
- Vowels are characterized by prominent formant peaks in the spectrum; formant patterns distinguish one vowel from another
- Changes in oral cavity shape alter F1, F2, and higher formants; jaw position, tongue height, tongue backness, lip rounding, and mouth opening influence formant values
- Nasalization and nasals
- Nasalization occurs when the velum is lowered, allowing nasal resonance to join the oral stream
- Nasal sounds (/m, n, ŋ/) introduce nasal resonance and dampen higher formants; anti-formants appear due to the nasal cavity's interaction with the oral cavity
- Articulatory adjustments for vowels
- Vertical jaw/tongue placement, horizontal tongue placement, mouth opening/size, and lip configuration (rounding, spreading, openness) collectively shape formants and vowel quality
- Summary for vowels
- Variation in vocal tract shape results in filtered speech output corresponding to different vowel sounds
- Nasal coupling adds nasal quality and anti-formants to the spectrum, shaping overall vowel perception in nasal contexts
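The idea that formant patterns distinguish one vowel from another can be illustrated with a toy nearest-neighbor lookup on F1 and F2. The reference values below are approximate adult-male averages of the kind quoted in introductory phonetics tables; they are an assumption added for illustration and are not taken from the notes. The helper name `nearest_vowel` is hypothetical, and plain distance in Hz is a deliberate simplification.

```python
import math

# Approximate F1/F2 averages in Hz (illustrative assumptions, not from the
# notes). Low F1 goes with a high tongue; high F2 goes with a front tongue.
REFERENCE_VOWELS = {
    "i": (270, 2290),  # high front
    "ɑ": (730, 1090),  # low back
    "u": (300, 870),   # high back, rounded
}


def nearest_vowel(f1_hz, f2_hz):
    """Return the reference vowel whose (F1, F2) point is closest."""
    return min(
        REFERENCE_VOWELS,
        key=lambda v: math.dist((f1_hz, f2_hz), REFERENCE_VOWELS[v]),
    )


# Hypothetical measured formants for three tokens.
for f1, f2 in [(280, 2200), (700, 1100), (320, 900)]:
    print(f1, f2, "->", nearest_vowel(f1, f2))
```

Real vowel classification usually normalizes for speaker differences and uses perceptual frequency scales; this sketch only shows that F1 and F2 together separate these three vowels.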
Consonant acoustics: source and filter interactions
- Broader view: consonants require a source (glottal or other) and a filter (vocal tract constriction)
- Fricatives
- Produced by turbulence in the vocal tract; the size/space in front of the constriction influences the spectral hiss
- Front constrictions (e.g., with the tongue tip and teeth) produce characteristic frication sounds; the place and manner of constriction shape the spectrum
- Stops (plosives)
- Produced by momentary occlusion of the vocal tract followed by a sudden release
- Notation examples: /p/, /t/, /k/ (voiceless), and /b/, /d/, /g/ (voiced)
- Stop characteristics depend on the glottal configuration (vocal folds adducted vs. abducted) and the place of occlusion (labial, dental, alveolar, velar, etc.)
- Key features include occlusion duration, burst release, and voice onset time (VOT)
- Voice Onset Time (VOT)
- Definition: the interval from the release burst of a stop to the onset of voicing for the following vowel
- Measurable trait related to stops and phonation
- Typical values:
- VOT depends on voicing: voiced stops have shorter VOT; voiceless stops have longer VOT
- For voiceless stops, typical reported ranges are approximately 10–60 ms in many contexts (exact values depend on language and phonetic context)
- Stop consonants and nasal coupling
- When the nasal cavity is not coupled in (i.e., the velopharyngeal port is closed), the stop is shaped by the oral constriction with little nasal leakage
- Glottal source can be voiced or voiceless, contributing to whether the stop is accompanied by voicing during closure and/or release
- Auxiliary details
- Voiceless stops often show aspiration after the burst in many languages
- The source-filter interaction for stops is influenced by the surrounding vowels, the exact place of occlusion, and the neighboring phonemes
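The VOT definition given in this section (interval from the release burst to the onset of voicing) reduces to a subtraction once the two time points have been marked. The sketch below assumes hand-labelled times in seconds; the function names and the example times are hypothetical, and the 10–60 ms bounds are simply the illustrative range quoted in the notes, not a classification rule.

```python
# Illustrative range for voiceless stops quoted in the notes (milliseconds).
VOICELESS_RANGE_MS = (10.0, 60.0)


def vot_ms(burst_time_s, voicing_onset_s):
    """VOT in milliseconds: voicing onset minus burst release.

    A negative result means voicing began before the release.
    """
    return (voicing_onset_s - burst_time_s) * 1000.0


def in_illustrative_voiceless_range(vot):
    """True if `vot` falls inside the notes' illustrative 10-60 ms range."""
    low, high = VOICELESS_RANGE_MS
    return low <= vot <= high


# Hypothetical hand-labelled times read off a waveform or spectrogram.
long_lag = vot_ms(0.100, 0.145)   # roughly 45 ms
short_lag = vot_ms(0.100, 0.105)  # roughly 5 ms

print(round(long_lag, 1), in_illustrative_voiceless_range(long_lag))
print(round(short_lag, 1), in_illustrative_voiceless_range(short_lag))
```

Because actual VOT boundaries depend on language and phonetic context, the range check is only a reading aid for the numbers in the notes.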
Consonants: nasals, fricatives, and anti-formants in spectrography
- Turbulent noise and spectrographic cues
- Fricatives show noise energy across a band of frequencies; voiceless fricatives generally have stronger high-frequency noise than voiced fricatives
- Nasals and anti-formants in spectrograms
- Nasals exhibit a nasal formant and weaker, dampened higher formants, with anti-formants (spectral dips) due to nasal coupling
- Stops in spectrograms
- Occlusion appears as near-white or white spaces (silence) along the time axis during the closure interval
- Burst release shows a short, high-energy event; voicing during the occlusion appears as a dark low-frequency band (voice bar) along the bottom of the spectrogram during the otherwise silent interval
- Diphthongs and semi-vowels
- Diphthongs show changing formant trajectories over time in spectrograms
- Semi-vowels (glides) show characteristics between vowels and consonants, with distinctive formant movement
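The closure gap that stops leave on a spectrogram can be reproduced with a very small short-time Fourier analysis. The sketch below is a toy under stated assumptions: a pure 500 Hz tone stands in for a vowel, 60 ms of zeros stands in for the occlusion, and the sample rate, frame length, and function names are arbitrary illustrative choices. It uses a plain DFT on non-overlapping frames rather than any particular analysis package.

```python
import math

SAMPLE_RATE = 4000  # Hz (assumed)
FRAME = 80          # samples per frame: 20 ms, giving 50 Hz bin spacing
TONE_HZ = 500.0     # stand-in for vowel energy


def tone(n_samples):
    """Synthesize n_samples of a sine wave at TONE_HZ."""
    return [
        math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def dft_magnitudes(frame):
    """Magnitude of each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags


def spectrogram(signal):
    """One magnitude spectrum per non-overlapping frame (time runs along the list)."""
    return [
        dft_magnitudes(signal[start:start + FRAME])
        for start in range(0, len(signal) - FRAME + 1, FRAME)
    ]


# Tone (100 ms), silent closure (60 ms), tone again (100 ms).
signal = tone(400) + [0.0] * 240 + tone(400)
spec = spectrogram(signal)

# Large values correspond to dark regions on a printed spectrogram.
frame_energy = [sum(m * m for m in frame) for frame in spec]
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME

print("frames:", len(spec), "peak in first frame (Hz):", peak_hz)
print("energy per frame:", [round(e) for e in frame_energy])
```

The frames that fall inside the silent stretch carry essentially no energy, which is the white gap along the time axis, while the tone frames concentrate energy in one frequency band. Real speech analysis would add windowing and overlapping frames.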
Coarticulation: anticipatory and carryover effects
- Definition and scope
- Coarticulation is the mutual influence of neighboring sounds on each other during rapid speech
- It reflects the fact that articulators move slowly relative to the rapid pace of speech, leading to overlapping gestures
- Anticipatory coarticulation
- A feature of a sound is prepared in anticipation of the following sound
- Example: lips rounding in anticipation of a rounded vowel that follows
- An upcoming nasal consonant can likewise influence current articulation (e.g., a vowel becoming nasalized before a following nasal)
- Carryover (perseverative) coarticulation
- A trait of a sound continues to influence the production even after it has been articulated
- Example: lingering lip rounding from a previous vowel continuing into the next segment
- Summary
- Coarticulation arises because speech is rapid and articulators move relatively slowly, so neighboring sounds influence each other
Observation and measurement of speech production and acoustics
- Two broad categories of measures
- Kinematic assessment: movements and positions of articulators (articulators in motion)
- Acoustic measurements: time- and frequency-domain representations of speech sounds (spectrographs/spectrograms)
- Instrumentation and techniques
- Spectrographs/Spectrograms
- Visualize duration (time) on the x-axis, frequency on the y-axis, and intensity by darkness of the trace
- Formants and transitions appear as dark regions; formants are especially visible for vowels
- Diphthongs and semi-vowels show formant movement over time
- Nasal formants and anti-formants manifest as characteristic bands and dips
- Stops show occlusion (white spaces) and bursts; voicing appears as dark bands during occlusion or following the release
- Measurements and interpretation basics
- Voicing produces a dark low-frequency band (voice bar) near the bottom of the frequency axis, including during the occlusion of voiced stops
- Nasals appear with nasal formants and dampened higher formants; anti-formants appear as dips
- Fricatives produce energy spread across a spectrum with distinctive spectral slopes
- Kinematic and imaging modalities (for articulation and swallowing)
- X-ray imaging (cineradiography / radiography): visualizes articulators in motion; useful for understanding normal and atypical swallowing; raises concerns about radiation exposure
- Magnetic Resonance Imaging (MRI): no ionizing radiation; static images or sequences; high anatomical detail; newer protocols sample fast enough for movement studies; useful for tongue/pharyngeal wall movement analysis
- Ultrasound imaging: real-time information about tongue movement; safe and accessible; limited to tongue dorsum and surface structures
- Electropalatography (EPG): senses contact between the tongue and an artificial palate custom-fitted to the individual; provides real-time data on tongue-palate contact patterns
- Lateral shadow-cast radiography and other radiographic methods provide a static or quasi-static view of vocal tract configuration at a moment in time
- Practical note
- Each technology offers different trade-offs in terms of temporal/spatial resolution, invasiveness, safety, and the type of information (kinematic vs spectral/acoustic)
Applications and implications
- The Source-Filter view helps explain how the vocal tract shapes a broad range of speech sounds from a glottal source, whether that source is simple or complex
- Understanding velopharyngeal function is important in clinical contexts (e.g., velopharyngeal insufficiency) to ensure proper speech intelligibility and nasal resonance control
- The articulation system’s dynamic nature (coarticulation and rapid gestural overlap) underpins fluent, rapid speech and informs approaches to speech therapy, language learning, and speech synthesis
- Spectral cues (formants, nasal formants, anti-formants, burst spectra, and frication) are essential for recognizing vowels, nasals, stops, fricatives, and affricates in noisy or real-world settings
Notes on terminology and relationships
- Formant: resonance peak in the vocal tract spectrum; each vowel has a characteristic pattern of formant frequencies (F1, F2, F3, …)
- Anti-formant: spectral dips caused by resonance interactions, notably with nasal coupling
- Nasal resonance: acoustic trait added when nasal cavity is coupled with the oral cavity
- VOT (Voice Onset Time): interval between stop release and onset of voicing; a key acoustic measure for stops
- Coarticulation: overlapping articulatory gestures among neighboring sounds; anticipatory conveys readiness for upcoming sounds, carryover reflects lingering influence of prior sounds
- Velopharyngeal closure: seal between the velum and the pharyngeal walls that prevents the exhaled airstream from entering the nasal cavity during most speech
- Nasal cavity as a filter: part of the vocal tract filter that adds nasal energy and dampens certain frequencies, altering the overall spectral output
Ethical and practical implications
- Imaging and measurement techniques must balance research/clinical benefits with safety considerations (e.g., radiation exposure in X-rays)
- Accurate articulation modeling has implications for speech therapy, language teaching, and assistive technologies (e.g., speech synthesis and recognition)
- Understanding normal variability (coarticulation and formant shifts) improves assessment of speech disorders and tailoring of intervention strategies
Key numerical highlights (examples to remember)
- Voice Onset Time (VOT): a measure related to stops and phonation; longer VOT is typical for voiceless stops
- VOT for voiceless stops often exceeds 10 ms and can be up to 60 ms (an illustrative range for some contexts)
- The specific values vary by language and phonetic context
- Formants and nasal effects are spectral features
- Nasal coupling introduces nasal formants and anti-formants that dampen certain higher frequencies, altering the vowel and nasal sound spectrum
- Stops, fricatives, and affricates each have distinct spectrographic signatures on spectrograms, including occlusion gaps, bursts, and noise bands that differentiate categories