Speech Production: Key Vocabulary from the Lecture Notes

Velopharyngeal mechanism and vocal tract anatomy

  • Vocal tract as a system of passageways from sound source (larynx) to the lips and nose
  • Three main pharyngeal sections: oropharynx, nasopharynx, laryngopharynx; each located at different anatomical levels
  • Sound streams and cavities
    • Oral cavity: primary passageway for most speech sounds; can close off to nasal cavity via velopharyngeal mechanism
    • Nasal cavity: passageway opened for nasal sounds; when open, speech stream can enter nasal cavity (nasal resonance)
    • Velopharyngeal coupling between oropharynx and nasopharynx controls whether the nasal cavity is accessible during exhaled speech
  • Velopharyngeal closure
    • Typical exhaled speech is produced with velopharyngeal closure, keeping the nasal passage closed
    • Closure mechanism prevents nasal leak and shapes acoustic traits of oral sounds
    • Velopharyngeal port is the opening between oropharynx and nasopharynx; when opened, nasal sounds or nasal resonance can occur
  • Nasal sounds and nasal resonance
    • Nasal sounds (e.g., /m, n/) involve moments when the speech stream can access the nasal cavity
    • Nasal resonance yields the distinctive acoustic traits that characterize “nasal” sounds
  • Velum (soft palate) and its anatomy
    • Velum is a two-part structure with muscles (soft palate) and a landmark: the uvula
    • Attachments: palatine bone via the palatine aponeurosis, a tendinous sheet that anchors the velum
    • Movement for closure: pulled up and back against the wall of the nasopharynx
    • Muscles involved: levator veli palatini; superior pharyngeal constrictor (which constricts nasopharyngeal walls around the velum to close the velopharyngeal passage)
    • Palatoglossus lowers the velum, opening the port so that nasal resonance is available for certain sounds (e.g., /m/)
  • Velopharyngeal mechanism significance
    • Essential role as a sound shaper and for speech intelligibility
    • Important exceptions: nasal sounds (/m, n/), which temporarily open the velopharyngeal port to access the nasal cavity
  • Nasal cavity and its functions
    • Biological functions: warms, filters, and humidifies inhaled air; sense of smell
    • Structural components: middle nasal concha (part of the ethmoid bone), nasal septum (cartilage, vomer, ethmoid), other conchae and spongy bone
    • Speech-related function: when the nasal cavity is integrated with the oral cavity, nasal resonance shapes the overall quality of speech; nasal cavity contributes to sound characteristics beyond mere biology
  • Hard palate
    • A bony plate separating the oral and nasal cavities
    • Critical for good speech; integrity of the hard palate supports proper velopharyngeal closure and prevents nasal leakage of exhaled air
    • Landmarks and attachments: maxilla and palatine bones; palatine process; velum attaches near this region
    • Hard palate integrity is important both for speech quality and as a structural boundary between oral and nasal spaces
  • Oral cavity
    • Major role in shaping acoustic events (perceptual events) and in biological functions
    • Lip function: lip closure affects articulation and facial expression
    • Tongue function: rapid changes in shape drive vowel and consonant production; supports oral constrictions and shaping of the oral cavity
    • Overall role: acts as a primary shaping resonator for most speech sounds and contributes to facial expression and other biological tasks (chewing, swallowing)

Swallowing (deglutition) and related oropharyngeal actions

  • Oral transport phase
    • Bolus moves from the oral cavity toward the pharynx
    • Tongue pushes the bolus into the pharynx
    • Lip compression and mandible movement close the oral opening to propel bolus posteriorly
  • Pharyngeal transport phase
    • Velopharyngeal port closes; larynx moves upward and forward; epiglottis moves to protect the airway; glottis closes to seal the airway
    • Pharynx dilates to guide the bolus downward toward the esophagus
    • Stylopharyngeus muscle contributes to pharyngeal dilation and movement
  • Esophageal transport phase
    • Bolus moves through the esophagus to the stomach, initiating digestion
  • Integrated action
    • Swallowing is a complex, coordinated interplay among many oropharyngeal and articulatory muscles
    • It involves both voluntary and reflexive control and is studied to understand normal and atypical swallowing patterns

Source-Filter theory: the vocal tract as a dynamic filter

  • Core idea
    • The vocal tract acts as a filter (resonator) that shapes the spectral content of a glottal or sound source
    • The source provides energy or a basic spectrum; the vocal tract selectively emphasizes certain frequencies (formants) and de-emphasizes others
  • The filter as a changing cavity
    • The vocal tract is a highly variable, malleable filter that continually changes shape during speech
    • It behaves like a flexible, double-barreled tube whose length and diameter of pharyngeal, oral, and nasal cavities can be adjusted in real time
  • Components of the filter
    • Pharyngeal cavity length and diameter influence resonance
    • Oral cavity shape and its opening alter resonant properties
    • Nasal cavity can lengthen/shorten and influence resonance when coupled with velum position
  • Mechanisms for changing resonance
    • Lengthening the pharyngeal cavity by lowering the larynx (infrahyoid muscles), along with related adjustments, emphasizes the lower frequencies of a voiced sound source
    • Shortening the pharyngeal and oral cavities by raising the larynx (suprahyoid muscles) emphasizes higher frequencies
    • Adjusting pharyngeal walls via constriction (pharyngeal constrictors) changes resonance peaks
    • Relaxing the constrictors and engaging the stylopharyngeus can shift resonances, affecting the overall spectral output
  • Formants and vowel identity
    • Formants are resonance peaks in the speech spectrum produced by the vocal tract; each vowel has a distinct formant pattern (e.g., F1, F2, F3, etc.)
    • Different shapes of the vocal tract highlight different formants, producing different vowel sounds
    • Nasal coupling modifies the formant pattern and introduces nasal formants and anti-formants
  • Nasals and nasalization
    • Adding nasal cavity (via lowered velum) introduces nasal resonance characteristics
    • Nasals typically dampen higher formants and introduce a nasal formant pattern; anti-formants appear due to resonance suppression caused by nasal coupling
  • Practical implications
    • The same anatomical structures (oral, pharyngeal, nasal cavities) act together to emphasize some formants and dampen others depending on phonemic context
    • Variation in vocal tract shape produces different vowels through different formant configurations
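
The link between tract length and resonance described above can be sketched with a very simple model. The example below treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of a quarter wavelength. The uniform-tube idealization, the speed-of-sound value, and the name `tube_formants` are assumptions made for illustration; they are not taken from the lecture.

```python
# Sketch: resonances of a uniform tube closed at one end (glottis) and open
# at the other (lips). This idealization is an assumption for illustration.

SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate value for warm, moist air (assumed)


def tube_formants(length_cm, count=3):
    """Return the first `count` resonances (Hz) of a quarter-wavelength tube.

    F_n = (2n - 1) * c / (4 * L), for n = 1, 2, 3, ...
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    # A longer tube (e.g., lowered larynx) shifts every resonance downward.
    for length_cm in (14.5, 17.5, 20.0):
        formants = ", ".join(f"{f:.0f} Hz" for f in tube_formants(length_cm))
        print(f"{length_cm} cm tract: {formants}")
```

With the assumed values, a 17.5 cm tube gives resonances at 500, 1500, and 2500 Hz, and lengthening the tube lowers all of them. A real vocal tract is not uniform, so actual vowel formants move away from these evenly spaced values as the tongue, jaw, and lips change the tube's shape.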

Vowels, formants, and nasality in speech

  • Vowel production and formants
    • Vowels are characterized by prominent formant peaks in the spectrum; formant patterns distinguish one vowel from another
    • Changes in oral cavity shape alter F1, F2, and higher formants; jaw position, tongue height, tongue backness, lip rounding, and mouth opening influence formant values
  • Nasalization and nasals
    • Nasalization occurs when the velum is lowered, allowing nasal resonance to join the oral stream
    • Nasal sounds (/m, n, ŋ/) introduce nasal resonance and dampen higher formants; anti-formants appear due to the nasal cavity's interaction with the oral cavity
  • Articulatory adjustments for vowels
    • Vertical jaw/tongue placement, horizontal tongue placement, mouth opening/size, and lip configuration (rounding, spreading, openness) collectively shape formants and vowel quality
  • Summary for vowels
    • Variation in vocal tract shape results in filtered speech output corresponding to different vowel sounds
    • Nasal coupling adds nasal quality and anti-formants to the spectrum, shaping overall vowel perception in nasal contexts
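
The idea that formant patterns distinguish one vowel from another can be illustrated with a nearest-match lookup on F1 and F2. The reference values below are rough, textbook-style averages for adult male speakers and are included only as illustrative assumptions; the table `REFERENCE_VOWELS` and the function `nearest_vowel` are hypothetical names, and plain distance in Hz is a simplification of how listeners weigh formants.

```python
import math

# Rough F1/F2 averages in Hz (assumed, illustrative values only).
REFERENCE_VOWELS = {
    "i": (270.0, 2290.0),  # high front vowel, as in "beet"
    "a": (730.0, 1090.0),  # low back vowel /ɑ/, as in "father"
    "u": (300.0, 870.0),   # high back rounded vowel, as in "boot"
}


def nearest_vowel(f1_hz, f2_hz):
    """Return the reference vowel whose (F1, F2) point is closest in Hz."""
    return min(
        REFERENCE_VOWELS,
        key=lambda vowel: math.dist((f1_hz, f2_hz), REFERENCE_VOWELS[vowel]),
    )


if __name__ == "__main__":
    # Low F1 with high F2 suggests a high front tongue position.
    print(nearest_vowel(280.0, 2200.0))
    # High F1 suggests a low tongue and open jaw.
    print(nearest_vowel(700.0, 1100.0))
```

The sketch mirrors the articulatory summary above: F1 tracks the vertical (jaw and tongue height) dimension and F2 tracks the horizontal (tongue backness and lip rounding) dimension.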

Consonant acoustics: source and filter interactions

  • Broader view: consonants require a source (glottal or other) and a filter (vocal tract constriction)
  • Fricatives
    • Produced by turbulence in the vocal tract; the size/space in front of the constriction influences the spectral hiss
    • Front constrictions (e.g., with the tongue tip and teeth) produce characteristic frication sounds; the place and manner of constriction shape the spectrum
  • Stops (plosives)
    • Produced by momentary occlusion of the vocal tract followed by a sudden release
    • Notation examples: /p/, /t/, /k/ (voiceless), and /b/, /d/, /g/ (voiced)
    • Stop characteristics depend on the glottal configuration (adducted vs. abducted vocal folds) and the place of occlusion (labial, dental, alveolar, velar, etc.)
    • Key features include occlusion duration, burst release, and voice onset time (VOT)
  • Voice Onset Time (VOT)
    • Definition: the interval from the release burst of a stop to the onset of voicing for the following vowel
    • Measurable trait related to stops and phonation
    • Typical values: VOT depends on voicing; voiced stops have shorter VOT and voiceless stops have longer VOT
    • For voiceless stops, typical reported ranges include approximately 10–60 ms in many contexts (exact values depend on language and phonetic context)
  • Stop consonant interactions and nasal coupling
    • If the nasal cavity is not coupled (i.e., the velopharyngeal port is closed), the stop is shaped by the oral constriction with limited nasal leakage
    • Glottal source can be voiced or voiceless, contributing to whether the stop is accompanied by voicing during closure and/or release
  • Auxiliary details
    • Voiceless stops often show aspiration after the burst in many languages
    • The source-filter interaction for stops is influenced by the surrounding vowels, the exact place of occlusion, and the neighboring phonemes
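
The VOT definition above (release burst to onset of voicing) reduces to a subtraction once the two time points have been marked on a waveform or spectrogram. The sketch below assumes those time points are already known in seconds. The labels "prevoiced", "short lag", and "long lag" and the 30 ms boundary are illustrative assumptions rather than values from the notes, and both function names are hypothetical.

```python
def voice_onset_time_ms(burst_release_s, voicing_onset_s):
    """Return VOT in milliseconds: voicing onset time minus burst release time."""
    return (voicing_onset_s - burst_release_s) * 1000.0


def describe_vot(vot_ms, long_lag_threshold_ms=30.0):
    """Give a rough label for a VOT value.

    The threshold is an assumed, illustrative boundary; real category
    boundaries depend on language and phonetic context.
    """
    if vot_ms < 0:
        return "prevoiced (voicing starts before the release)"
    if vot_ms < long_lag_threshold_ms:
        return "short lag"
    return "long lag"


if __name__ == "__main__":
    # Hypothetical measurements: burst at 100 ms, voicing at 160 ms.
    vot = voice_onset_time_ms(0.100, 0.160)
    print(f"VOT = {vot:.1f} ms -> {describe_vot(vot)}")
```

A negative value means voicing began during the closure, which matches the note that the glottal source can be voiced during closure as well as after release.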

Consonants: nasals, fricatives, and anti-formants in spectrography

  • Turbulent noise and spectrographic cues
    • Fricatives show noise energy across a band of frequencies; voiceless fricatives generally have stronger high-frequency noise than voiced fricatives
  • Nasals and anti-formants in spectrograms
    • Nasals exhibit a nasal formant, weaker (dampened) higher formants, and anti-formants (spectral dips) due to nasal coupling
  • Stops in spectrograms
    • Occlusion appears as a near-white or white gap (silence) along the time axis during the closure interval
    • Burst release shows a short, high-energy event; voicing during the occlusion appears as a dark low-frequency band (voice bar) running along the time axis through the otherwise silent interval
  • Diphthongs and semi-vowels
    • Diphthongs show changing formant trajectories over time in spectrograms
    • Semi-vowels (glides) show characteristics between vowels and consonants, with distinctive formant movement
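
The stop closure described above shows up as a stretch of very low energy, so one simple way to locate it is to look for the longest run of quiet frames. The sketch below assumes the signal has already been reduced to one energy value per frame; the function name `longest_quiet_run` and the threshold used in the demonstration are hypothetical choices for illustration.

```python
def longest_quiet_run(frame_energies, threshold):
    """Return (start, end) frame indices, end exclusive, of the longest run of
    frames whose energy is below `threshold`, or None if no frame is that quiet.
    """
    best = None
    run_start = None
    # A loud sentinel frame at the end closes any run that reaches the last frame.
    for index, energy in enumerate(list(frame_energies) + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = index
        elif run_start is not None:
            if best is None or index - run_start > best[1] - best[0]:
                best = (run_start, index)
            run_start = None
    return best


if __name__ == "__main__":
    # Hypothetical frame energies: vowel, closure gap, burst and vowel, brief dip.
    energies = [0.9, 0.8, 0.02, 0.01, 0.03, 0.7, 0.9, 0.01]
    print(longest_quiet_run(energies, threshold=0.05))
```

A voiced stop would not be fully silent during closure because of the voice bar, so a detector like this would need a band-limited energy measure to handle that case.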

Coarticulation: anticipatory and carryover effects

  • Definition and scope
    • Coarticulation is the mutual influence of neighboring sounds on each other during rapid speech
    • It reflects the fact that articulators move slowly relative to the rapid pace of speech, leading to overlapping gestures
  • Anticipatory coarticulation
    • A feature of a sound is prepared in anticipation of the following sound
    • Example: lips rounding in anticipation of a rounded vowel that follows
    • Upcoming nasals or other sounds can influence the current articulation (e.g., a vowel becoming nasalized before a following nasal consonant)
  • Carryover (perseverative) coarticulation
    • A trait of a sound continues to influence the production even after it has been articulated
    • Example: lingering lip rounding from a previous vowel continuing into the next segment
  • Summary
    • Coarticulation accounts for rapid speech, slow-moving articulators, and the influence of neighboring sounds on each other

Observation and measurement of speech production and acoustics

  • Two broad categories of measures
    • Kinematic assessment: movements and positions of articulators (articulators in motion)
    • Acoustic measurements: time- and frequency-domain representations of speech sounds (spectrographs/spectrograms)
  • Instrumentation and techniques
    • Spectrographs/Spectrograms
      • Visualize duration (time) on the x-axis, frequency on the y-axis, and intensity by darkness of the trace
      • Formants and transitions appear as dark regions; formants are especially visible for vowels
      • Diphthongs and semi-vowels show formant movement over time
      • Nasal formants and anti-formants manifest as characteristic bands and dips
      • Stops show occlusion (white spaces) and bursts; voicing appears as dark bands during occlusion or following the release
    • Measurements and interpretation basics
      • Voiced sounds produce a dark low-frequency band (voice bar) along the time axis, which can continue through the occlusion of voiced stops
      • Nasals appear with nasal formants and dampened higher formants; anti-formants appear as dips
      • Fricatives produce energy spread across a spectrum with distinctive spectral slopes
  • Kinematic and imaging modalities (for articulation and swallowing)
    • X-ray radiography (cineradiography): visualizes articulators in motion; useful for understanding normal and atypical swallowing; concerns about exposure risks
    • Magnetic Resonance Imaging (MRI): no ionizing radiation; static images or sequences; high anatomical detail and movement studies; faster sampling in newer protocols; useful for tongue/pharyngeal wall movement analysis
    • Ultrasound imaging: real-time information about tongue movement; safe and accessible; limited to tongue dorsum and surface structures
    • Electropalatography (EPG): senses contact between the tongue and an artificial palate fitted to the individual; provides real-time data on articulatory contact patterns
    • Lateral shadow-cast radiography and other radiographic methods provide a static or quasi-static view of vocal tract configuration at a moment in time
  • Practical note
    • Each technology offers different trade-offs in terms of temporal/spatial resolution, invasiveness, safety, and the type of information (kinematic vs spectral/acoustic)
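
The spectrogram layout described above (time on the x-axis, frequency on the y-axis, intensity as darkness) corresponds to a grid of short-time spectra. The sketch below builds such a grid with a plain discrete Fourier transform on windowed frames, using only the standard library. The frame length, hop size, Hann window, and all function names are assumptions chosen for illustration; practical tools use faster transforms and different settings.

```python
import cmath
import math


def hann_window(length):
    """Return a Hann window, which tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)]


def dft_magnitudes(frame):
    """Return magnitudes for frequency bins 0 .. n/2 of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of rows, one per time frame, each a list of bin magnitudes."""
    window = hann_window(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        rows.append(dft_magnitudes(frame))
    return rows


def peak_frequency_hz(magnitudes, sample_rate, frame_len):
    """Convert the strongest bin of one frame to a frequency in Hz."""
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / frame_len


if __name__ == "__main__":
    sample_rate = 8000  # Hz, assumed for the demonstration
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
    rows = spectrogram(tone)
    print(len(rows), "frames; peak in first frame at",
          peak_frequency_hz(rows[0], sample_rate, 64), "Hz")
```

Each row is one vertical slice of the spectrogram, and larger magnitudes would be drawn darker. Shorter frames give finer time detail (useful for bursts and VOT), while longer frames give finer frequency detail (useful for formants), which is one instance of the resolution trade-off noted above.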

Applications and implications

  • The Source-Filter view helps explain how the vocal tract shapes a broad range of speech sounds from a relatively simple or complex glottal source
  • Understanding velopharyngeal function is important in clinical contexts (e.g., velopharyngeal insufficiency) to ensure proper speech intelligibility and nasal resonance control
  • The articulation system’s dynamic nature (coarticulation and rapid gestural overlap) underpins fluent, rapid speech and informs approaches to speech therapy, language learning, and speech synthesis
  • Spectral cues (formants, nasal formants, anti-formants, burst spectra, and frication) are essential for recognizing vowels, nasals, stops, fricatives, and affricates in noisy or real-world settings

Notes on terminology and relationships

  • Formant: resonance peak in the vocal tract spectrum; each vowel has a characteristic pattern of formant frequencies (F1, F2, F3, …)
  • Anti-formant: spectral dips caused by resonance interactions, notably with nasal coupling
  • Nasal resonance: acoustic trait added when nasal cavity is coupled with the oral cavity
  • VOT (Voice Onset Time): interval between stop release and onset of voicing; a key acoustic measure for stops
  • Coarticulation: overlapping articulatory gestures among neighboring sounds; anticipatory coarticulation prepares for upcoming sounds, while carryover coarticulation reflects the lingering influence of prior sounds
  • Velopharyngeal closure: closure of the port between the velum and pharyngeal walls that prevents air and sound from entering the nasal cavity during most speech
  • Nasal cavity as a filter: part of the vocal tract filter that adds nasal energy and dampens certain frequencies, altering the overall spectral output

Ethical and practical implications

  • Imaging and measurement techniques must balance research/clinical benefits with safety considerations (e.g., radiation exposure in X-rays)
  • Accurate articulation modeling has implications for speech therapy, language teaching, and assistive technologies (e.g., speech synthesis and recognition)
  • Understanding normal variability (coarticulation and formant shifts) improves assessment of speech disorders and tailoring of intervention strategies

Key numerical highlights (examples to remember)

  • Voice Onset Time (VOT): a measure related to stops and phonation; longer VOT is typical for voiceless stops; typical ranges depend on language and context
    • A common illustrative range: VOT for voiceless stops often exceeds 10 ms and can be up to 60 ms
    • The specific values can vary by language and phonetic context
  • Formants and nasal effects are spectral features
    • Nasal coupling introduces nasal formants and anti-formants that dampen certain higher frequencies, altering the vowel and nasal sound spectrum
  • Stops, fricatives, and affricates each have distinct spectrographic signatures on spectrograms, including occlusion gaps, bursts, and noise bands that differentiate categories