Speech Sound Production, Acoustic Phonetics, and SHM Case Study
Speech Sounds: Production, Transmission, and Perception
Overview
Speech sounds are produced, transmitted through a medium, and perceived by listeners.
Physical properties of speech sounds include frequency, amplitude, duration, wavelength, and waveform.
These properties relate to how we hear and interpret sounds, not just how they are produced.
Physical properties of speech sounds
Frequency
Definition: the number of cycles per second.
Determines pitch of the sound.
Example: Higher frequency = higher pitch; lower frequency = lower pitch.
Amplitude
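Definition: the maximum deviation of the waveform from its resting (equilibrium) value, i.e. the height of a peak measured from the midline.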
Determines loudness.
Duration
Definition: how long a sound lasts.
Affects rhythm and tempo of speech.
Wavelength
Definition: the distance between successive points of identical phase on a wave.
Related to frequency and the speed of sound via the relation $v = f \lambda$ (where $v$ is the speed of sound and $\lambda$ is the wavelength).
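As a quick illustration of $v = f \lambda$, the sketch below rearranges the relation to $\lambda = v / f$. The function name `wavelength_m` and the default speed of 343 m/s (roughly the speed of sound in air at room temperature) are assumptions added for the example, not values from the lecture.

```python
def wavelength_m(frequency_hz: float, speed_m_per_s: float = 343.0) -> float:
    """Return the wavelength in metres, from v = f * lambda."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_per_s / frequency_hz


# Assumed sample frequencies, chosen to span part of the speech range.
for f in (100.0, 1000.0, 4000.0):
    print(f"{f:7.0f} Hz -> {wavelength_m(f):.3f} m")
```

Higher frequencies give shorter wavelengths when the speed is fixed: under this assumed speed, a 1000 Hz tone has a wavelength of about a third of a metre.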
Waveform
Definition: the shape of the pressure variation over time.
Encodes the temporal pattern of a sound.
Articulatory phonetics and the IPA
Articulatory phonetics studies how sounds are produced using the vocal tract.
The International Phonetic Alphabet (IPA) provides symbols for each sound.
Example of a simple plosive (air burst)
Lips come together to block the air, then release to create a burst of air (plosive).
Practical notes
Plosives involve a momentary constriction followed by release; this creates a characteristic burst in the acoustic signal.
The IPA chart is a tool to map articulatory actions to symbols.
Acoustic phonetics
Definition: the study of the physical properties of speech sounds as they travel through air.
Transcription types
Broad transcription
Uses slashes $/\ /$ to record the phonemes of an utterance without fine phonetic detail.
Example for the word "cat": $/kæt/$.
Narrow transcription
Uses brackets $[ ]$ to capture fine phonetic detail (aspiration, exact place/manner of articulation, etc.).
Example: [kʰæt] may indicate aspiration on the initial /k/.
Limitations of transcription
Transcription does not capture full phonetic quality or prosody (emotion, emphasis, and context can be missing).
Symbols are unique, but sounds can be very similar; some distinctions are difficult to perceive and transcribe, especially for listeners with hearing impairments.
For example, sounds like /s/ vs /f/ can be hard to distinguish in practice, and the formants of some sounds lie very close together.
Transcription in practice
Transcribers attempt to reflect what a speaker said (informational content) but may miss paralinguistic information such as emotion, irony, or humor.
When decoding recordings, one may miss inside jokes, facial cues, and other non-verbal context that shapes meaning.
Acoustic representation and perception
Acoustic representation makes similarities and differences explicit in a visual/analytic form.
It is universal in the sense that the acoustic description applies across human languages, focusing on perception rather than production.
Perception-focused: includes variations in loudness and other perceptual cues that symbols alone may not convey.
Important caveat: while acoustic representations are useful for understanding hearing, they do not inherently encode linguistic meaning the way a listener or decoder does.
Practical takeaway: a listener is required for true communication; acoustic signals exist independent of interpretation but require perception to become meaning.
Sound propagation and the medium
Speech sounds originate when vocal cords vibrate; similar vibrations occur in musical instruments (e.g., guitar strings).
These vibrations create a pressure wave that travels through a medium.
Medium: typically air for speech.
The wave type: longitudinal wave; the energy propagates through the medium as particles oscillate parallel to the direction of travel.
No medium, no sound: in space (vacuum) or in environments without a medium, sound cannot travel.
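Speed of sound depends on the medium; it is generally faster in stiffer, more tightly bonded media (e.g., sound travels faster in steel than in air). Density alone does not raise the speed: steel is faster because its stiffness far outweighs its greater density.
Human hearing range (typical): about $20\ \text{Hz} \le f \le 20{,}000\ \text{Hz}$.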
Hearing tests and practical ranges
Typical clinical testing focuses on $250\ \text{Hz}$ to $8{,}000\ \text{Hz}$ to assess hearing in the speech range, with high-frequency tests sometimes extending to $20{,}000\ \text{Hz}$ in specific cases.
High-frequency testing can be important for monitoring ototoxic effects from chemotherapy; those medications can first affect higher frequencies before lower ones.
In counseling, clinicians weigh benefits of treatment against potential loss of hearing, noting the progressive loss starting from high frequencies toward speech-relevant ranges.
Frequency relevance to speech
Most cues important for understanding speech lie in the range $\sim 1000\ \text{Hz}$ to $\sim 4000\ \text{Hz}$; vowels generally occupy lower frequencies, while consonants rely on higher-frequency content for intelligibility.
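The ranges quoted in these notes can be collected into a small lookup. This is only a sketch: the names `RANGES_HZ` and `describe_frequency` are hypothetical, and treating the approximate limits as hard cut-offs is a simplification made for the example.

```python
# Approximate ranges from the notes, in Hz (boundaries treated as exact here).
RANGES_HZ = (
    ("audible", 20.0, 20_000.0),
    ("clinical test range", 250.0, 8_000.0),
    ("speech-cue range", 1_000.0, 4_000.0),
)


def describe_frequency(frequency_hz):
    """Return the names of every range that contains the given frequency."""
    return [name for name, low, high in RANGES_HZ if low <= frequency_hz <= high]


for f in (10.0, 500.0, 2_000.0, 12_000.0):
    print(f, describe_frequency(f))
```

A 12 000 Hz tone is audible but falls outside the routine clinical range, which is why extended high-frequency testing is a separate step.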
Simple harmonic motion (SHM) and the Millennium Bridge case study
Real-world example: the Millennium Bridge (London) swayed when pedestrians walked across it, leading to a temporary closure and a retrofit to dampen motion.
Physics concept: an oscillation that repeats regularly and follows a sinusoidal pattern in time, as produced by a restoring force proportional to displacement, is described as simple harmonic motion.
Simple harmonic motion (general form)
Displacement as a function of time can be written as $x(t) = A \cos(\omega t + \phi)$, where
$A$ is the amplitude (max displacement from equilibrium)
$\omega$ is the angular frequency
$\phi$ is the phase constant
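To make the general form concrete, the sketch below evaluates $x(t) = A \cos(\omega t + \phi)$ at quarter-period steps. It uses the standard relation $\omega = 2\pi f$, which is not stated in the notes; the function name `shm_displacement` and the sample values are assumptions for illustration.

```python
import math


def shm_displacement(t_s, amplitude, frequency_hz, phase_rad=0.0):
    """Displacement x(t) = A cos(omega t + phi), with omega = 2*pi*f."""
    omega = 2.0 * math.pi * frequency_hz  # angular frequency in rad/s
    return amplitude * math.cos(omega * t_s + phase_rad)


# Assumed example values: 2 cm amplitude, 1 Hz oscillation, zero phase.
A, f = 0.02, 1.0
for quarter in range(5):
    t = quarter * 0.25 / f  # step through one cycle in quarter periods
    print(f"t = {t:.2f} s  x = {shm_displacement(t, A, f):+.4f} m")
```

With zero phase the motion starts at $+A$, passes through equilibrium a quarter period later, reaches $-A$ at the half period, and returns to $+A$ after one full cycle.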
Energy in SHM (mass-spring analogy)
Two energy forms:
Kinetic energy: $E_k = \frac{1}{2} m v^2$
Potential energy (spring): $E_p = \frac{1}{2} k x^2$
Total energy remains constant: $E_{\text{tot}} = E_k + E_p = \frac{1}{2} k A^2 = \text{constant}$
At turning points (where velocity is zero, $x = \pm A$):
$E_k = 0$ and $E_p = \frac{1}{2} k A^2$
The energy is stored as potential energy in the spring.
At the equilibrium point ($x = 0$):
$E_p = 0$ and $E_k = \frac{1}{2} k A^2$
The energy is all kinetic; velocity is maximal.
Amplitude: distance from a turning point to the equilibrium position, denoted by $A$.
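The energy exchange can be checked numerically. The sketch below assumes an ideal, undamped mass on a spring with $\omega = \sqrt{k/m}$ (a standard result not stated in the notes) and velocity $v(t) = -A\omega \sin(\omega t + \phi)$; the function name `shm_energies` and the sample values are hypothetical.

```python
import math


def shm_energies(t_s, mass_kg, k_n_per_m, amplitude_m, phase_rad=0.0):
    """Return (E_k, E_p) in joules for an ideal mass-spring oscillator."""
    omega = math.sqrt(k_n_per_m / mass_kg)  # angular frequency in rad/s
    x = amplitude_m * math.cos(omega * t_s + phase_rad)
    v = -amplitude_m * omega * math.sin(omega * t_s + phase_rad)
    e_k = 0.5 * mass_kg * v ** 2    # kinetic energy
    e_p = 0.5 * k_n_per_m * x ** 2  # potential energy stored in the spring
    return e_k, e_p


# Assumed example values: 0.5 kg mass, 200 N/m spring, 10 cm amplitude.
m, k, A = 0.5, 200.0, 0.1
print(f"(1/2) k A^2 = {0.5 * k * A ** 2:.4f} J")
for t in (0.0, 0.02, 0.05, 0.1):
    e_k, e_p = shm_energies(t, m, k, A)
    print(f"t = {t:.2f} s  E_k = {e_k:.4f} J  E_p = {e_p:.4f} J  sum = {e_k + e_p:.4f} J")
```

The individual terms change from moment to moment, but their sum should stay at $\frac{1}{2} k A^2$ up to floating-point rounding, matching the turning-point and equilibrium cases above.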
Conceptual notes on SHM visualization
A displacement-time plot shows a repeating cycle; one cycle is the shortest stretch of motion that, when repeated, reproduces the whole pattern.
The energy exchange between kinetic and potential forms explains the continuous motion in the absence of damping.
Practical linkage to acoustics
Oscillatory motion underpins sound production and perception: vocal fold vibrations and resonances in the vocal tract can be analyzed with SHM concepts.
Period and frequency (recap with convention used in the lecture)
Period (designated as lower-case $t$ in the lecture): the time it takes for one complete cycle.
Relationship to frequency: $f = \frac{1}{t}$ or equivalently $t = \frac{1}{f}$.
One cycle definition: a single complete oscillation that repeats over time.
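The reciprocal relation between period and frequency is easy to try out. In this sketch the function names are hypothetical, and the 440 Hz value is an assumed example of a common tuning-fork pitch rather than a figure from the lecture.

```python
def period_from_frequency(frequency_hz):
    """Period in seconds for a given frequency in Hz (t = 1 / f)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


def frequency_from_period(period_s):
    """Frequency in Hz for a given period in seconds (f = 1 / t)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


fork_hz = 440.0  # assumed tuning-fork frequency
t = period_from_frequency(fork_hz)
print(f"period of a {fork_hz:.0f} Hz tone: {t * 1000:.3f} ms")
print(f"back to frequency: {frequency_from_period(t):.1f} Hz")
```

One cycle of such a tone lasts a little over 2 ms, so doubling the frequency halves the period.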
Connections to previous material, real-world relevance, and practical implications
The IPA and transcription connect linguistic theory with observable articulatory actions.
Acoustic phonetics links production to perception, highlighting how listeners decode speech signals and how transcription may fail to capture full meaning or emotion.
Understanding the frequency content of speech (1–4 kHz range for intelligibility) informs hearing aid design, speech therapy, and audio technologies.
The Millennium Bridge example demonstrates how collective human behavior can couple to a physical system and produce unexpected resonant effects, which is why damping and structural design matter in engineering.
Practical takeaway for study: expect to work with frequency, period, and the SHM framework when analyzing oscillatory phenomena in acoustics, and to perform hands-on experiments with tuning forks and other frequency- and period-based demonstrations.
Additional notes and concepts mentioned
The universal and perceptual focus of acoustic analysis means it emphasizes how sound is heard rather than how it is linguistically decoded.
Limitations to keep in mind: a transcription alone does not directly encode emotion or contextual meaning, and an acoustic signal carries no linguistic meaning until a listener interprets it.
The guest examples illustrate the importance of context in interpretation and the role of listeners in communication.
The practical boundary between consonant and vowel analysis: consonants often emphasize higher-frequency content important for intelligibility, while vowels occupy lower-frequency regions.
Summary of key numerical references
Human hearing range: $20\ \text{Hz} \le f \le 20{,}000\ \text{Hz}$
High-frequency testing and ototoxicity considerations: tests may extend to $20{,}000\ \text{Hz}$; high-frequency loss often precedes loss in the speech range under ototoxic exposure.
Speech-relevant bandwidth: $\sim 1000\ \text{Hz}$ to $\sim 4000\ \text{Hz}$ for many intelligibility cues; vowels generally in lower frequency bands.
Period-frequency relation: $f = \frac{1}{t}$ and $t = \frac{1}{f}$ for period $t$ and frequency $f$.
SHM energy relations:
$E_k = \frac{1}{2} m v^2$
$E_p = \frac{1}{2} k x^2$
$E_{\text{tot}} = E_k + E_p = \frac{1}{2} k A^2$
Hands-on and next steps (as announced)
Upcoming activities will include hands-on demonstrations with tuning forks to illustrate frequency and period concepts, and additional activities to solidify understanding of sound waves and SHM.