Speech Sound Production, Acoustic Phonetics, and SHM Case Study

Speech Sounds: Production, Transmission, and Perception

  • Overview

    • Speech sounds are produced, transmitted through a medium, and perceived by listeners.

    • Physical properties of speech sounds include frequency, amplitude, duration, wavelength, and waveform.

    • These properties relate to how we hear and interpret sounds, not just how they are produced.

  • Physical properties of speech sounds

    • Frequency

    • Definition: the number of cycles per second.

    • Determines the perceived pitch of the sound.

    • Example: Higher frequency = higher pitch; lower frequency = lower pitch.

    • Amplitude

    • Definition: the height of the waveform, i.e., its maximum displacement from the equilibrium (zero) line.

    • Largely determines perceived loudness.

    • Duration

    • Definition: how long a sound lasts.

    • Affects rhythm and tempo of speech.

    • Wavelength

    • Definition: the distance between successive points of identical phase on a wave.

    • Related to frequency and the speed of sound via the relation $v = f \lambda$ (where $v$ is the speed of sound and $\lambda$ is the wavelength).

    • Waveform

    • Definition: the shape of the pressure variation over time.

    • Encodes the temporal pattern of a sound.
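
The wavelength relation $v = f \lambda$ above can be checked numerically. The following is a minimal sketch, assuming a speed of sound in air of about 343 m/s (a typical room-temperature value that is not stated in the notes); the function name `wavelength` is illustrative only.

```python
# Assumed value: speed of sound in dry air at roughly 20 degrees Celsius.
SPEED_OF_SOUND_AIR = 343.0  # metres per second


def wavelength(frequency_hz, speed=SPEED_OF_SOUND_AIR):
    """Return the wavelength in metres, rearranging v = f * lambda."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed / frequency_hz


# Higher frequency means shorter wavelength for a fixed speed of sound.
for f in (100.0, 1000.0, 4000.0):
    print(f"{f:7.0f} Hz -> {wavelength(f):.3f} m")
```

Passing a different `speed` (for example, a value for water or steel) shows how the same frequency corresponds to a longer wavelength in a faster medium.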

  • Articulatory phonetics and the IPA

    • Articulatory phonetics studies how sounds are produced using the vocal tract.

    • The International Phonetic Alphabet (IPA) provides symbols for each sound.

    • Example of a simple plosive (air burst)

    • The lips come together to block the airflow, then release it as a burst of air, as in the bilabial plosives [p] and [b].

    • Practical notes

    • Plosives involve a momentary complete closure of the vocal tract followed by release; this creates a characteristic burst in the acoustic signal.

    • The IPA chart is a tool to map articulatory actions to symbols.

  • Acoustic phonetics

    • Definition: the study of the physical properties of speech sounds as they travel through air.

    • Transcription types

    • Broad transcription

      • Uses slashes (/ /) to represent phonemes without fine phonetic detail.

      • Example for the word "cat": /kæt/.

    • Narrow transcription

      • Uses square brackets ([ ]) to capture fine phonetic detail (aspiration, exact place/manner of articulation, etc.).

      • Example: [kʰæt] indicates aspiration on the initial /k/.

    • Limitations of transcription

    • Transcription does not capture full phonetic quality or prosody (emotion, emphasis, and context can be missing).

    • Symbols are unique, but sounds can be very similar; some distinctions are difficult to perceive and transcribe, especially for listeners with hearing impairments.

    • For example, sounds like /s/ vs /f/ or closely related formants can be hard to distinguish in practice; formants themselves can be very close in some cases.

    • Transcription in practice

    • Transcribers attempt to reflect what a speaker said (informational content) but may miss paralinguistic information such as emotion, irony, or humor.

    • When decoding recordings, one may miss inside jokes, facial cues, and other non-verbal context that shapes meaning.

  • Acoustic representation and perception

    • Acoustic representation makes similarities and differences explicit in a visual/analytic form.

    • It is universal in the sense that the acoustic description applies across human languages, focusing on perception rather than production.

    • Perception-focused: includes variations in loudness and other perceptual cues that symbols alone may not convey.

    • Important caveat: while acoustic representations are useful for understanding hearing, they do not inherently encode linguistic meaning the way a listener or decoder does.

    • Practical takeaway: a listener is required for true communication; acoustic signals exist independent of interpretation but require perception to become meaning.

  • Sound propagation and the medium

    • Speech sounds originate when vocal cords vibrate; similar vibrations occur in musical instruments (e.g., guitar strings).

    • These vibrations create a pressure wave that travels through a medium.

    • Medium: typically air for speech.

    • The wave type: longitudinal wave; the energy propagates through the medium as particles oscillate parallel to the direction of travel.

    • No medium, no sound: in space (vacuum) or in environments without a medium, sound cannot travel.

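    • Speed of sound depends on the medium; it is faster in stiffer, more tightly bonded media (e.g., sound travels faster in steel than in air). Density alone does not speed sound up: for a given stiffness, a denser medium carries sound more slowly.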

    • Human hearing range (typical): about $20\ \text{Hz} \le f \le 20{,}000\ \text{Hz}$.

    • Hearing tests and practical ranges

    • Typical clinical testing focuses on $250\ \text{Hz}$ to $8{,}000\ \text{Hz}$ to assess speech, with high-frequency tests sometimes extending to $20{,}000\ \text{Hz}$ in specific cases.

    • High-frequency testing can be important for monitoring ototoxic effects from chemotherapy; those medications tend to affect higher frequencies before lower ones.

    • In counseling, clinicians weigh benefits of treatment against potential loss of hearing, noting the progressive loss starting from high frequencies toward speech-relevant ranges.

    • Frequency relevance to speech

    • Most cues important for understanding speech lie in the range $\sim 1000\ \text{Hz}$ to $\sim 4000\ \text{Hz}$; vowels generally occupy lower frequencies, while consonants rely on higher-frequency content for intelligibility.
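
The frequency ranges quoted in this section nest inside one another, which a short lookup makes concrete. This is a minimal sketch using only the approximate limits given in the notes; the band labels and the function name `bands_containing` are hypothetical, and real audiometric practice is more nuanced than fixed cut-offs.

```python
# Approximate ranges taken from the notes above (all values in Hz).
BANDS_HZ = {
    "human hearing": (20.0, 20_000.0),
    "typical clinical testing": (250.0, 8_000.0),
    "main speech intelligibility cues": (1_000.0, 4_000.0),
}


def bands_containing(frequency_hz):
    """Return the names of every band whose limits include the frequency."""
    return [
        name
        for name, (low, high) in BANDS_HZ.items()
        if low <= frequency_hz <= high
    ]


for f in (10.0, 500.0, 2_000.0, 12_000.0):
    print(f"{f:8.0f} Hz: {bands_containing(f) or 'outside all listed bands'}")
```

A tone at 12,000 Hz falls only in the general hearing band, which is why loss at such frequencies can go unnoticed until it spreads toward the speech range.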

  • Simple harmonic motion (SHM) and the Millennium Bridge case study

    • Real-world example: the Millennium Bridge (London) swayed when pedestrians walked across it, leading to a temporary closure and a retrofit to dampen motion.

    • Physics concept: oscillations in which the restoring force is proportional to displacement follow a regular, sinusoidal repeating pattern and are described as simple harmonic motion.

    • Simple harmonic motion (general form)

    • Displacement as a function of time can be written as $x(t) = A \cos(\omega t + \phi)$, where

      • $A$ is the amplitude (max displacement from equilibrium)

      • $\omega$ is the angular frequency

      • $\phi$ is the phase constant
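
To see what the general form $x(t) = A \cos(\omega t + \phi)$ produces, it can be sampled at quarter-period steps. This is a minimal sketch; the amplitude and frequency values are hypothetical, and it relies on the standard relation $\omega = 2\pi f$, which the notes do not state explicitly.

```python
import math


def displacement(t, amplitude, omega, phase=0.0):
    """Displacement x(t) = A * cos(omega * t + phi) for simple harmonic motion."""
    return amplitude * math.cos(omega * t + phase)


# Hypothetical example values, chosen only for illustration.
AMPLITUDE = 0.02                  # metres
FREQUENCY = 2.0                   # cycles per second (Hz)
OMEGA = 2 * math.pi * FREQUENCY   # angular frequency in rad/s (omega = 2*pi*f)
PERIOD = 1.0 / FREQUENCY          # seconds per cycle

# Sample one full cycle at quarter-period steps:
# turning point, equilibrium, opposite turning point, equilibrium, start again.
for step in range(5):
    t = step * PERIOD / 4
    print(f"t = {t:.3f} s, x = {displacement(t, AMPLITUDE, OMEGA):+.4f} m")
```

Changing `phase` shifts where in the cycle the motion starts without changing the amplitude or the period.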

    • Energy in SHM (mass-spring analogy)

    • Two energy forms:

      • Kinetic energy: $E_k = \frac{1}{2} m v^2$

      • Potential energy (spring): $E_p = \frac{1}{2} k x^2$

    • Total energy remains constant: $E_{\text{tot}} = E_k + E_p = \frac{1}{2} k A^2 = \text{constant}$

    • At turning points (where velocity is zero, $x = \pm A$):

      • $E_k = 0$ and $E_p = \frac{1}{2} k A^2$

      • The energy is stored as potential energy in the spring.

    • At the equilibrium point ($x = 0$):

      • $E_p = 0$ and $E_k = \frac{1}{2} k A^2$

      • The energy is all kinetic; velocity is maximal.

    • Amplitude: distance from a turning point to the equilibrium position, denoted by $A$.
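
The energy exchange described above can be verified numerically for an ideal, undamped mass on a spring. This is a minimal sketch under stated assumptions: the mass, spring constant, and amplitude are hypothetical values, and the code uses the standard mass-spring results $\omega = \sqrt{k/m}$ and $v(t) = -A\omega \sin(\omega t + \phi)$, which are not written out in the notes.

```python
import math


def shm_energies(t, mass, k, amplitude, phase=0.0):
    """Return (kinetic, potential) energy of an ideal mass-spring system at time t."""
    omega = math.sqrt(k / mass)                           # angular frequency of the system
    x = amplitude * math.cos(omega * t + phase)           # displacement
    v = -amplitude * omega * math.sin(omega * t + phase)  # velocity (time derivative of x)
    kinetic = 0.5 * mass * v ** 2
    potential = 0.5 * k * x ** 2
    return kinetic, potential


# Hypothetical example values, chosen only for illustration.
MASS = 0.5         # kilograms
K = 200.0          # spring constant, newtons per metre
AMPLITUDE = 0.02   # metres
EXPECTED_TOTAL = 0.5 * K * AMPLITUDE ** 2   # (1/2) k A^2

for t in (0.0, 0.05, 0.1, 0.2):
    e_k, e_p = shm_energies(t, MASS, K, AMPLITUDE)
    print(f"t = {t:.2f} s: E_k = {e_k:.5f} J, E_p = {e_p:.5f} J, sum = {e_k + e_p:.5f} J")
```

At every sampled time the two energies trade off while their sum stays at $\frac{1}{2} k A^2$, up to floating-point rounding; adding damping would break this constancy.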

    • Conceptual notes on SHM visualization

    • A displacement-time plot shows a repeating cycle; one cycle is the shortest stretch of motion after which the pattern repeats.

    • The energy exchange between kinetic and potential forms explains the continuous motion in the absence of damping.

    • Practical linkage to acoustics

    • Oscillatory motion underpins sound production and perception: vocal fold vibrations and resonances in the vocal tract can be analyzed with SHM concepts.

  • Period and frequency (recap with convention used in the lecture)

    • Period (designated as lower-case $t$ in the lecture; many texts write $T$ to keep it distinct from the time variable in $x(t)$): the time it takes for one complete cycle.

    • Relationship to frequency: $f = \frac{1}{t}$ or equivalently $t = \frac{1}{f}$.

    • One cycle definition: a single complete oscillation that repeats over time.
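
The reciprocal relation between period and frequency is easy to check with a worked value. This is a minimal sketch; the 440 Hz figure is an assumed example of a common tuning-fork pitch, not a value given in the lecture, and the function names are illustrative.

```python
def period_from_frequency(frequency_hz):
    """Period in seconds for a given frequency in Hz (t = 1 / f)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


def frequency_from_period(period_s):
    """Frequency in Hz for a given period in seconds (f = 1 / t)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


# Assumed example: a tuning fork vibrating at 440 cycles per second.
fork_frequency = 440.0
fork_period = period_from_frequency(fork_frequency)
print(f"{fork_frequency:.0f} Hz -> one cycle every {fork_period * 1000:.3f} ms")
print(f"Round trip back to frequency: {frequency_from_period(fork_period):.1f} Hz")
```

Doubling the frequency halves the period, which is the relationship the tuning-fork demonstrations are meant to make audible.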

  • Connections to previous material, real-world relevance, and practical implications

    • The IPA and transcription connect linguistic theory with observable articulatory actions.

    • Acoustic phonetics links production to perception, highlighting how listeners decode speech signals and how transcription may fail to capture full meaning or emotion.

    • Understanding the frequency content of speech (1–4 kHz range for intelligibility) informs hearing aid design, speech therapy, and audio technologies.

    • The Millennium Bridge example demonstrates how collective human behavior can couple to a physical system and produce unexpected resonant effects, and why damping and structural design matter in engineering.

    • Practical takeaway for study: expect to work with frequency, period, and the SHM framework when analyzing oscillatory phenomena in acoustics, and to perform hands-on experiments with tuning forks and other frequency- and period-based demonstrations.

  • Additional notes and concepts mentioned

    • The universal and perceptual focus of acoustic analysis means it emphasizes how sound is heard rather than how it is linguistically decoded.

    • Emphasis on the limitations of transcription: emotion and contextual meaning are not directly encoded in a transcription alone.

    • The guest examples illustrate the importance of context in interpretation and the role of listeners in communication.

    • The practical boundary between consonant and vowel analysis: consonants often emphasize higher-frequency content important for intelligibility, while vowels occupy lower-frequency regions.

  • Summary of key numerical references

    • Human hearing range: $20\ \text{Hz} \le f \le 20{,}000\ \text{Hz}$

    • High-frequency testing and ototoxicity considerations: tests may extend to $20{,}000\ \text{Hz}$; high-frequency loss often precedes loss in the speech range under ototoxic exposure.

    • Speech-relevant bandwidth: $\sim 1000\ \text{Hz}$ to $\sim 4000\ \text{Hz}$ for many intelligibility cues; vowels generally in lower frequency bands.

    • Period-frequency relation: $f = \frac{1}{t}$ and $t = \frac{1}{f}$ for period $t$ and frequency $f$.

    • SHM energy relations:

      • $E_k = \frac{1}{2} m v^2$

      • $E_p = \frac{1}{2} k x^2$

      • $E_{\text{tot}} = E_k + E_p = \frac{1}{2} k A^2$

  • Hands-on and next steps (as announced)

    • Upcoming activities will include hands-on demonstrations with tuning forks to illustrate frequency and period concepts, and additional activities to solidify understanding of sound waves and SHM.