Speech features
Prosody (rhythm, stress, intonation), formants, vocal tract shape, vocal fold vibration.
What is prosody made up of?
1. rhythm
2. stress
3. intonation
Grouping Sounds
Phones, Phonetic Sounds, Phonemes, and Categorical Perception.
Speech Variability
How speech differs between and within talkers, illustrated by vowel space and consonant distinctions.
What is the primary difference between a "phone" and a "phoneme" in the context of speech sounds?
A "phone" is an individual, raw speech sound, the minimal unit that can differentiate two words.
A "phoneme" is a conceptual category of sounds specific to a language, grouping acoustically variable phonetic sounds together if their differences don't change word meaning within that language.
Explain why "phonetic sounds" can vary significantly even when producing the "same" sound.
Phonetic sounds describe the specific acoustic properties of a speech sound, which include details like loudness, pitch, and exact tongue position. These properties can vary due to individual differences in vocal anatomy, dialectal variations, or even subtle fluctuations in how the same person pronounces a sound at different times.
How does prosody contribute to the acoustic features of speech?
Prosody refers to the rhythm, stress, and intonation of speech. It is an acoustic feature that provides information beyond individual sounds, contributing to the overall melody and emphasis of spoken language and influencing speech recognition and production.
At what approximate age do infants begin to learn about the prosody and stress of their native language, and what is learned at 12 months?
Infants begin to learn about prosody and stress around 5 months of age.
By 12 months, they are learning to recognize and produce consonants, building on earlier acquired knowledge of vowels and phonotactics.
Describe "vowel space" and its significance in understanding speech talker variability
a visual representation of how different vowels are produced based on the frequencies of their formants.
illustrates that speech sounds, particularly vowels, exhibit variability both between different talkers and even within the same talker
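The vowel-space idea can be sketched numerically: each vowel occupies a region of the F1-F2 plane, and a variable token can still be assigned to its nearest vowel category. This is a minimal sketch; the (F1, F2) means below are rough illustrative values in Hz, not measurements from the source.

```python
import math

# Rough illustrative mean (F1, F2) values in Hz for three vowels;
# real formant values vary considerably between and within talkers.
VOWEL_MEANS = {"i": (270.0, 2290.0), "a": (730.0, 1090.0), "u": (300.0, 870.0)}

def nearest_vowel(f1: float, f2: float) -> str:
    """Classify a vowel token by Euclidean distance in F1-F2 vowel space."""
    return min(VOWEL_MEANS, key=lambda v: math.dist((f1, f2), VOWEL_MEANS[v]))

# Talker variability shifts a token's formants, but it can still land
# nearest its intended vowel category:
print(nearest_vowel(310.0, 2100.0))  # i
```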
What is Voice Onset Time (VOT), and how does it help differentiate English stop consonants?
a measure that differentiates stop consonants based on the timing of vocal fold vibration relative to the release of the consonant.
For English, VOT helps distinguish voiced stops (like /b/, /d/, /g/) from voiceless stops (like /p/, /t/, /k/).
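The voiced/voiceless distinction can be illustrated with a toy threshold classifier over VOT. The ~25 ms boundary below is an illustrative approximation, not a value given in the source.

```python
def classify_stop_by_vot(vot_ms: float, boundary_ms: float = 25.0) -> str:
    """Label an English stop as voiced or voiceless from its voice onset
    time in milliseconds. The boundary value is illustrative only."""
    return "voiceless" if vot_ms >= boundary_ms else "voiced"

# Short-lag VOTs pattern with voiced stops (/b/, /d/, /g/);
# long-lag VOTs pattern with voiceless stops (/p/, /t/, /k/).
print(classify_stop_by_vot(5))   # voiced
print(classify_stop_by_vot(70))  # voiceless
```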
How do researchers create "sound quilts"? Briefly describe the process.
cutting a natural sound (e.g., speech) into small, equal-length segments.
These segments are then reordered based on local acoustic matching at their borders and seamlessly stitched back together using techniques like PSOLA to preserve short-term naturalness.
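The segment-and-reorder step can be sketched as a toy procedure on a 1-D list of samples: cut into equal segments, then greedily chain segments whose border samples match. Real quilting matches richer local acoustic structure (e.g., cochleogram features) and smooths joins with PSOLA; both are omitted in this sketch.

```python
import random

def make_sound_quilt(signal, seg_len, seed=0):
    """Toy sound-quilt sketch: cut a signal into equal-length segments,
    then greedily reorder them so adjacent segment borders match closely.
    Long-range structure is scrambled; segment contents are preserved."""
    segments = [signal[i:i + seg_len]
                for i in range(0, len(signal) - seg_len + 1, seg_len)]
    remaining = segments[:]
    random.Random(seed).shuffle(remaining)
    quilt = [remaining.pop()]
    while remaining:
        last = quilt[-1][-1]  # final sample of the quilt so far
        # choose the segment whose first sample best matches the border
        nxt = min(remaining, key=lambda s: abs(s[0] - last))
        remaining.remove(nxt)
        quilt.append(nxt)
    return [x for seg in quilt for x in seg]

sig = [float(i % 10) for i in range(40)]
q = make_sound_quilt(sig, seg_len=5)
print(len(q) == len(sig))  # True: quilting preserves total length
```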
What is the main goal of using "sound quilts" in research related to speech processing?
is to understand how the brain processes the acoustic properties of speech in isolation, without the listener trying to interpret the words or grammar.
It allows researchers to investigate temporal processing and identify speech-specific brain areas.
Identify two key effects that "quilting" has on the natural sound, one short-term and one long-term.
"quilting" preserves the naturalness of the sound within each segment and at the immediate joins due to careful acoustic matching.
In the long term, however, it disrupts or scrambles the larger-scale patterns, rhythms, and dependencies of the original sound, such as sentence intonation.
Name at least two brain regions mentioned in the context of speech sound processing.
Heschl's gyrus (auditory cortex)
superior temporal sulcus.
The planum temporale is also mentioned as part of the auditory network.
Acoustic Properties
The physical characteristics of sound, such as frequency, intensity, and duration, as they relate to speech.
Aspiration
A puff of air that sometimes accompanies the release of a voiceless stop consonant in English (e.g., the 'p' in "pin").
Auditory Cortex (Heschl’s Gyrus)
The part of the temporal lobe that processes auditory information.
Categorical Perception
The phenomenon where a continuous range of acoustic variation is perceived as belonging to a few distinct categories (e.g., perceiving sounds as either /b/ or /p/ even with gradual acoustic changes).
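Categorical perception is often summarized as a steep sigmoid identification function over an acoustic continuum: equal acoustic steps, but an abrupt perceptual shift at the category boundary. This sketch uses a hypothetical VOT continuum; the boundary and slope parameters are illustrative assumptions.

```python
import math

def p_voiceless(vot_ms: float, boundary: float = 25.0, slope: float = 0.5) -> float:
    """Toy categorical-perception sketch: the probability of labeling a
    stop 'voiceless' rises sigmoidally across the VOT continuum rather
    than tracking the acoustics linearly. Parameters are illustrative."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))

# Equal 10 ms acoustic steps, but identification flips sharply at ~25 ms:
for vot in (5, 15, 25, 35, 45):
    print(vot, round(p_voiceless(vot), 2))
```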
Cochleogram
A visual representation of how the energy of a sound changes over time and frequency, mimicking the processing in the cochlea. Used in creating sound quilts.
Consonants
Speech sounds produced with some obstruction of the airflow in the vocal tract.
Diacritics
Small marks added to phonetic symbols in narrow transcription to indicate specific acoustic details or modifications of a sound.
F1, F2 (Formants)
Resonant frequencies of the vocal tract that are crucial for distinguishing different vowel sounds and providing cues for consonants. F2 onset refers to the starting frequency of the second formant.
Formant Transitions
The changes in formant frequencies over time, particularly important cues for consonant perception as the vocal tract moves from a consonant to an adjacent vowel.
International Phonetic Alphabet (IPA)
A system of phonetic notation that represents all the sounds of human speech.
Intonation
The rise and fall of the voice in speaking, especially as it affects the meaning of what is said. Part of prosody.
Lexical-semantic Content:
The meaning of words and their relationships
Manner of Articulation
How airflow is obstructed to produce a consonant (e.g., stop, fricative, nasal)
Phone
An individual speech sound; the minimal 'unit sound' of a language that can differentiate two words
Phoneme (Phonemic Sound)
A conceptual family of sounds; the way a specific language groups acoustic properties in the speech signal, where variations within the group do not change word meaning.
Phonetic Sound
The acoustic properties of a particular speech sound, focusing on its physical reality and detailed characteristics.
Phonotactics
The rules governing the possible sound sequences and combinations within a particular language.
Pitch-synchronous Overlap-Add (PSOLA)
A technique used in speech processing to concatenate (join) sound segments smoothly without introducing unnatural jumps or clicks. Used in sound quilt creation.
Place of Articulation
The location in the vocal tract where airflow is obstructed to produce a consonant (e.g., labial, alveolar, velar).
Prosody
The rhythm, stress, and intonation patterns of speech.
Sound Quilts
Special audio stimuli created by segmenting natural sounds (like speech) and reordering them with local acoustic constraints to disrupt long-term structure while preserving short-term naturalness. Used to study brain processing of speech.
Speech Cues
Acoustic features in the speech signal that listeners use to identify and distinguish speech sounds.
Speech Register
Variations in speech due to factors like vocal tract shape and vocal fold vibration.
Speech Talker Variability
The natural differences in acoustic properties of speech sounds both between different speakers and within the same speaker over time.
Stop Consonants
Consonants produced by completely stopping the airflow in the vocal tract and then releasing it (e.g., /p/, /b/, /t/, /d/, /k/, /g/).
Stress
The emphasis given to certain syllables or words in speech, part of prosody.
Superior Temporal Sulcus (STS)
A region in the temporal lobe involved in various aspects of speech and auditory processing.
Syntactic Content
The grammatical structure of sentences
Temporal Processing
How the brain processes information over time, especially relevant for understanding dynamic signals like speech.
Vocal Fold Vibration
The rapid opening and closing of the vocal folds, producing voicing in speech sounds.
Vocal Tract Shape
The configuration of the mouth, tongue, lips, and other articulators that shapes the sound produced by the vocal folds.
Voiced
A speech sound produced with vibration of the vocal folds (e.g., /b/, /d/, /g/, vowels)
Voiceless
A speech sound produced without vibration of the vocal folds (e.g., /p/, /t/, /k/, /s/, /f/).
Voice Onset Time (VOT)
The time delay between the release of an oral closure (for a stop consonant) and the beginning of vocal fold vibration.
Vowel Space
A graphical representation (often on an F1-F2 plot) showing the range of frequencies used for different vowels, illustrating their acoustic distinctiveness and variability
Vowels
Speech sounds produced with a relatively open vocal tract, typically with vocal fold vibration
Speech Perception
The process by which the sounds of language are heard, interpreted, and understood by the brain.
Coarticulation
The natural phenomenon in speech production where articulatory gestures for one sound overlap in time with those for adjacent sounds. This results in the acoustic properties of a phoneme being influenced by its phonetic context.
Anticipatory Coarticulation (Right-to-Left Influence)
A later sound influences an earlier sound (e.g., lip rounding for /u/ affecting a preceding /s/).
Carryover/Perseveratory Coarticulation (Left-to-Right Influence):
An earlier sound influences a later sound (e.g., an /l/ or /r/ sound affecting a following /d/ or /g/).
Phonetic Context Effect
How the perception of an identical speech sound changes dramatically based on the sounds adjacent to it.
Spectral Content
The distribution of sound energy across different frequencies.
Speech perception is dynamic and context-dependent:
Sounds are not perceived in isolation.
The "Speech-Specific" vs. "General Auditory" Debate:
While some aspects of speech perception (especially compensation for coarticulation) seem to involve speech-specific, gesture-based mechanisms, other fundamental components (like categorical perception) rely on general auditory processes found across species.
The Invariance Problem
Coarticulation creates immense acoustic variability, yet listeners achieve perceptual stability. Theories attempt to explain how this "invariance" is achieved.
The Role of Production in Perception
Many findings support a strong link between speech production and perception, where understanding articulatory gestures is crucial for interpreting the acoustic signal
Developmental and Comparative Perspectives
Infants begin with broad auditory discrimination abilities that are refined by exposure to their native language. Similar categorical perception abilities in non-human animals highlight shared auditory processing foundations.
Define coarticulation and provide an example of how it manifests in speech.
the natural overlap of articulatory gestures for adjacent sounds in fluent speech.
For instance, when saying "tooth," the lips begin to round for the /u/ sound even before the tongue has fully completed the /t/ sound, altering the acoustic properties of the /t/.
What is the core difference between "speech-specific mechanisms" and "general auditory mechanisms" in the context of speech perception theories?
Speech-specific mechanisms propose that the brain has dedicated neural processes unique to speech perception, often linked to speech production.
General auditory mechanisms suggest that speech is processed using the same fundamental auditory abilities that process all other sounds, without requiring special adaptations.
Explain the concept of "spectral contrast." How does it predict context effects in speech perception?
describes how the perception of a target sound is shifted in a direction opposite to the spectral characteristics of the preceding context.
For example, if a preceding sound has strong low-frequency energy, the auditory system's sensitivity to those frequencies might be temporarily reduced, making a following sound with higher-frequency energy more likely to be heard
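The contrastive prediction can be captured in a one-line toy model: the perceived spectral quality of a target is shifted away from the spectral emphasis of the preceding context. The constant k is purely illustrative, not a parameter from the source.

```python
def perceived_shift(target_hz: float, context_hz: float, k: float = 0.1) -> float:
    """Toy spectral-contrast sketch: perception of a target is pushed
    *away* from the spectral emphasis of the preceding context.
    k is an illustrative contrast strength, not a real constant."""
    return target_hz + k * (target_hz - context_hz)

# After a low-frequency context, a mid-frequency target is heard as higher;
# after a high-frequency context, the same target is heard as lower.
print(perceived_shift(1500.0, 500.0))   # shifted up, away from low context
print(perceived_shift(1500.0, 2500.0))  # shifted down, away from high context
```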
How did Experiments 1a & 1b from Holt & Lotto (2002) challenge the idea that phonetic context effects are solely based on peripheral auditory processing?
showed that phonetic context effects persisted significantly even when the context and target syllables were presented to opposite ears (dichotically).
Peripheral auditory processing (like in the cochlea) is typically restricted to a single ear, so this finding indicated that the effects must occur at more central brainstem or cortical levels.
What was the "companion finding" in Experiment 2 of the "Speech Perception: Unraveling Coarticulation and Gestural Understanding" study, and why was it significant?
showed that coarticulatory information in a following /da/ or /ga/ syllable could influence the perception of a preceding ambiguous /al/-/ar/ sound.
This was significant because it demonstrated a right-to-left, enhancing effect, which is difficult for spectral contrast theories (which predict left-to-right, assimilative effects) to explain.
Describe the "invariance problem" in speech perception.
refers to the challenge that acoustic cues for a given phoneme are highly variable and context-dependent due to coarticulation.
Despite this lack of a single, invariant acoustic property, listeners consistently perceive the same intended phoneme.
How does the "Locus Equation" framework attempt to explain how we achieve acoustic invariance in speech perception despite coarticulation?
suggests that the brain perceives speech by mapping acoustic relationships in the speech signal, rather than relying on fixed acoustic cues for individual phonemes.
Proposes a flexible mapping from continuous acoustic features to discrete phoneme categories, allowing for consistent perception despite acoustic variability.
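The locus-equation idea — for a given consonant place of articulation, F2 at consonant onset relates linearly to F2 at the vowel midpoint — can be sketched as a least-squares fit. This is a minimal sketch with synthetic formant values, not data from the source.

```python
def fit_locus_equation(f2_vowel, f2_onset):
    """Fit F2_onset = slope * F2_vowel + intercept by ordinary least
    squares. For one place of articulation, the (slope, intercept) pair
    stays roughly constant across vowels despite coarticulation."""
    n = len(f2_vowel)
    mx = sum(f2_vowel) / n
    my = sum(f2_onset) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(f2_vowel, f2_onset))
             / sum((x - mx) ** 2 for x in f2_vowel))
    return slope, my - slope * mx

# Hypothetical F2 values (Hz) for one consonant across several vowels:
f2_vowel = [800.0, 1200.0, 1800.0, 2300.0]
f2_onset = [0.7 * v + 450.0 for v in f2_vowel]  # synthetic linear relation
slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
print(round(slope, 2), round(intercept, 1))  # 0.7 450.0
```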
Briefly explain the High-Amplitude Sucking Procedure and what it reveals about infant speech perception
measures changes in an infant's sucking rate in response to auditory stimuli.
A decrease in sucking indicates habituation to a sound, while a recovery in sucking when a new sound is presented demonstrates that the infant can discriminate between the two sounds.
This procedure has revealed that infants perceive speech categorically from a very young age.
What evidence from non-human animal studies suggests that some components of speech perception are not unique to humans?
Studies have shown that non-human animals (like Japanese quails, rodents, and monkeys) exhibit categorical perception for both speech sounds and similar non-speech sounds.
This suggests that fundamental auditory processing abilities, which humans utilize for speech perception, are evolutionarily preserved and not exclusive to our species.
How do gesture-based theories (like Motor Theory or Direct Realist Theory) explain compensation for coarticulation?
propose that listeners do not simply process raw acoustic signals but actively perceive the intended articulatory movements (gestures) of the speaker.
By inferring these gestures, the brain can "parse" the complex acoustic signal, separating the overlapping effects of coarticulation from the intended target sound, thereby achieving perceptual compensation.
Anticipatory Coarticulation:
A type of coarticulation where the articulation of a later sound influences the production of an earlier sound. Also known as right-to-left influence.
Articulators
The parts of the vocal tract (e.g., tongue, lips, velum) involved in producing speech sounds.
Auditory Discontinuities
Natural breaks or abrupt changes in the acoustic signal that the auditory system (of both humans and non-humans) may use to segment or categorize sounds.
Auditory Enhancement
A psychoacoustic phenomenon where the perception of a sound is boosted by the presence of a preceding or following sound. It is generally considered a strictly monaural (single-ear) effect.
Categorical Perception (CP)
The tendency to perceive a continuous range of acoustic stimuli as belonging to distinct, discrete categories, with sharp boundaries between them.
Central Auditory Mechanisms:
Neural processes for sound perception that occur in the brainstem and higher cortical areas, beyond the peripheral auditory system (cochlea, auditory nerve).
Coarticulation
The phenomenon where the production of one speech sound is influenced by the production of adjacent sounds, due to the natural overlap of articulatory gestures.
Compensate for Coarticulation
The ability of the auditory system to perceive an intended, stable phonetic category despite the acoustic variability introduced by coarticulation.
Context-Dependent/Context-Sensitive
Describes how the acoustic properties and perception of a speech sound are influenced by the surrounding sounds in an utterance.
Carryover Coarticulation (Perseveratory Coarticulation):
A type of coarticulation where the articulation of an earlier sound influences the production of a later sound. Also known as left-to-right influence
Dichotic Presentation:
An experimental setup where different auditory stimuli are presented simultaneously to each ear.
Diotic Presentation:
An experimental setup where the same auditory stimuli are presented to both ears simultaneously.
Direct Realist Theory
A theory of speech perception proposing that listeners directly perceive the intended articulatory gestures of the speaker, rather than just acoustic features.
Formant Transitions
Rapid changes in the frequencies of formants (resonances of the vocal tract) that occur as articulators move from one sound to another, especially between consonants and vowels.
General Auditory Processes:
Auditory mechanisms and principles that are not unique to speech and are used to process all types of sounds, often shared across different species.
Gesture-Based Theories
Theories of speech perception that emphasize the perception of articulatory movements or intentions of the speaker as central to understanding speech (e.g., Motor Theory, Direct Realist Theory)
High-Amplitude Sucking Procedure:
A research methodology used with infants to study their perceptual abilities, relying on changes in their sucking rate to infer discrimination between stimuli.
Hybrid Speech Stimuli:
Artificially constructed speech sounds created by combining naturally produced segments with synthetically generated segments, used in experiments to precisely control acoustic features.
Invariance Problem (Non-Invariance Problem):
The fundamental challenge in speech perception arising from the lack of a one-to-one, invariant mapping between acoustic cues and perceived phonemes, primarily due to coarticulation.
Locus Equation
A descriptive framework or model that quantifies the relationship between the onset frequency and steady-state frequency of formants in consonant-vowel transitions, suggesting a way the brain might map acoustic relationships to perceive stable phonemes.
Motor Theory of Speech Perception
A theory suggesting that speech perception involves implicitly accessing knowledge of speech production, where listeners perceive the articulatory gestures involved in producing sounds.
Peripheral Auditory Processes:
Initial stages of auditory processing that occur in the outer, middle, and inner ear (e.g., cochlea, auditory nerve).
Phoneme
The smallest unit of sound in a language that can distinguish one word from another (e.g., /b/ in "bat" vs. /p/ in "pat").
Phonetic Context Effects
The influence of surrounding speech sounds on the acoustic properties and perception of a target speech sound.
Prototype (in speech perception):
An "ideal" or best example of a speech sound category, around which other variations of that sound are perceived.
Spectral Content:
The distribution of acoustic energy across different frequencies in a sound signal.
Spectral Contrast:
A general auditory phenomenon where the perception of a sound is shifted away from the spectral characteristics of a preceding sound, often leading to a "contrastive" perceptual effect.
Speech-Specific Mechanisms
Hypothesized neural or cognitive processes in the brain that are uniquely adapted for processing speech sounds, distinct from how non-speech sounds are processed.
Voice Onset Time (VOT)
The time delay between the release of a consonant and the onset of voicing (vocal fold vibration) for the following vowel. A key acoustic cue for distinguishing voiced and voiceless stop consonants (e.g., /b/ vs. /p/).