obstruent noise
any sound that is formed through the partial or entire obstruction of air flow
period
the time it takes for one cycle to be completed in a measure of seconds; frequency and period are inversely related, frequency = 1/period
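The inverse relation between period and frequency can be checked with a short sketch. The function names and the 5 ms example value are illustrative assumptions, not part of the original card.

```python
def frequency_hz(period_s: float) -> float:
    """Frequency in Hz for a period given in seconds (f = 1 / T)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def period_seconds(frequency: float) -> float:
    """Period in seconds for a frequency given in Hz (T = 1 / f)."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency


# Hypothetical example: one glottal cycle lasting 5 ms.
example_frequency = frequency_hz(0.005)
example_period = period_seconds(example_frequency)
```

A shorter period gives a higher frequency, and converting back recovers the original period.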
vowels
produced when air flow from the lungs is unobstructed. Each vowel is produced with a different configuration of tongue and lip movements. Sometimes known as the nucleus.
consonants
produced with more articulatory movement and more constriction than vowels.
periodic speech sounds
use the vocal folds as a source and the vocal tract as a resonator; manners include vowels, diphthongs, semivowels, and nasals
aperiodic speech sounds
vocal tract is the source, vocal tract is the resonator, and the manner includes stops, fricatives, and affricates. Vocal folds are not involved.
mixed aperiodic and periodic speech sounds
uses the vocal folds and vocal tract for the source, the resonator is the vocal tract, and the manner includes voiced stops, voiced fricatives, and voiced affricates.
sonorants
nasals, liquids, and glides. There is free airflow and articulation shapes the vocal-tract cavity. These are mainly characterized by the formant frequencies and they have a periodic laryngeal source (all voiced).
stop bursts
type of obstruent where you release built-up pressure, transient noise
narrow vs. broad band spectrogram
a narrow-band spectrogram has fine frequency resolution: the individual harmonics appear as straight horizontal lines. A broad-band spectrogram has fine temporal resolution: the individual glottal pulses appear as vertical striations, like the peaks on a heart monitor.
approximants
glides (j,w) and liquids (l,r). These have limited articulatory constrictions that alter resonant frequencies. The formant transitions are typically faster than for vowels. The syllable position is on the periphery of vowels.
resonants
include the approximants (the glides and liquids), and the nasals. These have a periodic laryngeal source
obstruents
include non-sibilant fricatives, sibilant fricatives, stops, and affricates. These have an aperiodic supraglottal source.
glides [w]
the production is similar to [u], with a high back tongue position and rounded lips. Styloglossus, orbicularis oris active. The formant values are similar to [u], low F1 and low F2.
glides [j]
has production similar to [i], high front tongue position, genioglossus active. Formant values are similar to [i] with low F1 and high F2.
liquids
include [l, r]; the tongue tip is raised toward the alveolar ridge. For [l] there is tongue-tip contact with the alveolar ridge, and the sides of the tongue are lowered so that air flows laterally. For [r] there is no tongue-tip contact with the alveolar ridge; the tongue is often retroflexed and the lips round. F3 is low for [r] and high for [l]. Each can sound different depending on where it is in the word. These can function as syllable nuclei like some vowels.
fundamental frequency
the lowest frequency component of a complex periodic sound; in voice, the lowest frequency that the vibrating vocal folds produce. It corresponds to the rate of the glottal cycle.
breathiness
inefficient adduction (closing) of the glottis during the glottal cycle, so air escapes and adds noise; a typical symptom in dysphonia patients. Can be marked by a low harmonics-to-noise ratio (HNR). On a spectrogram, would expect to see gritty lines instead of clearly separated, straight horizontal lines.
Breathiness has been found to be more likely in informal situations. Listeners have also been found to judge women as more attractive when their voice is breathier.
formants and vocal tract length
formants are frequency regions enhanced by resonance. They depend on the length of the vocal tract: the longer the vocal tract, the lower the formants of the vowels.
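A minimal sketch of the length effect, assuming the vocal tract behaves like a uniform tube closed at the glottis and open at the lips, so that its resonances are odd multiples of c/4L (the quarter-wave formula given later in this set). The two tract lengths (17 cm and 14 cm) are illustrative assumptions, not measurements from the source.

```python
SPEED_OF_SOUND_CM_S = 34_000.0  # 340 m/s, the value used in this set


def tube_formants(length_cm: float, count: int = 3) -> list[float]:
    """Resonances (Hz) of a uniform closed-open tube: (2k - 1) * c / (4L)."""
    return [
        (2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for k in range(1, count + 1)
    ]


# Assumed example lengths: a longer and a shorter vocal tract.
longer_tract = tube_formants(17.0)
shorter_tract = tube_formants(14.0)

for name, formants in (("17 cm", longer_tract), ("14 cm", shorter_tract)):
    print(name, [round(f) for f in formants])
```

Every formant of the longer tube is lower than the matching formant of the shorter tube, which is the pattern the card describes.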
velum
the soft palate
pharynx
throat cavity
velopharyngeal port (VP)
a passageway that connects the oral and nasal cavities. It opens for nasal sounds and remains closed for oral sounds.
nasal sounds
the levator palatini muscle is relaxed, and the palatoglossus muscle may actively lower the velum, and the nasal cavities form a resonant chamber.
in nasal stops, the oral cavity is blocked at the same places of articulation as the oral stops, e.g. lips for [m], alveolar ridge for [n], soft palate for [ŋ] (ng)
opening the VP port creates a large resonant cavity: the vocal tract opens into the nasal cavity and the volume increases. The result is a low-frequency nasal resonance with low amplitude. The amplitude is low because anti-resonances attenuate energy, the large resonating space yields high damping, the soft walls of the nasal cavities absorb energy, and acoustic radiation through the nostrils is attenuated by the small openings.
fricatives
aperiodic sound source in the upper vocal tract, the airflow is forced through constriction to create turbulence. These can be voiced or voiceless.
sibilant fricatives
alveolar fricatives involve the tongue forming a constriction at the alveolar ridge; air flows through the midline groove, or the tongue is against the teeth; the short anterior cavity emphasizes high frequencies
post-alveolar fricatives, the tongue forms groove in the alveo-palatal region, lips are often rounded, longer anterior cavity emphasizes lower frequencies
frication noise is stronger than in non-sibilants
source and filter in [s]
the noise source is at the alveolar ridge, in front of which is the small anterior cavity. The quarter-wave resonator between the alveolar ridge and the lips is about 1 cm long. The resonant wavelength is four times that length (4 x 1 cm = 4 cm), so the resonant frequency is 34,000 cm/s / 4 cm = 8,500 Hz
higher frequencies emphasized for alveolar fricatives, lower frequencies emphasized for alveo-palatal fricatives, because there is a longer front cavity and the wavelength increases
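The quarter-wave arithmetic for the front cavity can be sketched as follows. The 1 cm cavity comes from the card above; the 2 cm cavity for a post-alveolar fricative is an assumed value chosen only to show the direction of the change.

```python
def front_cavity_resonance_hz(cavity_cm: float, c_cm_s: float = 34_000.0) -> float:
    """Lowest resonance of a quarter-wave front cavity: f = c / (4 * L)."""
    if cavity_cm <= 0:
        raise ValueError("cavity length must be positive")
    return c_cm_s / (4.0 * cavity_cm)


alveolar_peak = front_cavity_resonance_hz(1.0)       # 1 cm cavity, as in the card
post_alveolar_peak = front_cavity_resonance_hz(2.0)  # assumed longer cavity

print(alveolar_peak, post_alveolar_peak)
```

Doubling the front-cavity length halves the resonance, which is why the longer cavity emphasizes lower frequencies.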
spectral moments
the center of gravity, standard deviation, skewness and kurtosis
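One way to compute the four moments is to treat the power spectrum as a probability distribution over frequency. This is a sketch under that assumption; the function name, the toy three-bin spectrum, and the choice to report excess kurtosis (subtracting 3) are illustrative and not taken from the source.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (center of gravity, standard deviation, skewness, excess kurtosis)."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain some energy")
    weights = [p / total for p in power]

    # First moment: energy-weighted mean frequency.
    cog = sum(f * w for f, w in zip(freqs_hz, weights))

    def central(order):
        # Central moment of the given order around the center of gravity.
        return sum(((f - cog) ** order) * w for f, w in zip(freqs_hz, weights))

    sd = math.sqrt(central(2))
    if sd == 0:
        raise ValueError("all energy sits in a single frequency bin")
    skewness = central(3) / sd ** 3
    kurtosis = central(4) / sd ** 4 - 3.0  # excess kurtosis (assumed convention)
    return cog, sd, skewness, kurtosis


# Toy symmetric spectrum (assumed values): energy centered on 2000 Hz.
cog, sd, skewness, kurtosis = spectral_moments([1000, 2000, 3000], [1, 2, 1])
```

For a symmetric spectrum like this one the skewness is zero; energy tilted toward low frequencies would give positive skewness.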
production of stops
complete articulatory closure in the oral cavity; the velopharyngeal (VP) port is closed. Intraoral pressure rises during closure, then drops at release as air is vented through the mouth. Oral release also creates a transient noise source, called a release burst. Audibly released stops are also called plosives
acoustics of stop manner
presence of a near silent interval during the stop closure. Then there is a rise time (syllable initial) or a fall time (syllable final). There is a release burst, looks like a spot of white followed by a black vertical line.
acoustics of intervocalic stop voicing
there is a presence or absence of closure voicing. Closure duration is longer for voiceless sounds than for voiced sounds. The release burst is stronger for voiceless stops, and the pressure builds up faster with an open glottis.
acoustics of initial stop voicing
syllable-initial stops are mainly differentiated by the voice onset time (VOT).
voice onset time
the delay between the release burst of filtered obstruent noise and the onset of vibration of the vocal folds, i.e. the time between stop release and phonation onset. There are three categories: voicing lead, where voicing begins before stop release; zero onset/short lag, where voicing begins at or very shortly after stop release; and long-lag VOT, where voicing begins well after the release
for unvoiced obstruents, the voice onset time is positive. When the VOT is zero or negative (when voicing starts together with noise burst or earlier), we hear the obstruent as voiced
unvoiced
unvoiced consonants will have a positive voice onset time, called long lag: vocal fold vibration begins after the release of the blockage, not simultaneously with it
voiced
voiced sounds will have a voice onset time that is negative, zero, or slightly positive, and will be categorized as voicing lead or short lag, depending on whether voicing begins before the stop release or at/just after it, respectively. In voicing lead, vocal fold vibration begins while the blockage is still occurring.
voicing lead
vocal folds approximated throughout stop closure, closure may be voiced
short lag
vocal folds adducted by the time the stop is released, silent closure, voicing begins on release or just after
long lag
vocal folds adduct after the stop is released, voicing is delayed, the stop is aspirated
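The three VOT categories above can be sketched as a simple classifier. VOT is taken as voicing onset time minus release time, so voicing lead is negative. The 30 ms boundary between short lag and long lag is an assumed illustrative threshold; the cards do not give a number, and boundaries differ across languages and places of articulation.

```python
def classify_vot(vot_ms: float, short_lag_max_ms: float = 30.0) -> str:
    """Label a voice onset time (ms) as voicing lead, short lag, or long lag.

    short_lag_max_ms is an assumed boundary, not a value from the source.
    """
    if vot_ms < 0:
        return "voicing lead"  # voicing begins before the stop release
    if vot_ms <= short_lag_max_ms:
        return "short lag"     # voicing begins at or just after release
    return "long lag"          # voicing is delayed; the stop is aspirated


# Hypothetical measurements in milliseconds.
labels = [classify_vot(v) for v in (-80.0, 0.0, 15.0, 70.0)]
print(labels)
```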
acoustics of affricates
affricates consist of a stop releasing into a fricative. Their acoustics show features of both stops and fricatives: a silent/voiced closure region, a release burst, and frication noise
problems with velopharyngeal (VP) control
hyponasality or hypernasality may result, could be caused by cleft palate or motor speech disorders
poor control of VP mechanism may impair production of oral obstruents that require buildup of intraoral air pressure
problems of inter-articulator timing in motor speech disorders may affect VOT and stop voicing contrasts
coarticulation
our natural tendency to let the articulatory maneuvers for one sound interfere with those of nearby sounds.
formant transitions
time course changes in vowel formant frequencies resulting from co-articulation.
microprocess
speech can be seen this way, think of sound production as a source-filter, and focus on single speech sounds (vowels, consonants)
macroprocess
speech can be seen this way, people use coordinated speech that must be planned and monitored
effects of age
vocal folds gain mass and pitch becomes lower as we age throughout adolescence
as people reach older age, their vocal folds begin to lose mass, which can make their pitch higher. Older people may have weaker lungs; reduced lung capacity means reduced ability to produce sounds. They may also have hearing loss that makes them strain their voice to reach a higher amplitude, as well as a thinner voice and reduced vocal endurance. These changes might be due to stiffening of the vocal cords with age, cells dying, or edema causing a rougher voice sound.
Bernoulli effect
as air travels through narrow passages it must accelerate, and through wider passages it must move slower. Air flowing from the lungs through the small passage of the glottis creates a pressure drop, and this force closes the vocal folds.
cybernetics
the science of self-regulating machines; can also be called robotics
feedback mechanisms
production of speech sounds requires synchronized and instantaneous use of systems, need feedback control
feedback control
a need to monitor the process as it is going
open loop systems
input to amplifier to motor to output, no feedback and change, sensations to CNS to motor activity
closed loop systems
input to amplifier to motor to output with feedback of error before the output is made; sensations to CNS to motor activity, then feedback to correct motor activity into sensations
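The difference between the two system types can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the additive disturbance, the correction gain of 0.5, and the 20 correction steps are not part of any speech model in this set.

```python
def open_loop(target: float, disturbance: float) -> float:
    """No feedback: the command equals the target and errors go uncorrected."""
    command = target
    return command + disturbance


def closed_loop(target: float, disturbance: float,
                gain: float = 0.5, steps: int = 20) -> float:
    """Feedback: the error between target and output adjusts the command."""
    command = target
    output = command + disturbance
    for _ in range(steps):
        error = target - output   # compare the result with the goal
        command += gain * error   # correct the motor command
        output = command + disturbance
    return output


# Hypothetical target of 100 units with a disturbance of -10 units.
uncorrected = open_loop(100.0, -10.0)
corrected = closed_loop(100.0, -10.0)
print(uncorrected, round(corrected, 3))
```

In the open-loop case the disturbance shows up unchanged in the output, while the closed-loop case shrinks the error on each pass.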
external feedback
auditory feedback (AF) is slow; tactile feedback (TF) is also slow. Sensations are air-conducted and bone-conducted. The contributions are assessed using various manipulations: delayed auditory feedback (DAF), increasing/reducing amplitude (Lombard effect), and filtering out frequencies. These manipulations elicit compensatory behaviors that show a limited contribution of AF to ongoing control because it is too slow; its role is in refining targets and monitoring errors
amplitude feedback
increased noise forces the person to speak louder, based on the Lombard effect
proprioceptive feedback (PF)
relatively fast feedback, you can sense it during the process
internal feedback (IF)
theoretically the most rapid, happens before the message goes out
delayed auditory feedback
Speech is amplified and played back to the speaker about 200 ms after production; in fluent speakers this leads to dysfluencies due to immediate reactivation after inhibition. It is also used as an aid for stuttering/stammering. In the tape-recorder version, the speaker records their voice while listening to the recording at a delay by monitoring the playback head. This allows the person to slow down and better coordinate the speech signals from the brain and vocal cords as well as their breathing mechanisms. DAF devices look like a hearing aid, and they can significantly reduce stuttering.
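The delay itself can be sketched as a fixed shift of the speech samples. This is a simplified offline illustration, assuming a list of samples and a known sample rate; the function name and the 1000 Hz rate in the example are hypothetical, and a real DAF device works on a live audio stream.

```python
def delay_signal(samples, sample_rate_hz: int, delay_ms: float = 200.0):
    """Return what the speaker hears: the input shifted later by delay_ms.

    The output has the same length as the input; the first delay_ms of
    playback is silence (zeros).
    """
    delay_samples = round(sample_rate_hz * delay_ms / 1000.0)
    padded = [0.0] * delay_samples + list(samples)
    return padded[:len(samples)]


# Hypothetical one-second signal sampled at 1000 Hz.
spoken = [float(i) for i in range(1000)]
heard = delay_signal(spoken, sample_rate_hz=1000)
```

With a 200 ms delay at 1000 Hz, sample 0 of the spoken signal is heard at index 200.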
Lombard effect
the involuntary tendency of speakers to increase vocal effort when speaking in loud noise to enhance audibility. Speakers are unaware of this. The circuits responsible are located in the brainstem. Includes a rise in amplitude, a rise in fundamental frequency, flattening of the spectral slope, and elongation of signal duration. Males tend to exhibit a greater Lombard effect. Also seen in animal species and in singers who sing with others
tactile feedback
a type of external feedback. Sensations of touch are from articulators in contact, air pressure and flow changes at glottis. Contributions of TF measured using esthesiometer (measures how sensitive parts of body are, two points on the body and see whether or not the person can discriminate them), shapes in the mouth (oral stereognosis), and nerve blocks (numbing of nerves leads to distorted speech). Results show limited contribution of TF in established speakers
proprioceptive feedback
(PF), sense of direction, velocity of movement, and position of articulators from sensors in joints, tendons, and muscle spindles (intrafusal fibers). Perturbation studies reveal limited effect on well learned motor patterns
internal feedback
delivery of information from the brain about motor commands prior to the motor response itself; information would be relayed faster than for any other type of feedback. There is no direct evidence of this feedback, but it is considered likely because of neural connections among motor areas of the brain
self monitoring
people ask themselves whether they are expressing their desired message, whether they are saying what they want to say, and whether they are maintaining social standards; they check whether they are making a lexical error, whether the syntax and morphology are right, whether they are making a sound-form error, and whether the articulation has the right speed, loudness, precision and fluency
linguistically oriented models
model of speech that uses the linguistic framework. These assume that the linguistic description and classification show correspondence with the reality of processing. Book published said that when we speak, phonology is presented as a system of rules about how phonetic features (binary marked) can be used to regulate the sound production and perception
motor theory of speech perception
perception of speech is linked to an internal motor recognition. People can classify sounds, relates to mirror neurons.
mirror neuron
a neuron that fires both when an animal acts and when an animal observes the same action performed by another
target models
state that speech is guided by a focus on the acoustic effect, not so much by a specific sequence of articulatory movements. Speaking is not about making the right articulatory movements, it is about making the right speech.
articulatory gestures
the coordinated manipulation of the respiratory system, larynx, and the vocal tract. There is a gestural code that coordinates the gestures. This is your repertoire of how you speak and the movement patterns that you need to use.
spatial targets
there is an internal map of the vocal tract in the CNS, speech continually has to start from a different position, speaker is not going back to zero position. Speech makes a trajectory that corresponds with syllable. Feedback tells us how we are doing.
timing models
emphasize that speech must have some advance planning; these models try to figure out how speech can be understood as a combination of static properties (articulatory positions) and dynamic transitions (from phoneme to phoneme). Focuses on co-articulation, e.g. the difference between tooth and teeth.
Lashley
wrote an article that said that motor actions to say right and tire involve the same elements/phonemes, but in the reverse order. Thought there was a linguistic serial scheme in which phonemes and words are inserted.
Fairbanks' model
a closed-loop model that treats speech as a servomechanism. A feedback unit receives proprioceptive, tactile, and auditory feedback.
equations for standing wave effect
f= c/4L; c is 340m/sec, L is the length of the air canal. Describes which frequencies would be reinforced/reinvigorated
VOT
voice onset time; voiceless consonants have a longer VOT than voiced ones (e.g. pa vs. ba); VOT perception adapts: if the listener has heard many tokens from the voiced end of the continuum (e.g. many da sounds), she will perceive a smaller increase in VOT as a change to the voiceless category
tonal languages
languages that have similar sounds with different meanings, depending on the tone of the sound
larynx
produces a complex periodic sound (fundamental frequency f0 and higher harmonics)
includes vocal folds and glottis
vocal fry
creaky voice/pulse mode; a vocal mode in which the vocal folds vibrate at such a low frequency that the individual vibrations can be heard.
apraxia of speech
motor planning disorder that results in unpredictable speech sound substitutions.
dysarthria
caused by neurological damage that impairs control of the muscles used for speech (motor execution), in contrast to the motor planning deficit in apraxia of speech.
speech and age
higher pitch voice in men, lower pitch voice in women; reduced volume and projection of the voice (thin voice), reduced vocal endurance, difficulty being heard in noisy situations, tremor or shakiness in the voice
Helmholtz
he did experiments with resonance in pipes. Was interested in developing mathematics of resonance. He demonstrated the relationship between cavity sizes and resonances. Longer bottle neck produces lower frequency.
resonance
the way airflow for speech is shaped as it passes through the oral (mouth) and nasal (nose) cavities and provides the quality of perceived sound during speech
standing wave
wave pattern that stays fixed in place because the wave meets a boundary and its reflection overlaps the incoming wave. The initial wave oscillates through a medium that constricts its ability to keep moving forward, e.g. your own voice: the glottis is the closed end and the mouth is the open end.
standing wave effect
waves align so that the condensations and rarefactions are strengthened, happens at every odd multiple of the first suitable frequency
closed end to open end standing wave
a mechanical wave contained in a column that is open at one end and closed at the other. The column resonates only at the right wavelengths: the fundamental has a wavelength four times the length of the tube (the tube is a quarter of a wavelength long). Wavelength = 4L/n for odd n (1, 3, 5, ...), so frequency = n·c/(4L).
closed end standing wave
a mechanical wave that is contained at both ends of the column; the fundamental wave must end at the end of the column, e.g. a vibrating guitar string or a jump rope. Wavelength = 2L/n (n = 1, 2, 3, ...), so frequency = n·c/(2L), where c is the wave speed in the medium.
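The two boundary conditions can be compared side by side. This sketch assumes the standard relations for each case (wavelength 4L/n with odd n for a closed-open column, wavelength 2L/n for a column contained at both ends) and frequency = wave speed / wavelength. The 0.17 m length and the use of 340 m/s for both cases are illustrative assumptions.

```python
def closed_open_modes(length_m: float, speed_m_s: float = 340.0, count: int = 3):
    """(wavelength, frequency) pairs for a column closed at one end: odd n only."""
    return [
        (4.0 * length_m / n, n * speed_m_s / (4.0 * length_m))
        for n in range(1, 2 * count, 2)
    ]


def both_ends_contained_modes(length_m: float, speed_m_s: float = 340.0, count: int = 3):
    """(wavelength, frequency) pairs for a column contained at both ends: every n."""
    return [
        (2.0 * length_m / n, n * speed_m_s / (2.0 * length_m))
        for n in range(1, count + 1)
    ]


# Assumed example: a 0.17 m column in both configurations.
quarter_wave = closed_open_modes(0.17)
half_wave = both_ends_contained_modes(0.17)

for wavelength, frequency in quarter_wave:
    print(f"closed-open: {wavelength:.3f} m, {frequency:.0f} Hz")
for wavelength, frequency in half_wave:
    print(f"both ends:   {wavelength:.3f} m, {frequency:.0f} Hz")
```

For the same length, the closed-open column has a fundamental one octave below the column contained at both ends, and it supports only odd multiples of that fundamental.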
vocal tract
all cavities superior to the larynx: pharynx, nasal cavity, oral cavity (larynx to lips)
resonator
something that is set into vibration by an external vibrating source
source filter theory
production of all speech sounds involves some sort of modification of airflow from the lungs; the source of sound is filtered by the properties of the vocal tract. We get power from the infra-glottic vocal tract, the source from the larynx (vocal folds and glottis), and the filter from the supra-glottic vocal tract
release burst
a term referring to the expiration of built-up air, leading to a sudden audible sound
differences between male and female voices
males have thicker, longer vocal folds, and their fundamental frequency (pitch) is lower. They tend to use lower pitch, bigger resonance, a smoother voice, heavy articulation, and a louder voice.
women tend to use a greater variety of pitches, more breathy voice, use their articulators more; women use more hand gestures, more gentle articulation, and softer word choices, acoustic energy tends to center around the lips. Lower resonance, breathier voice, less assertive and quieter voices.
This spectrogram is likely: (shows black shaded part on the bottom on both sides, and the middle is mostly white) What pattern of sounds?
Shows a vowel - consonant - vowel sequence
Delayed auditory feedback
A delay in hearing one's own speech, produced artificially. 200 ms
alveolar fricatives vs. alveo-palatal fricatives
high frequencies are emphasized for alveolar fricatives; lower frequencies are emphasized for alveo-palatal fricatives (the longer front cavity makes the wavelength increase)
which of the feedback channels goes the fastest
the internal feedback; the neural activities in the brain. Next fast would be proprioceptive feedback, auditory feedback is the slowest, already speaking at that point
an example of frequency cue is that:
a. vowels have lower frequency components than consonants
b. vowels often have higher frequency components than consonants
a. vowels have lower frequency components than consonants
overall formant frequency pattern in vowels has some potential as a speaker-specific feature, because:
a. formant frequency always depends on the length of the vocal tract, and that varies from person to person
b. because formants are unique for each vowel, c. because formant frequency depends on speaking rate, and that varies from person to person
a. formant frequency always depends on the length of the vocal tract, and that varies from person to person
Resonance
the result of filtering the sound source passing through the supra-glottal cavities of the vocal tract
in speech, supra-glottal cavities are shaped by the articulators
resonant frequencies are partly determined by cavity size
formants
resonances of the vocal tract (peaks of resonance)
imposed frequency
in forced oscillation an outside force imposes the oscillation frequency
when the imposed frequency of oscillation equals the natural frequency of the oscillator, the result is resonance.
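The resonance condition can be illustrated with the textbook steady-state amplitude of a driven, damped oscillator. The model and every number in it (a 500 Hz natural frequency, a damping constant of 200 per second, a 10 Hz scan step) are assumptions for illustration and not values from the source.

```python
import math


def steady_state_amplitude(drive_hz: float, natural_hz: float,
                           damping_per_s: float, force_per_mass: float = 1.0) -> float:
    """Amplitude of a driven damped oscillator at the imposed frequency."""
    w = 2.0 * math.pi * drive_hz
    w0 = 2.0 * math.pi * natural_hz
    return force_per_mass / math.sqrt((w0 ** 2 - w ** 2) ** 2 + (damping_per_s * w) ** 2)


NATURAL_HZ = 500.0   # assumed natural frequency of the oscillator
DAMPING = 200.0      # assumed light damping, in 1/s

# Scan imposed frequencies from 100 Hz to 1000 Hz in 10 Hz steps.
candidates = range(100, 1001, 10)
best_hz = max(candidates, key=lambda f: steady_state_amplitude(f, NATURAL_HZ, DAMPING))
print(best_hz)
```

With light damping, the largest response in the scan occurs where the imposed frequency matches the natural frequency.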
sound physics
study of how sound resonates in a tube
length of the tube determines which sound frequencies are accepted (reinforced/invigorated)
tube representation of vocal tract
region between glottis and lips can be modeled as a tube that is open at one end, closed at the other
glottis (with adducted vocal folds) is the closed end
lips (separated for vowel production) form the open end
forced oscillation
the vocal tract is a tube of air
this tube has an almost constant diameter from bottom to top. the fact that it is not straight does not matter much acoustically.
this column of air is forced to vibrate when the glottal cycles start, so this is another example of forced oscillation