An analog signal is _____ and __ _______
Continuous and time-varying
Speech is an example of a _________ signal
Analog
A digital signal is ______.
Discrete
3 main parameters of sound
frequency, time, and amplitude
3 types of errors that can occur during ADC
Jitter, Quantization noise, and Aliasing
Jitter:
deviation in periodicity
can be a result of irregularities in sampling rate
Quantization noise:
deviation in amplitude measures
can be a result of rounding errors in the process of quantization
Aliasing:
distortion due to misidentification of frequency
can be a result of an inappropriate (too low) sampling rate
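A minimal sketch of two of these ADC errors, assuming a hypothetical 10 kHz sampling rate and samples scaled to the range -1 to 1. The helper names `apparent_frequency` and `quantize` are made up for illustration and do not come from any particular library.

```python
import numpy as np

def apparent_frequency(f_signal, fs):
    """Frequency (Hz) at which a pure tone of f_signal appears after sampling at fs."""
    f_folded = f_signal % fs               # sampling cannot tell f from f + k*fs
    return min(f_folded, fs - f_folded)    # fold into the 0..fs/2 (Nyquist) range

def quantize(x, n_bits):
    """Round samples in [-1, 1] to a grid with 2**n_bits steps across that range."""
    step = 2.0 / (2 ** n_bits)
    return np.round(x / step) * step

fs = 10_000                                # hypothetical sampling rate (Hz)
print(apparent_frequency(3_000, fs))       # below Nyquist: stays at 3000
print(apparent_frequency(7_000, fs))       # above Nyquist: aliases down to 3000

t = np.arange(0, 0.01, 1 / fs)
x = 0.9 * np.sin(2 * np.pi * 440 * t)
err = x - quantize(x, 8)                   # quantization noise from rounding
print(np.max(np.abs(err)))                 # never more than half a step
```

The 7 kHz tone is misidentified as 3 kHz because the Nyquist frequency for this rate is only 5 kHz.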
Digital Signal Processing (DSP)
Pre-Processing of a digital signal
Steps of DSP
Speech Signal
Filtering
Digitization
Frame Selection
Windowing
Short-term analysis
Graphic display or numeric output
Elements of filtering
Pre-emphasis, presampling
Elements of digitization
time sampling, quantization
elements of Frame Selection
Frame length, frame overlap
elements of windowing
tapering function
elements of short-term analysis
FFT, LPC, Cepstrum
elements of Graphic display or numeric output
spectrogram, spectrum, other
Goal of filtering:
retain wanted parts of the signal while removing parts that do not necessarily provide any information
Pre-Sampling:
“anti-aliasing” - applying filters that block frequencies above the Nyquist frequency (half the sampling rate)
Aliasing
misrepresentation of frequency content because the original signal is undersampled (sampling rate too low for the frequencies present)
Example of a pre-sampling filter:
DC offset removal (an anti-alias filter itself is a low-pass filter)
Pre-Emphasis:
Equalizes (boosts weaker) energies over a specified range of frequencies so that important aspects of the signal have enough energy to be captured accurately within the quantization bits available
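One common way to pre-emphasize speech is a first-order filter that boosts the (usually weaker) high frequencies relative to the low ones. This is a sketch only: the coefficient 0.97 is a typical textbook value assumed here, and the notes do not say which pre-emphasis filter is intended.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                        # no previous sample for the first point
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def rms(s):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(s ** 2))

fs = 16_000                            # hypothetical sampling rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
low = np.sin(2 * np.pi * 100 * t)      # low-frequency tone
high = np.sin(2 * np.pi * 4000 * t)    # high-frequency tone

print(rms(pre_emphasis(low)) / rms(low))     # well below 1: lows are attenuated
print(rms(pre_emphasis(high)) / rms(high))   # above 1: highs are boosted
```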
Practical example of filtering in Aud and SLP
Measuring Auditory Brainstem Responses (ABR)
Removes:
direct current (DC) signals from other electronic equipment
60 Hz hum from alternating current (AC) power sources
background EEG activity, unwanted brain activity
uses pre-emphasis method called differential amplification
boosts level of desired evoked potential response while removing the extra noise.
Frame Selection/Windowing:
process of selecting which parts of signal to be analyzed
Window/Frame:
the portion of the signal selected to perform an analysis on
Windowing option examples:
Rectangle
Bartlett
Hanning
Hamming*
Blackman
Gauss
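Frame selection and windowing can be sketched with NumPy, which ships several of the tapering functions listed above (`np.hamming`, `np.hanning`, `np.bartlett`, `np.blackman`). The 25 ms frame length and 10 ms hop are assumed values for illustration, and `frame_signal` is a hypothetical helper.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, advancing hop samples each time."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

fs = 16_000                                        # hypothetical sampling rate (Hz)
x = np.random.default_rng(0).standard_normal(fs)   # 1 s of placeholder noise
frame_len = round(0.025 * fs)                      # 25 ms frame -> 400 samples
hop = round(0.010 * fs)                            # 10 ms hop -> 160 samples (15 ms overlap)

frames = frame_signal(x, frame_len, hop)
windowed = frames * np.hamming(frame_len)          # taper each frame toward its edges
print(frames.shape)                                # (number of frames, samples per frame)
```

Swapping `np.hamming` for another window function changes only the taper shape; the framing step stays the same.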
How is ABR recording windowed?
Based on a TIME-specific analysis!
Types of graphic displays of acoustic data:
Waveform, Spectrum, Spectrogram, Profiles or contours
Dimensions of a waveform:
Amplitude by time
Types of waveforms (temporal analysis)
raw, envelope
Dimensions of a spectrum:
Amplitude by frequency
Types of spectrums (Spectral Analysis)
Fast Fourier Transform, Linear Predictive Coding, Cepstrum
Dimensions of spectrogram:
Amplitude by frequency by time
Types of spectrograms (speech (complex) analysis)
Conventional, contour, waterfall
Dimensions of Profiles or contours
Parameter by time
Types of profiles/contours:
f0 trace (pitch contour), intensity profile
Temporal (time-based) analysis works directly on the ______.
Waveform
What information can you analyze from a waveform?
Fundamental frequency
Perturbation Measures
Signal-to-noise ratio
Voice onset time
Vowel duration
Envelope
Fundamental Frequency:
lowest frequency of a periodic waveform; in voice, the rate of vocal fold vibration
Signal Processing Strategy used to get fundamental frequency:
Pitch determination algorithm (PDA) or pitch extractor
Temporal methods used by PDA:
Zero crossing
Peak Picking
Autocorrelation (most modern)
Zero Crossing:
counts every time a wave passes through the zero line within a second, then divides by two to obtain the fundamental frequency
Peak Picking:
Fundamental frequency is derived by identifying wave peaks and counting either the total number of crests or troughs OR total number of peaks in general and dividing by 2
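The three temporal methods can be sketched on a clean synthetic tone. These are simplified illustrations, not the algorithms used by any specific pitch extractor; the function names and the 60 to 500 Hz search range are assumptions. Real speech needs smoothing first, since noise adds spurious zero crossings and peaks.

```python
import numpy as np

def f0_zero_crossing(x, fs):
    """Count sign changes per second and divide by two."""
    signs = np.signbit(x)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings / (len(x) / fs) / 2

def f0_peak_picking(x, fs):
    """Count crests (local maxima) per second."""
    mid = x[1:-1]
    crests = np.count_nonzero((mid > x[:-2]) & (mid >= x[2:]))
    return crests / (len(x) / fs)

def f0_autocorrelation(x, fs, f_min=60.0, f_max=500.0):
    """Find the lag at which the signal best matches a shifted copy of itself."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0, 1, 2, ...
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / best_lag

fs = 8000                                   # hypothetical sampling rate (Hz)
t = np.arange(4000) / fs                    # 0.5 s
x = np.sin(2 * np.pi * 200 * t + 0.3)       # 200 Hz test tone

print(f0_zero_crossing(x, fs), f0_peak_picking(x, fs), f0_autocorrelation(x, fs))
```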
Perturbation measures:
3 types we can measure
jitter
shimmer
signal to noise ratio
Perturbation:
a deviation from truly periodic and regular patterns of vibration of the vocal folds
Jitter:
variability in the fundamental period of phonation
reported in an absolute value (ms) or relative value (%)
Jitter Percent:
obtained by dividing the absolute jitter value by the mean fundamental period (expressed as a percentage)
Shimmer:
variability of amplitude of successive cycles of waveform
reported in an absolute value (dB) or relative value (shimmer %)
Shimmer Percent:
obtained by dividing absolute shimmer value by the mean amplitude of the waveform
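A sketch of the percent calculations, assuming the absolute value is the mean difference between successive cycles (one common definition; analysis packages offer several variants). The period and amplitude lists are made-up numbers for illustration.

```python
import numpy as np

def jitter(periods_ms):
    """Return (absolute jitter in ms, jitter percent) from successive cycle periods."""
    periods = np.asarray(periods_ms, dtype=float)
    absolute = np.mean(np.abs(np.diff(periods)))    # mean cycle-to-cycle change
    return absolute, 100 * absolute / np.mean(periods)

def shimmer(amplitudes):
    """Return (shimmer in dB, shimmer percent) from successive cycle peak amplitudes."""
    amps = np.asarray(amplitudes, dtype=float)
    in_db = np.mean(np.abs(20 * np.log10(amps[1:] / amps[:-1])))
    percent = 100 * np.mean(np.abs(np.diff(amps))) / np.mean(amps)
    return in_db, percent

periods = [5.0, 5.1, 4.9, 5.0]        # hypothetical periods (ms), about 200 Hz
amplitudes = [1.0, 0.9, 1.0, 1.1]     # hypothetical peak amplitudes (arbitrary units)

jitter_abs, jitter_pct = jitter(periods)
shimmer_db, shimmer_pct = shimmer(amplitudes)
print(jitter_abs, jitter_pct, shimmer_db, shimmer_pct)
```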
Signal to Noise Ratio:
Ratio of Periodic energy to aperiodic energy in the voice waveform
With NO background noise, SNR = _________
The intensity of the signal
When background noise is louder than the signal, SNR = ________
A negative value
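The two SNR cards follow from the decibel arithmetic. A minimal sketch with hypothetical function names: SNR in dB is 10 times the log of the power ratio, which is the same as subtracting the noise level from the signal level.

```python
import math

def snr_db_from_power(signal_power, noise_power):
    """SNR in dB from a ratio of powers."""
    return 10 * math.log10(signal_power / noise_power)

def snr_db_from_levels(signal_level_db, noise_level_db):
    """SNR in dB from two levels already expressed in dB."""
    return signal_level_db - noise_level_db

print(snr_db_from_levels(60, 0))     # noise at 0 dB: SNR equals the signal level
print(snr_db_from_levels(50, 65))    # noise louder than signal: negative SNR
print(snr_db_from_power(1, 10))      # same idea with powers: negative SNR
```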
Voice Onset Time:
duration of the interval between release of a stop consonant and the onset of vocal fold vibration (vowel production)
Vowel Duration:
duration of the interval over which the formant pattern (specifically F1 and F2) is stable
aka vowel steady state
Envelope:
overall profile of waveform
Spectral (frequency-based) analysis operates directly on a _______
spectrum
Commonly used software for spectral analyses:
Audacity
PRAAT
Computerized Speech Lab (CSL)
Which spectral analysis software has few spectral analysis options?
Audacity
Which spectral analysis software is most widely used acoustic freeware?
PRAAT
Which spectral analysis software is professional software?
Computerized Speech Lab
Major types of Spectral Analysis:
Fourier Transform: Discrete (DFT) and Fast (FFT),
Linear Predictive Coding (LPC),
Cepstral based analyses,
Mel Frequency Cepstral Coefficients (MFCC)
Fourier Transform
Decomposes a waveform into its frequency content, converting the waveform to a power spectrum
Discrete Fourier Transform
Fourier transform of a finite set of discrete samples from the waveform (determined by sampling rate and windowing)
transforms data from samples into distinct frequency lines within a power spectrum
Fast Fourier Transform
optimized algorithm to calculate DFT
all speech analysis software packages have an implementation of FFT
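A short sketch of turning one windowed frame into a power spectrum with NumPy's FFT routines. The two-tone test signal and the 8 kHz sampling rate are assumptions chosen so that each tone lands on an exact frequency line.

```python
import numpy as np

fs = 8000                                   # hypothetical sampling rate (Hz)
n = 800                                     # frame length; line spacing = fs / n = 10 Hz
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

window = np.hanning(n)                      # taper the frame before transforming
spectrum = np.fft.rfft(x * window)          # FFT of a real signal (positive frequencies)
power = np.abs(spectrum) ** 2               # power spectrum
freqs = np.fft.rfftfreq(n, d=1 / fs)        # frequency (Hz) of each line

peak_freq = freqs[np.argmax(power)]
print(peak_freq)                            # strongest component, close to 200 Hz
```

The number of samples in the frame sets the spacing of the frequency lines, which is why the sampling rate and window length determine the resolution of the spectrum.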
Linear Predictive Coding
Based on the quasi-periodic nature of speech: by knowing certain parts of the speech signal, other parts can be predicted
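The prediction idea can be sketched with the autocorrelation method: each sample is approximated as a weighted sum of the previous few samples, and the weights are found by solving a small linear system. This is one standard way to estimate the coefficients, shown here on a synthetic signal with known weights; `lpc_coefficients` is a hypothetical helper, not a library function.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate a[1..order] so that x[n] is approximated by sum(a[k] * x[n-k])."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation of the signal at lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz matrix with R[i, j] = r[|i - j|]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[idx], r[1 : order + 1])

# Synthetic signal where each sample really is predictable from the two before it
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
noise = rng.standard_normal(20_000)
x = np.zeros_like(noise)
for i in range(2, len(x)):
    x[i] = true_a[0] * x[i - 1] + true_a[1] * x[i - 2] + noise[i]

est = lpc_coefficients(x, order=2)
print(np.round(est, 2))                     # close to the true weights
```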
Cepstrum
A Fourier transform performed on the (log) spectrum
inverse/transposition of spectrum
What is a cepstrum useful in investigating?
Periodicity/ rate of change of a signal
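A sketch of why the cepstrum exposes periodicity, assuming the usual "real cepstrum" recipe (inverse FFT of the log magnitude spectrum). The pulse train stands in for a voiced sound; the 60 to 500 Hz search range and the small constant added before the log are assumptions.

```python
import numpy as np

fs = 8000                                   # hypothetical sampling rate (Hz)
period = 80                                 # samples per cycle -> f0 = 100 Hz
x = np.zeros(1600)
x[::period] = 1.0                           # synthetic pulse train (harmonic-rich)

log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-6)   # log magnitude spectrum (phase discarded)
cepstrum = np.fft.irfft(log_mag, n=len(x))        # transform of the log spectrum

# Look for the strongest peak among quefrencies that match plausible voice periods
q_min, q_max = int(fs / 500), int(fs / 60)
peak_quefrency = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
f0 = fs / peak_quefrency
print(peak_quefrency, f0)                   # peak at the period of the pulse train
```

The evenly spaced harmonics in the spectrum act like a periodic pattern, so transforming the spectrum again produces a peak at the quefrency equal to the fundamental period.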
Terms associated with Spectrum vs. Cepstrum:
Spectrum: frequency and amplitude → Harmonics → filtering
Cepstrum: Quefrency and amplitude → Rahmonics → liftering
2 important features of a cepstrum:
preserves magnitude information about the signal and discards phase-related information
emphasizes periodic nature of harmonics
What do cepstrum algorithms reveal in a signal?
Once the signal is converted and one formant is found, pattern-finding algorithms can locate the others
What do rahmonics show?
They correlate with perceptual “quality” measures of voice
Mel Frequency Cepstral Coefficients (MFCC)
represent the short-term power spectrum of a sound
frequency bands are evenly spaced on the mel scale, whereas the ordinary cepstrum spaces frequency bands linearly (in Hz)
more representative of human auditory sensitivity (perception of pitch)
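The mel spacing can be illustrated with one widely used Hz-to-mel conversion formula. Other variants of the formula exist, so treat the constants 2595 and 700 as an assumption rather than the only definition.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common formula)."""
    return 2595 * np.log10(1 + np.asarray(f_hz, dtype=float) / 700)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700 * (10 ** (np.asarray(mel, dtype=float) / 2595) - 1)

# Ten band centers evenly spaced in mels between 0 and 8000 Hz
centers_mel = np.linspace(hz_to_mel(0), hz_to_mel(8000), 10)
centers_hz = mel_to_hz(centers_mel)

print(np.round(centers_hz))                 # narrow steps at low Hz, wide steps at high Hz
print(hz_to_mel(1000))                      # roughly 1000 mels at 1000 Hz
```

Equal steps in mels become progressively wider steps in Hz, mirroring the ear's finer pitch resolution at low frequencies.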
Practically, when are mel frequency cepstral coefficients most useful?
in audio compression and speech recognition systems (e.g., HA mapping)
How to obtain formants:
by using any spectral analysis method
Two main characteristics of formants:
peak in spectrum of a vowel sound or energy bands in spectrogram
resonance of vocal tract
Which formants are typically used to describe most speech sounds?
F1 and F2
For vowels, what does F1 describe?
Tongue Height
For vowels, what does F2 describe?
tongue advancement (front-back position)
Formant Amplitude:
Relative amplitude of formants in a formant pattern
Formant Space:
aka acoustic working space, acoustic vowel space, vowel triangle
plot of F1 vs F2
measures speech intelligibility
several other measures are derived from formant space.
Examples of measures based on (static) formant space:
vowel space area
formant centralization ratio
four vowel articulation index
Formant centroid
Vocalic anatomical functional ratio
long-term formant distribution
Measurements based on “dynamic” aspect of formants:
Formant Transition
Formant Locus
Formant Slope
Locus equation
Vowels, glides, and consonants differ in degree of ________.
Constriction
Sonorant Consonants
NO pressure build up at constriction
Nasal Consonants
lower the velum allowing airflow in nasal cavity
Continuant Consonants
do not block airflow in oral cavity
Resonators:
specific state of vocal tract that amplifies frequencies near the natural frequency of that system
Natural Frequency of a resonator is based on _____.
Length and diameter of the vocal tract
Relation of harmonic frequencies to resonating frequency
If close to resonating frequency: will be amplified
If far from resonating frequency: will be dampened
Relationship of two formants when they are close in frequency to one another,
They tend to boost each other’s amplitude
Formant Bandwidth:
difference (in Hz) between the frequencies on either side of a formant's center frequency where the intensity is 3 dB below the peak
In which graphic representation can you find formant bandwidth?
on a Spectrum
Practical use of formant space measurements:
represents maximum working space of a talker
representative of maximum performance
Vowel Space Area
aka F1-F2 area
calculated using a specific formula identifying the area of formant space graph
Used to study variety of speech and voice disorders
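The notes do not give the formula, but one common approach is the shoelace (polygon area) formula applied to the corner vowels plotted as (F1, F2) points. The formant values below are made-up illustrative numbers, not measurements, and the choice of four corner vowels is an assumption.

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula: area of a polygon whose vertices are listed in order around its edge."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical (F1, F2) values in Hz for four corner vowels, listed around the quadrilateral
corner_vowels = [
    (300, 2300),   # /i/
    (700, 1800),   # /ae/
    (750, 1100),   # /a/
    (350, 900),    # /u/
]

vsa = polygon_area(corner_vowels)
print(vsa)                                  # vowel space area in Hz squared
```

A smaller area for the same vowels would indicate a more centralized (reduced) vowel space.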
Long term formant distribution (LTF)
average formant frequency of a given speaker
calculated by taking average of all formants across all vowels in recorded sample
used to study variety of speech and voice disorders
Primary use of LTF:
forensic speaker identification and in studying effects of age and sex on speech
When is speech dynamic?
when there are changes as a result of consonants embedded along with vowels -- typical running speech
Formant transition:
relative change from a vowel to a consonant
What speech sounds are formant transitions specifically associated with?
stop consonants
Formant locus:
characteristic value for each place of consonant articulation
** helpful to judge phonemes and speech intelligibility
Formant slope:
the change in formant frequency over an interval of formant transition
** helpful in studying speech intelligibility in dysarthric speakers
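Formant slope reduces to a simple rate of change. A minimal sketch with a hypothetical helper and made-up F2 values:

```python
def formant_slope(f_start_hz, f_end_hz, duration_ms):
    """Change in formant frequency divided by the transition duration (Hz per ms)."""
    return (f_end_hz - f_start_hz) / duration_ms

# Hypothetical F2 transition: 1200 Hz rising to 1800 Hz over 50 ms
slope = formant_slope(1200, 1800, 50)
print(slope)                                # Hz per millisecond
```

A shallower slope for the same transition means the formant is moving more slowly.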