What is the working hypothesis of FSC?
That voices carry information about a speaker's hypothetical speaker space (Nolan, 1991)
Define speaker space
The speaker space is what we get when we describe features together and thereby define a space that a speaker occupies
What are some proposed ‘ideal’ phonetic features for the hypothetical speaker space?
high variation between speakers (inter-speaker variation)
low variation within individuals (intra-speaker variation)
not affected by transmission (phone bandpass)
resistant to disguise
Why do we no longer use the term “speaker identification”?
Speaker identification implies a level of certainty that FSSs cannot provide.
We can never provide a concrete conclusion of a speaker being the same, nor is it our job to do so.
When can we reasonably provide a categorical rejection of a speaker in an FSC?
A categorical rejection is only truly possible for regional differences between our DS and KS
What statements do we provide at the end of an FSC?
a statement of likelihood: we are assessing the evidence, not the case
Define the two types of likelihood ratio
a numerical way of articulating how much more likely the evidence is under one proposition than under the other
“evidence is 100x more likely assuming the prosecution/defence proposition”
a verbal way of articulating…
evidence provides strong support for the prosecution proposition.
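A minimal sketch of the numerical form, assuming we already have the probability of the evidence under each proposition. The function names and the input probabilities are hypothetical, for illustration only; no verbal scale is mapped here, as the notes do not give one.

```python
def likelihood_ratio(p_evidence_given_hp: float, p_evidence_given_hd: float) -> float:
    """LR = p(evidence | prosecution proposition) / p(evidence | defence proposition)."""
    if p_evidence_given_hd <= 0:
        raise ValueError("p(evidence | Hd) must be greater than zero")
    return p_evidence_given_hp / p_evidence_given_hd


def describe(lr: float) -> str:
    """Phrase the LR as a statement about the evidence, not about the speaker."""
    if lr > 1:
        return f"evidence is {lr:g}x more likely assuming the prosecution proposition"
    if lr < 1:
        return f"evidence is {1 / lr:g}x more likely assuming the defence proposition"
    return "evidence is equally likely under both propositions"


# Hypothetical probabilities, not taken from any case.
print(describe(likelihood_ratio(0.5, 0.125)))
```

An LR above 1 supports the prosecution proposition, an LR below 1 supports the defence proposition, and the size of the ratio expresses the strength of that support.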
What features can we examine from an audio sample?
f0
segments
dynamic events (sequences, duration, assimilation)
tempo and rhythm
hesitations
voice quality (as a shape of the VT, not in terms of accentual features)
pathology (stuttering, various types of speech impediments)
What is the use of examining segmental features in FSC?
they encode a lot of idiosyncratic information and pathology
we can look at context-defined allophones to do so
David Bieber (Leeman et al., 2025)
a 2004 murder case
US citizen arrested in a stolen car in Leeds
during the arrest he shot and killed a police officer
in the DS, the audio is from off camera
in interview, Bieber refused to speak, so the KS had to be taken from telephone gambling companies
determining this case was easy: he had a rare accentual mix
mix of West Yorks and US features:
rhoticity
FACE vowel in ‘mate’ realised with a northern monophthong
How do we choose to conclude a case?
balance it based on similarity and typicality (likelihood ratios)
consider both the prosecution hypothesis (H1) = same person and the defence hypothesis (H0) = different
evaluate each piece of comparative evidence and remain conservative
What question do we ask to conclude a case?
Is the evidence more likely assuming the Hp or the Hd?
Top Down approach to FSC
develop holistic impression/judgement
break down evidence to support that judgement
Bottom up approach to FSC
consider as many individual elements as possible
allow the bigger picture to emerge
e.g. look at all instances of H-dropping and see that it doesn't happen every time
Auditory Analysis
can be all you can do
in poor recordings, acoustic analysis can be unreliable
e.g. with channel mismatch, acoustic analysis might be meaningless
Acoustic Analysis
can reveal features you weren’t sure how to describe
must always be supported by auditory analysis
almost always look at F1 and F2
sometimes look at:
F3 (look below it for F1 and 2)
for consonants: VOT, energy of fricatives, F1-3 of /l, r/ and nasals
focus on clear, stressed syllables in content words
Why may it be useful to look at intervocalic portions of uh and um?
occur in all recordings of spontaneous speech
are just nasals and vowels
fundamentally quite acoustically rich in data for extraction
What are formants?
high amplitude energy in the frequency range
they are created by the transmission of the sound source from the glottis (i.e. voicing) through a relatively open tract
What does F1 relate to?
openness
gets higher as vowel gets more open
What does F2 relate to?
frontness
F2 drops as you go back
with a front constriction, the F2 is high because it’s generated by the front chamber in the vocal tract
if you have a small resonating chamber in front: small things give you high resonances
therefore, the higher the F2 the more fronted the sound (lower=backer)
Ash 1988
bomb threat by phone
engaged by defence
suspect from New Jersey but born in Philadelphia
KS = verbatim reading of threats
Ash measured F1 and 2 for all stressed vowels
revealed differences between KS and DS
/uː/ (you, do) consistently back in DS, mostly fronted in KS (typical PAE)
‘on’ central and open in DS, back and close in PAE KS
VOT of initial stops
this is the phonetic cue to contrast between stops
it is not really voicing but VOT that is the cue
it is the time between the release of the stops and the start of voicing in the following vowel
What is the typical VOT for a voiceless aspirated plosive in English?
40-80ms
/p t k/
What is the typical VOT for a voiced unaspirated plosive in English?
0-25ms
/b d g/
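A minimal sketch of how a measured VOT could be checked against the two typical ranges above. The function names and the example timestamps are hypothetical; values falling between or outside the two ranges are simply reported as such.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """VOT = time from the release of the stop to the start of voicing, in ms."""
    return (voicing_onset_s - release_s) * 1000.0


def classify_vot(vot: float) -> str:
    """Compare a VOT value (ms) with the typical English ranges from the notes."""
    if 0 <= vot <= 25:
        return "voiced unaspirated range (/b d g/)"
    if 40 <= vot <= 80:
        return "voiceless aspirated range (/p t k/)"
    return "outside both typical ranges"


# Hypothetical token: stop released at 1.00 s, voicing starts at 1.06 s (about 60 ms).
print(classify_vot(vot_ms(1.00, 1.06)))
```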
Jones (Lancashire case)
he was released as there was no aspiration in DS
acoustic analysis of VOT revealed no aspiration for DS of Jones
even if he was disguising his accent, it is unlikely he would be able to disguise this
“Quilley” (R v Slade Pearmann)
defence case from 09-14
first to bring in ASR (in this country)
looking to see if there was one member plotting a murder
eventually was clear that it was not his voice in the back
DS= hours of covert
KS= 2 hours of police interview
defence so happy to give more recordings
it was clear they were different people:
/r/ tapping in V_#V contexts
DS: before, 63% [_]
KS: more 0%
Why may it be difficult to compare formants of speakers with similar accents?
Their formants will sit on top of each other, so they won’t necessarily reveal much about their voice.
This will mostly just tell us they have the same vowels, which we intuitively know anyway.
Approximate F1 and F2 for a male schwa
F1= 500 Hz
F2= 1500 Hz
Approximate F3 for an English speaker
2500 Hz
What may cause the GOOSE-split observed between e.g food and fool?
the /l/ will cause allophonic differences
it is not a phoneme swap, just an allophonic difference between the two
the velar /l/ is quite far back so vowels will appear backer as well
fool will be closer to cardinal (although crucial to remember it is NOT cardinal)
What percentage of casework involves telephone recordings?
90%
Why do we have a bandpass filter?
It is useful for telephone companies as they have to transmit less speech
300-3400 Hz
Why does the telephone affect F1?
F1 may have some energy at less than 350 Hz
men are going to be affected more as they have inherently lower formants
so their close vowels will have an F1 of around 300 Hz
the F range is therefore dampened
NB: a technical effect, has nothing to do with speakers
What segments/acoustic features will the telephone filter affect?
F1 of vowels and high frequency fricatives
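A minimal sketch of why low F1 values and high-frequency fricative energy are the casualties of the 300-3400 Hz band. It assumes a hard cut-off at the band edges, which is a simplification (real filters roll off gradually), and the 250 Hz and 5000 Hz example values are hypothetical.

```python
TELEPHONE_PASSBAND_HZ = (300.0, 3400.0)  # band given in the notes


def survives_telephone(freq_hz: float, passband=TELEPHONE_PASSBAND_HZ) -> bool:
    """True if a frequency lies inside the passband (hard cut-off assumed)."""
    low, high = passband
    return low <= freq_hz <= high


examples = {
    "low F1 of a close vowel (hypothetical 250 Hz)": 250.0,
    "F2 of a male schwa (1500 Hz)": 1500.0,
    "fricative energy (hypothetical 5000 Hz)": 5000.0,
}
for label, freq in examples.items():
    print(label, "->", "kept" if survives_telephone(freq) else "filtered out")
```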
Kunzel (2001)
took a simultaneous direct recording and a landline call
F1 was artificially raised by up to 14%
close vowels with lower F1s are most affected
F2 not significantly affected as it sits between the filters (around 800-2000 Hz)
Byrne and Foulkes (2004)
F1 elevated by up to 60% (average was 29%)
this was on mobile phones
demonstrates that effects aren’t just a straightforward constant increase
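A worked sketch of what the reported percentages mean in Hz, using the 300 Hz close-vowel F1 mentioned earlier as the starting value. The function name is hypothetical, and the percentages are upper bounds or averages from the studies, not a correction factor to apply in casework.

```python
def raised_f1(f1_hz: float, percent_increase: float) -> float:
    """F1 after a given percentage increase."""
    return f1_hz + f1_hz * percent_increase / 100.0


true_f1 = 300.0  # close-vowel F1 for a male speaker, as in the notes
for label, pct in [("landline, up to 14%", 14), ("mobile, average 29%", 29), ("mobile, up to 60%", 60)]:
    print(f"{label}: {raised_f1(true_f1, pct):.0f} Hz")
```

The spread between 342 Hz and 480 Hz for the same underlying vowel shows why the size of the shift cannot be assumed.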
What do we need to consider when looking at telephone recordings?
won’t always know whether it’s a mobile or a landline recording
we expect F1 to be higher nonetheless but won’t know by how much all the time
How do we distinguish f0 and pitch?
f0 = fundamental frequency - what we measure
pitch = perceptual correlate - what we hear
How do we use pitch (or f0) in case work?
we don’t really
it’s rarely useful as it is not very speaker-specific
there is lots of data on it: we have population statistics, but they don’t tell us much
How does vocal fold vibration impact f0?
fast = higher f0
slow = lower f0
creaky = extremely low f0
What factors may increase fundamental frequency?
(alcohol)
operations
noise level (incl. phones, lombard)
afternoon
reading
What factors may decrease fundamental frequency?
smoking
steroids
depression
morning
colds
Median f0 of 500 men
126 Hz (range is about 80-180)
Lombard
people tend to speak more loudly against noise
means of compensating for limited bandpass range
gives a side effect of higher f0
Why may we expect a higher f0 on the phone?
it is a side effect of the Lombard effect
(louder = increased airflow from lungs, pushes more airflow through larynx so VF vibrate faster)
Dynamic events
it may be better to look at these as opposed to static events
examples include:
transitions from sound to sound
diphthong dynamics
can measure formants at 9 points rather than just the midpoint
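A minimal sketch of where the 9 measurement points could fall within a vowel or diphthong. It assumes the points are equally spaced at 10% steps of the segment's duration, which is an assumption rather than something the notes specify; the function name is hypothetical.

```python
def measurement_times(start_s: float, end_s: float, n_points: int = 9) -> list[float]:
    """Equally spaced time points strictly inside the segment (10%..90% for n=9)."""
    duration = end_s - start_s
    return [start_s + duration * i / (n_points + 1) for i in range(1, n_points + 1)]


# Hypothetical diphthong from 2.00 s to 2.20 s.
times = measurement_times(2.00, 2.20)
print([round(t, 3) for t in times])
print("midpoint only:", round(times[len(times) // 2], 3))
```

Formant values read at each of these times give a trajectory, whereas the midpoint alone gives a single static value.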
What are some audible or categorical dynamic events (Leeman et al., 2025)?
assimilation
elision (dropping)
lenition (weakening or softening of a consonant, e.g. plosive to fricative)
epenthesis (adding)
liaison (intrusive /r/, where normally it is silent)
What is articulation rate and what are the normal values?
Syllables per second of speaking time
average = 4.4-5.9 syllables/second
maximum = 6.7-8.2 syllables/second
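A minimal sketch of the calculation, assuming "speaking time" means total duration minus pauses. The function name and the example numbers are hypothetical.

```python
def articulation_rate(n_syllables: int, total_duration_s: float, pause_duration_s: float) -> float:
    """Syllables per second of speaking time (pauses excluded)."""
    speaking_time_s = total_duration_s - pause_duration_s
    if speaking_time_s <= 0:
        raise ValueError("speaking time must be greater than zero")
    return n_syllables / speaking_time_s


# Hypothetical stretch: 100 syllables in 25 s, of which 5 s is pausing.
print(articulation_rate(100, 25.0, 5.0), "syllables/second")
```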
Yorkshire Ripper example of assimilation
John Humble
this year
/s # j/ —> [-ʃj-]
thish year
Settings for pitch object for men
change range to 75-200 Hz (may have to change if at edges of cumulative probability)
extract pitch tier
highlight, view and edit together
have a pitch (f0) trace
will find all the voiced bits and take f0 readings across them
from pitch tier, extract mean (Query > Get mean) for it across entire recording
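A minimal sketch of the final averaging step only, not of Praat's pitch tracking. It assumes we already have one f0 value per analysis frame, with 0 marking unvoiced frames, and it treats values outside the 75-200 Hz range as tracking errors to be dropped; both the input format and that treatment are assumptions for illustration.

```python
def mean_f0(frames_hz: list[float], floor_hz: float = 75.0, ceiling_hz: float = 200.0) -> float:
    """Mean f0 over voiced frames that fall inside the chosen pitch range."""
    voiced = [f for f in frames_hz if floor_hz <= f <= ceiling_hz]
    if not voiced:
        raise ValueError("no voiced frames inside the pitch range")
    return sum(voiced) / len(voiced)


# Hypothetical frame values: 0 = unvoiced, 400 = likely octave error.
frames = [0.0, 120.0, 130.0, 0.0, 110.0, 400.0]
print(mean_f0(frames), "Hz")
```

If many frames sit at the edges of the range, that is the cue mentioned above to widen or shift the range and re-extract.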
Voice setting analysis
a long term property of the voice
permeates entire recording
defined by particular positioning of articulators
just comment on it: is it something to do with:
lips
tongue
nasal cavity
larynx
mouth
vocal tract VQ features
(labial, mandibular, pharynx, velopharyngeal, larynx)
= all happen in the vocal tract, above the vocal folds
phonation VQ features
falsetto, creaky, whisper, breathy, murmur, harsh, tremor
= laryngeal or phonatory, happen in the larynx
retroflex
bending tongue tip back consistently
velarised
back of tongue towards velum constantly
palatalised
body of tongue towards hard palate constantly
nasal
velum constantly lowered
denasal
velum constantly raised: head-cold sound