Speaker Comparison 1


62 Terms

1
New cards

What is the working hypothesis of FSC?

That voices carry information about a speaker's hypothetical speaker space (Nolan, 1991)

2
New cards

Define speaker space

The speaker space is the space a speaker occupies when we describe their features together

3
New cards

What are some proposed ‘ideal’ phonetic features for the hypothetical speaker space?

  • high variation between speakers (inter-speaker variation)

  • low variation within individuals (intra-speaker variation)

  • not affected by transmission (e.g. the telephone bandpass)

  • resistant to disguise

4
New cards

Why do we no longer use the term “speaker identification”?

Speaker identification implies a level of certainty that FSSs cannot provide.

We can never provide a concrete conclusion of a speaker being the same, nor is it our job to do so.

5
New cards

When can we reasonably provide a categorical rejection of a speaker in an FSC?

A categorical rejection is only truly possible when there are regional differences between the disputed sample (DS) and the known sample (KS)

6
New cards

What statements do we provide at the end of an FSC?

A statement of likelihood: we are assessing the evidence, not the case

7
New cards

Define the two types of likelihood Ratio

  1. a numerical way of articulating how likely it is that the speaker is the same in our samples

  • “evidence is 100x more likely assuming the prosecution/defence proposition”

  2. a verbal way of articulating…

  • evidence provides strong support for the prosecution proposition.
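
The numerical likelihood ratio on this card can be sketched as a simple division: the probability of the evidence under the prosecution proposition (Hp) over its probability under the defence proposition (Hd). This is a minimal sketch; the function name and the two probabilities are hypothetical values for illustration, not casework figures.

```python
import math

def likelihood_ratio(p_evidence_given_hp: float, p_evidence_given_hd: float) -> float:
    """LR = p(E | Hp) / p(E | Hd)."""
    if p_evidence_given_hd <= 0:
        raise ValueError("p(E | Hd) must be positive")
    return p_evidence_given_hp / p_evidence_given_hd

# Hypothetical probabilities, chosen only to give an LR of about 100
lr = likelihood_ratio(0.5, 0.005)
log10_lr = math.log10(lr)  # a log scale is often easier to read
print(f"LR is about {lr:.0f}, log10 LR is about {log10_lr:.1f}")
```

An LR above 1 means the evidence is more likely under Hp, an LR below 1 means it is more likely under Hd, and an LR of 1 favours neither.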

8
New cards

What features can we examine from an audio sample?

  • f0

  • segments

  • dynamic events (sequences, duration, assimilation)

  • tempo and rhythm

  • hesitations

  • voice quality (as a shape of the VT, not in terms of accentual features)

  • pathology (stuttering, various types of speech impediments)

9
New cards

What is the use of examining segmental features in FSC?

  • they encode a lot of idiosyncratic information and pathology

  • we can look at context-defined allophones to do so

10
New cards

David Bieber (Leeman et al., 2025)

  • a 2004 murder case

  • US citizen arrested in a stolen car in Leeds

  • during the arrest he shot and killed a police officer

  • in the DS, the audio is from off camera

  • in interview, Bieber refused to speak, so the KS had to be taken from telephone gambling companies

  • determining this case was easy: he had a rare accentual mix

  • mix of West Yorks and US features:

    • rhoticity

    • FACE vowel in ‘mate’ realised with a northern monophthong

11
New cards

How do we choose to conclude a case?

  • balance it based on similarity and typicality (likelihood ratios)

  • consider both the prosecution hypothesis (H1) = same person and the defence hypothesis (H0) = different

  • evaluate each piece of comparative evidence and remain conservative

12
New cards

What question do we ask to conclude a case?

Is the evidence more likely assuming the Hp or the Hd?

13
New cards

Top Down approach to FSC

  • develop holistic impression/judgement

  • break down evidence to support that judgement

14
New cards

Bottom up approach to FSC

  • consider as many individual elements as possible

  • allow the bigger picture to emerge

  • e.g. look at all instances of H-dropping and see that it doesn't happen every time
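
The bottom-up example above amounts to coding every token and working out how often the feature occurs. This is a minimal sketch; the function name and the ten coded tokens are hypothetical.

```python
def realisation_rate(tokens: list[bool]) -> float:
    """Proportion of tokens showing the feature (True = feature present)."""
    if not tokens:
        raise ValueError("no tokens to count")
    return sum(tokens) / len(tokens)

# Hypothetical coding of ten /h/-initial words: True = /h/ dropped
h_dropped = [True, True, False, True, False, True, True, False, True, True]
rate = realisation_rate(h_dropped)
print(f"H-dropping in {rate:.0%} of tokens")
```

The same count can be run on the DS and the KS separately so the two rates can be compared.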

15
New cards

Auditory Analysis

  • can be all you can do

  • in poor recordings, acoustic analysis can be unreliable

  • e.g. with channel mismatch, acoustic analysis might be meaningless

16
New cards

Acoustic Analysis

  • can reveal features you weren’t sure how to describe

  • must always be supported by auditory analysis

  • almost always look at F1 and F2

  • sometimes look at:

    • F3 (look below it for F1 and F2)

    • for consonants: VOT, energy of fricatives, F1-F3 of /l, r/ and nasals

  • focus on clear, stressed syllables in content words

17
New cards

Why may it be useful to look at intervocalic portions of uh and um?

  • occur in all recordings of spontaneous speech

  • are just nasals and vowels

  • fundamentally quite acoustically rich in data for extraction

18
New cards

What are formants?

  • bands of high-amplitude energy in the frequency spectrum

  • they are created by the transmission of the sound source from the glottis (i.e. voicing) through a relatively open vocal tract

19
New cards

What does F1 relate to?

  • openness

  • gets higher as vowel gets more open

20
New cards

What does F2 relate to?

  • frontness

  • F2 drops as you go back

  • with a front constriction, the F2 is high because it’s generated by the front chamber in the vocal tract

  • if you have a small resonating chamber in front- small things give you high resonances

  • therefore, the higher the F2 the more fronted the sound (lower=backer)

21
New cards

Ash 1988

  • bomb threat by phone

  • engaged by defence

  • suspect from New Jersey but born in Philadelphia

  • KS = verbatim reading of threats

  • Ash measured F1 and 2 for all stressed vowels

  • revealed differences between KS and DS

    • /uː/ (you, do) consistently back in DS, mostly fronted in KS (typical PAE)

    • ‘on’ central and open in DS, back and close in PAE KS

22
New cards

VOT of initial stops

  • this is the phonetic cue to contrast between stops

  • it is not really voicing but VOT that is the cue

  • it is the time between the release of the stops and the start of voicing in the following vowel

23
New cards

What is the typical VOT for a voiceless aspirated plosive in English?

40-80ms

/p t k/

24
New cards

What is the typical VOT for a voiced unaspirated plosive in English?

0-25ms

/b d g/
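
The VOT definition and the two typical ranges on the cards above can be sketched as follows. This is a minimal sketch; the function names, labels and example time stamps are hypothetical, and values between 25 and 40 ms are simply reported as falling outside both ranges.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """VOT in ms: time from the stop release to the start of voicing."""
    return (voicing_onset_s - release_s) * 1000.0

def classify_vot(vot: float) -> str:
    """Label a VOT against the typical English ranges given on the cards."""
    if 0 <= vot <= 25:
        return "short lag (typical of /b d g/)"
    if 40 <= vot <= 80:
        return "long lag (typical of /p t k/)"
    return "outside the typical ranges"

# Hypothetical time stamps (in seconds) read off a waveform
vot = vot_ms(release_s=1.200, voicing_onset_s=1.262)
print(f"VOT is about {vot:.0f} ms: {classify_vot(vot)}")
```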

25
New cards

Jones (Lancashire case)

  • he was released as there was no aspiration in DS

  • acoustic analysis of VOT revealed no aspiration for DS of Jones

  • even if he was disguising his accent, it is unlikely he would be able to disguise this

26
New cards

“Quilley” (R v Slade Pearmann)

  • defence case from 09-14

  • first to bring in ASR (in this country)

  • looking to see if there was one member plotting a murder

  • eventually was clear that it was not his voice in the back

    • DS= hours of covert

    • KS= 2 hours of police interview

  • a defence case, so they were happy to give more recordings

  • it was clear they were different people:

    • /r/ tapping in V_#V contexts

      • DS: ‘before’, 63% [ɾ]

      • KS: ‘more’, 0%

27
New cards

Why may it be difficult to compare formants of speakers with similar accents?

Their formants will sit on top of each other, so they won’t necessarily reveal much about their voices.

This will mostly just tell us they have the same vowels, which we intuitively know anyway.

28
New cards

Approximate F1 and F2 for a male schwa

F1= 500 Hz

F2= 1500 Hz

29
New cards

Approximate F3 for an English speaker

2500 Hz
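
The approximate schwa and F3 values on the two cards above match the standard idealisation of a neutral vocal tract as a uniform tube, closed at the glottis and open at the lips, with resonances at F_n = (2n - 1)c / 4L. This is a minimal sketch; the tract length (17.5 cm) and speed of sound (35,000 cm/s) are textbook approximations assumed here, not values from the cards.

```python
def tube_formants(length_cm: float = 17.5,
                  speed_cm_s: float = 35000.0,
                  n: int = 3) -> list[float]:
    """Resonances of a uniform tube closed at one end: (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * speed_cm_s / (4 * length_cm) for k in range(1, n + 1)]

formants = tube_formants()
print([round(f) for f in formants])
```

A shorter tube gives higher resonances, which is one reason formant values differ between speakers with different vocal tract lengths.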

30
New cards

What may cause the GOOSE-split observed between e.g. food and fool?

  • the /l/ will cause allophonic differences

  • this is not a swap of phoneme, just an allophonic difference between the two

  • the velar /l/ is quite far back so vowels will appear backer as well

  • fool will be closer to cardinal (although crucial to remember it is NOT cardinal)

31
New cards

What percentage of casework involves telephone recordings?

90%

32
New cards

Why do we have a bandpass filter?

It is useful for telephone companies, as they have to transmit less of the speech signal

300-3400 Hz

33
New cards

Why does the telephone affect F1?

  • F1 may have some energy at less than 350 Hz

  • men are going to be affected more, as they have inherently lower formants

  • so their close vowels will have an F1 of around 300 Hz

  • the F range is therefore dampened

  • NB: a technical effect, has nothing to do with the speakers

34
New cards

What segments/acoustic features will the telephone filter affect?

F1 of vowels and high frequency fricatives

35
New cards

Kunzel (2001)

  • took a simultaneous direct recording and a landline call

  • F1 was artificially raised by up to 14%

  • close vowels with lower F1s are most affected

  • F2 not significantly affected, as it sits between the filter cut-offs (around 800-2000 Hz)

36
New cards

Byrne and Foulkes (2004)

  • F1 elevated by up to 60% (average was 29%)

  • this was on mobile phones

  • demonstrates that effects aren’t just a straightforward constant increase
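
To get a feel for the percentages on the Kunzel and the Byrne and Foulkes cards, the arithmetic can be sketched as below. This is a minimal sketch; the function name and the 300 Hz starting value are hypothetical, and applying a single percentage is only an illustration, since the cards stress that the effect is not a constant increase.

```python
def shifted_f1(direct_f1_hz: float, percent_increase: float) -> float:
    """F1 after raising the direct-recording value by a given percentage."""
    return direct_f1_hz * (1 + percent_increase / 100.0)

direct = 300.0  # hypothetical F1 of a close vowel in a direct recording
for pct in (14, 29, 60):  # percentages quoted on the cards
    print(f"+{pct}%: about {round(shifted_f1(direct, pct))} Hz")
```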

37
New cards

What do we need to consider when looking at telephone recordings?

  • won’t always know whether it’s a mobile or a landline recording

  • we expect F1 to be higher nonetheless but won’t know by how much all the time

38
New cards

How do we distinguish f0 and pitch?

f0 = fundamental frequency - what we measure

pitch = perceptual correlate - what we hear

39
New cards

How do we use pitch (or f0) in case work?

  • we don’t really

  • it’s rarely useful as is not very speaker specific

  • there is lots of data on it- we have population statistics but doesn’t tell us much

40
New cards

How does vocal fold vibration impact f0?

  • fast = higher f0

  • slow = lower f0

  • creaky = extremely low f0

41
New cards

What factors may increase fundamental frequency?

  • (alcohol)

  • operations

  • noise level (incl. phones, lombard)

  • afternoon

  • reading

42
New cards

What factors may decrease fundamental frequency?

  • smoking

  • steroids

  • depression

  • morning

  • colds

43
New cards

Median f0 of 500 men

126 Hz (range is about 80-180)

44
New cards

Lombard

  • people tend to speak more loudly against noise

  • means of compensating for limited bandpass range

  • gives a side effect of higher f0

45
New cards

Why may we expect a higher f0 on the phone?

  • it is a side effect of the Lombard effect

  • (louder = increased airflow from lungs, pushes more airflow through larynx so the VF vibrate faster)

46
New cards

Dynamic events

  • it may be better to look at these as opposed to static events

  • examples include:

    • transitions from sound to sound

    • diphthong dynamics

    • can measure formants at 9 points rather than just the midpoint
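
Measuring at 9 points rather than at the midpoint only can be sketched by working out the time stamps first. This is a minimal sketch that assumes the 9 points are equally spaced at 10% to 90% of the vowel's duration; the card does not say how they are spaced, and the function name and example times are hypothetical.

```python
def measurement_times(start_s: float, end_s: float, n_points: int = 9) -> list[float]:
    """Equally spaced time points inside a vowel, excluding its two edges."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("end must be later than start")
    step = duration / (n_points + 1)
    return [start_s + step * i for i in range(1, n_points + 1)]

# Hypothetical diphthong running from 0.00 s to 0.20 s
times = measurement_times(0.0, 0.2)
print([round(t, 3) for t in times])
```

With 9 points, the fifth one is the midpoint, so the static measurement is still included in the dynamic one.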

47
New cards

What are some audible or categorical dynamic events (Leeman et al., 2025)?

  • assimilation

  • elision (dropping)

  • lenition (weakening or softening of a consonant, e.g. plosive to fricative)

  • epenthesis (adding)

  • liaison (intrusive /r/, where normally it is silent)

48
New cards

What is articulation rate and what are the normal values?

Syllables per second of speaking time

average = 4.4-5.9 syllables/second

maximum = 6.7-8.2 syllables/second
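
Articulation rate as defined above can be sketched as syllables divided by speaking time, with pauses taken out of the total duration. This is a minimal sketch; the function name and the example counts are hypothetical.

```python
def articulation_rate(n_syllables: int, total_s: float, pause_s: float) -> float:
    """Syllables per second of speaking time (total duration minus pauses)."""
    speaking_time = total_s - pause_s
    if speaking_time <= 0:
        raise ValueError("speaking time must be positive")
    return n_syllables / speaking_time

# Hypothetical stretch: 52 syllables in 12 s, of which 2 s are pauses
rate = articulation_rate(52, total_s=12.0, pause_s=2.0)
print(f"{rate:.1f} syllables/second")
```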

49
New cards

Yorkshire Ripper example of assimilation

John Humble

this year

/s # j/ → [-ʃj-]

thish year

50
New cards

Settings for pitch object for men

  • change range to 75-200 Hz (may have to change if at edges of cumulative probability)

  • extract pitch tier

  • highlight, view and edit together

  • have a pitch (f0) trace

  • it will find all the voiced bits and take f0 measurements across them

  • from the pitch tier, extract the mean (query, get mean) across the entire recording
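
What the steps above compute can be sketched in a few lines: keep only the voiced frames inside the 75-200 Hz range, then average them. This is a minimal sketch on a hypothetical f0 track; it is plain Python and does not reproduce any particular tool's pitch-tracking algorithm.

```python
from statistics import mean, median

def voiced_f0(frames, floor_hz=75.0, ceiling_hz=200.0):
    """Keep only voiced frames whose f0 lies inside the analysis range."""
    return [f for f in frames if f is not None and floor_hz <= f <= ceiling_hz]

# Hypothetical f0 track in Hz; None marks an unvoiced frame
track = [None, 118.0, 122.0, None, 130.0, 126.0, None, 134.0]
values = voiced_f0(track)
mean_f0 = mean(values)
median_f0 = median(values)
print(f"mean f0 = {mean_f0:.0f} Hz, median f0 = {median_f0:.0f} Hz")
```

If many values pile up at 75 or 200 Hz, that is the sign mentioned on the card that the range needs adjusting.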

51
New cards

Voice setting analysis

  • a long term property of the voice

  • permeates entire recording

  • defined by particular positioning of articulators

  • just comment on it: is it something to do with the:

    • lips

    • tongue

    • nasal cavity

    • larynx

    • mouth

52
New cards

vocal tract VQ features

labial, mandibular, pharynx, velopharyngeal, larynx

= all happen in the vocal tract (above the vocal folds)

53
New cards

phonation VQ features

falsetto, creaky, whisper, breathy, murmur, harsh, tremor

= laryngeal or phonatory, happen in the larynx

54
New cards

retroflex

bending tongue tip back consistently

55
New cards

velarised

back of tongue towards velum constantly

56
New cards

palatalised

body of tongue towards hard palate constantly

57
New cards

nasal

velum constantly lowered

58
New cards

denasal

velum constantly raised (head-cold sound)

59
New cards
60
New cards
61
New cards
62
New cards