Neural Basis of Speech Perception

[Auto-generated transcript. Edits may have been applied for clarity.]

Hey. Okay, so second lecture on speech perception.

In the last lecture I told you a little bit about what speech is: how do you define speech sounds?

I introduced you to the source-filter theory,

and I also told you about categorical perception and the influence of context on speech perception.

Today's lecture is going to be about cognitive models and a bit of the brain side of things.

So these are the learning outcomes. Let's start off with the first one.

So we're going to discuss the motor theory of speech perception, which is an old but still influential model of speech perception.

Okay, so the motor theory of speech perception was proposed by Alvin Liberman, who you can see there on the slide.

He proposed that speech perception uses a specialised module in the mind/brain,

whose purpose is specifically for processing speech and not other sounds.

Okay, so this module operates separately from the perception of non-speech sounds.

He also claimed that this module or capacity was uniquely human.

And the reason why he proposed the specialised speech module is based on evidence like this:

the observation that speech sounds seem to be perceived categorically,

whereas other sounds aren't; other sounds seem to be perceived in a more graded way.

I told you in the last lecture that speech shows this property of categorical perception,

and the Yanny/Laurel phenomenon was an example of that.

And so this led him to think that, okay, there's something special about the way we process speech.

Maybe there's a separate and special part of the mind that deals with speech.

So the second part of the motor theory is that when you're perceiving speech, the objects of perception are not acoustic,

but are the vocal gestures of the person producing the speech.

So when you're making sense of speech, you're really bringing to mind the vocal act or gesture that produces the speech.

And it's not so much the acoustic events in the speech signal that the mind is processing;

it's trying to recover the intended vocal gestures that produced the speech in the first place.

The evidence for this, according to Liberman,

is the observation that the acoustics of speech seem more variable than the gestures that produce speech.

So if you take the example here: the phoneme /p/ in 'pin' versus 'spin'.

If you look at the acoustics, at the waveform (we talked a little bit about that in the last lecture),

you see this acoustic variability. It's never the same speech sound:

depending on what you're producing before and after,

the acoustics are different; they vary. And that's the case for this example here, the /p/ in 'pin' versus 'spin'.

It sounds slightly different. However, the gesture that produces it is more or less the same:

it's the lips coming together that produces a /p/.

So he thought that if the mind was trying to understand speech by relying on just the acoustics,

which are very variable, that's going to be really difficult to do. So maybe instead, what the mind is doing when processing speech is

trying to recover or process the underlying gestures that produce the speech.

Just to elaborate a little bit more,

hopefully this will become a little clearer by considering what the alternative to the motor theory would be.

So imagine, if you remember, the 'ee' in 'heat'.

You have a particular gesture to do with the tongue: it's quite high when you produce 'ee'.

Now it could be that what the mind is doing is this: it knows the particular formants,

the acoustic patterns that normally go with 'ee', and maybe that's all the mind is doing,

just recognising those acoustic patterns. That would be the alternative to the motor theory.

But what the motor theory is claiming instead is that rather than processing the acoustic patterns,

a lot of the processing behind speech is to do with perceiving the gestures that produce the speech.
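To make that "acoustic pattern matching" alternative concrete, here is a minimal sketch in Python (not from the lecture; the formant values are rough, textbook-style averages used purely for illustration):

```python
# Minimal sketch of the "acoustic pattern matching" alternative to the motor
# theory: recognise a vowel purely from its acoustic pattern (here, the first
# two formant frequencies), with no reference to articulatory gestures.
# The formant values are rough illustrative averages, not experimental data.

VOWEL_TEMPLATES = {            # vowel: (F1 Hz, F2 Hz), approximate
    "ee (heat)": (270, 2300),
    "ah (hot)":  (730, 1100),
    "oo (hoot)": (300, 870),
}

def classify_vowel(f1: float, f2: float) -> str:
    """Return the template vowel whose (F1, F2) pattern is nearest."""
    def distance(template):
        tf1, tf2 = template
        return ((f1 - tf1) ** 2 + (f2 - tf2) ** 2) ** 0.5
    return min(VOWEL_TEMPLATES, key=lambda v: distance(VOWEL_TEMPLATES[v]))

print(classify_vowel(290, 2250))   # -> "ee (heat)"
```

On that view, nothing about the talker's tongue or lips needs to be represented at all, which is exactly what the motor theory denies.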

The motor theory is an old theory proposed in the 50s, 60s.

But since then, there's been evidence from fMRI,

a brain scanning technique (this is an fMRI scanner), which appears to show evidence that's consistent with the motor theory.

So here's an example of a study from 2004 where they scanned participants' brains,

and all they had to do in the scanner was passively listen to meaningless monosyllables, like 'ba', that sort of thing. So they didn't have to do anything.

There's no task; they just lie down and listen. And even though they weren't moving,

the brain scans showed that when they were listening to the speech, there were activations in the motor and premotor areas of the brain.

The motor and premotor areas are outlined in black here.

Those aren't auditory or sensory regions of the brain; they're motor regions.

And even though the participants aren't moving and aren't doing any task,

you see activation in these motor regions. So this would appear to be evidence consistent with the motor theory of speech perception:

just from listening to speech, what's happening in the brain is that you have these representations to do with the gestures that produce the speech.

So here's another study, again using a more recent technique that has been developed since the 60s.

So this one is an example of a TMS study.

With a TMS study, you administer magnetic pulses to the skull,

which disrupt neural activity in the brain region directly underneath this coil here.

Okay, so this is another way to examine brain function

and look at the brain mechanisms behind, in this case, speech perception.

So in this TMS study, what they observed is

that TMS over premotor areas interferes with phoneme discrimination in noise, but not with colour discrimination.

Colour discrimination was the control task.

So just to explain this graph here: what you're seeing is the performance of participants doing the speech task,

that's the phoneme discrimination in noise,

and then this is the colour discrimination task, the control task. Here you've got a baseline condition where there's no real TMS applied,

a sham TMS. Here is TMS over the premotor cortex,

and here's TMS over another region,

so this is a control region.

This is a control region, this is also a control, and this is a control task;

there are lots of controls here. But the key thing to take away is that

you see an effect on behaviour when TMS is applied very specifically to the premotor cortex in the speech discrimination task,

and that's why this bar here is lower than the others. Okay, so that's evidence:

again, you see involvement of the premotor areas.

And in fact, rather than the correlational evidence, which is what you get with fMRI,

this is showing causal evidence, because through the TMS you're really changing things:

you're disrupting activity. So that's causal evidence that the premotor cortex has a role in speech perception.

Okay. So having given you some evidence for the motor theory, let's look at evidence against the motor theory.

So one bit of evidence against the motor theory is that categorical perception

can also be demonstrated for non-speech sounds. That's been discovered since Liberman proposed his theory; at the time,

Liberman thought that categorical perception was a uniquely speech thing.

But actually you can demonstrate it even for non-speech stimuli,

stimuli like musical intervals.

So this seems to be suggesting that categorical perception is not the result of a specialised speech module.

So one of the bits of evidence that Liberman was using to argue for the specialised speech module just isn't true.

Remember also that Liberman claimed that the speech module was uniquely human.

But it turns out that you can train chinchillas to show categorical perception.

It's a similar task. Obviously you can't ask a chinchilla what they're hearing;

there are other techniques to use with animals, but they are able to learn:

they're given a reward and learn to associate a particular response

with the stimuli along the continuum that you hear in the categorical perception experiment.

And then you can plot this curve, as you do for humans, which is the solid line here.

And you can see this categorical perception.

For the ambiguous sounds near the phoneme boundary, you see the sudden change, which is one of the characteristics of categorical perception.

But you can also see quite a sudden change for the chinchillas.

Okay, maybe it's a bit less steep, but even though the acoustics are changing gradually,

you can see that around the boundary, around 30 here, the curve is quite steep.

So categorical perception can be shown

not just for speech sounds,

and you can also show it for animals as well as humans.

So if you're going to use categorical perception to argue that there's a specialised speech module,

well, it doesn't seem to be a very good argument, based on the evidence on this slide.

So I think nowadays people don't really think this part of the theory is true.

However, I think you'll see later that there's still this idea that speech perception involves

processing of intended vocal gestures, and I think that's at least partly true.

It's really a question of degree, of how important you think this is.

Some people think that processing the vocal gestures is central to speech perception,

but other people take a weaker,

less strong point of view: maybe processing of vocal gestures modulates or influences speech perception,

but it's not strictly necessary for speech perception. So maybe there's a lot of auditory processing going on when you understand speech,

and there's also this processing of the motor gestures as well.

Okay, so that's the motor theory of speech perception. Now let me introduce you to a very influential model of the neural basis of speech perception:

the dual streams model. But first, the classic model,

which you may already have been introduced to, maybe in A-level psychology or from some reading in textbooks.

So this originates from the 19th century.

And this classic model proposed that there's this region of the brain, the superior temporal gyrus, that's critical for speech perception.

This is otherwise known as Wernicke's area, named after

this person here, Wernicke. He was a neurologist from the 19th century.

And he made the proposal that this region is involved in speech perception based on observations in patients:

after they died,

he did a post-mortem, looked at their brains, and saw that there was damage in this particular region,

and before they died, they had shown these speech comprehension deficits.

And so this is why he made the proposal that this region here is involved in speech perception.

And so, again, advance warning: I'm going to give you an example of one of these post-mortem brains with damage to this region.

So if you don't like this sort of thing, you may want to turn away.

This is an example of a Wernicke's aphasic. You can see the damage here corresponds to this region.

It maybe extends a bit more broadly than that, but you can see it's quite centred on this region here.

So this is the temporal lobe here, and this is the superior temporal gyrus.

In the cortex, these folds in the brain are called gyri.

You've got the superior temporal gyrus here and the middle temporal gyrus here; the gap between them is called a sulcus.

So what we're dealing with here:

you can see that the damage is centred on the back or posterior part of the superior temporal gyrus, which is this region here.

It sits at the top of the temporal lobe.

The other part of the classic model is that you've got this inferior frontal gyrus that's involved in producing speech.

That's Broca's area, named after Paul Broca.

Another part of the classic model is that these regions are proposed to be in the left hemisphere.

So you'll often see this idea that language is a left hemisphere function.

Partly that's because of this classic model, which claimed that these two regions are in the left hemisphere,

and that they're involved with producing speech and understanding speech.

So that's the classic model. But nowadays the thinking is a little bit different.

There are still elements of that classic model,

but I think most people now think the reality is more complicated than that.

So this is an example of a modern theory of the neural basis of speech perception:

the dual streams model. This model actually deals only with speech perception and speech processing;

it doesn't deal with speech production at all.

But even for speech perception, there's not just a single brain system or single region that does it.

There are multiple regions organised into two streams, or pathways, of processing.

So you've got this ventral stream that's involved in word recognition.

So if you're trying to understand what word you're hearing and what it means,

the proposal is that the ventral stream will be involved in doing those things.

And you can see the ventral stream here in this red arrow. It originates here at the top of the temporal lobe,

in your auditory cortex. Then it goes into the inferior parts of the temporal lobe,

wraps around into the anterior temporal regions,

and then into this part of the frontal cortex here, the bottom or ventral part of the frontal cortex.

Note also that the idea for the ventral stream is that it's bilateral:

it's there in both hemispheres, not just the left hemisphere.

The other stream is the dorsal stream, shown in green. It starts at the same place, in the auditory cortex,

but you can see that it goes backwards,

in the posterior direction, into the parietal lobe, and then terminates here in the upper or dorsal part of the frontal cortex.

So the dorsal stream is proposed to be involved in linking perception with production.

It will be very important in tasks involving not identifying words but discriminating speech sounds.

So, "what am I hearing: ba or da?" These are not real words,

so it's not going to involve the ventral stream. But if your focus is on

the phonological or phoneme content, or even the acoustics, then this dorsal stream is going to be involved

for that sort of task. And it's thought that the dorsal stream would be really important for learning to speak when you're a child,

but it might continue to function in adulthood.

So if you think about a child learning to speak: they produce the sounds,

and then they get some auditory feedback.

They hear what they're producing and how good the speech production is,

and if it's not quite right, based on what they're hearing, they'll try to

refine their speech production until what they're hearing matches the intended target of what they want to say.

And so clearly that's a behaviour involving both the perception side of things and the production side of things,

and this kind of interaction between perception and production.

And then in adulthood, especially when you're learning a new language,

you're trying to tune into new sounds that are difficult for you to pronounce,

because we don't use them in English, or whatever your native language is.

And you can imagine as well that your attention is really on the speech sounds rather than

on what the words mean or other aspects of learning the language.

And so again, the idea is that the dorsal stream would be very important for

that sort of behaviour or learning. You can also see that the dorsal stream is the one that's supposed to be left hemisphere dominant.

So whereas the ventral stream is bilateral, in both hemispheres, the dorsal stream is proposed to be left hemisphere dominant.

The nice thing about this model is that it explains this counterintuitive finding:

some aphasics can't tell apart speech sounds, but they can recognise words just fine, and then vice versa,

some aphasics can't recognise words at all, but they can tell apart speech sounds just fine.

And so this model explains this observation, this dissociation.

Because in aphasics with damage to the ventral stream,

that would result in a word recognition deficit,

but it wouldn't affect their ability to discriminate speech sounds, for example.

And then in other patients with damage to the dorsal stream, they wouldn't be able to tell apart 'ba' and 'da',

but they could recognise and identify words.
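As a rough illustration of that dissociation logic (my own toy sketch, not part of the model's formal specification), you can think of the two streams as separately damageable components, each supporting a different task:

```python
# Toy sketch of the dual streams double dissociation (illustrative only).
# Each task depends mainly on one stream; damaging a stream impairs its task
# while leaving the other task intact.

STREAM_FOR_TASK = {
    "word recognition":      "ventral",  # e.g. "point to the picture of 'cat'"
    "speech discrimination": "dorsal",   # e.g. "are 'ba' and 'da' the same?"
}

def predicted_performance(damaged_stream: str) -> dict:
    """Predict which tasks are impaired after damage to one stream."""
    return {task: ("impaired" if stream == damaged_stream else "spared")
            for task, stream in STREAM_FOR_TASK.items()}

print(predicted_performance("ventral"))
# {'word recognition': 'impaired', 'speech discrimination': 'spared'}
print(predicted_performance("dorsal"))
# {'word recognition': 'spared', 'speech discrimination': 'impaired'}
```

Damage to one stream impairs its task and spares the other, which is the double dissociation seen in the patients.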

Just to relate this back to the classic model:

I've already explained one difference, that it involves both hemispheres, at least for the ventral stream.

So that's a difference from the classic model. But also, Wernicke's and Broca's areas are both part of the dorsal stream that supports speech discrimination.

So Broca's area in the classic model is just for speech production,

but in this model it has a role to play in linking perception with production, as part of this dorsal stream.

So here's a bit of evidence from neuroimaging for the ventral stream.

This is a study where they looked at

patients with damage to the anterior temporal cortex, shown here.

And so this is part of the ventral stream.

These patients show a particular problem with, or difficulty in, doing a semantic-based task.

An example of this task: one of the stimuli might be a picture of a palm tree,

and then you've got different options that you have to choose between to match that picture.

So if you see a palm tree, one of the options might be a pyramid.

They're not directly related, but obviously there's a relationship semantically,

so that would be the correct answer in that case. To do that task you need semantic processing,

and this is the sort of task that these patients struggle with.

And so that's really consistent with this idea that the ventral stream is involved with word recognition and extracting meaning from words.

Here is another study, looking at damage to a different part of the ventral stream.

Remember it started here and then went to this region first before it went here.

So this is the inferior temporal cortex here. Again, with damage to this region,

the patients in this study showed specific problems with comprehending speech.

So again pointing towards the ventral stream being important for word recognition.

We've already seen evidence for the dorsal stream when we talked about the motor theory.

The dorsal stream is this route

that goes into the parietal cortex and then ends up in frontal regions, which include premotor and then motor cortex.

And we've already seen, from fMRI and TMS, that these regions are involved with speech processing.

So this is evidence for the dorsal stream.

Okay, so I think this is a nice point to interrupt the lecture and do a little poll.

Okay, so I'm going to test your memory,

not of what I've been speaking about today, but of the last lecture, where we talked about categorical perception.

Okay, so here's the question, and you can answer by going to that web link.

So, do you remember what I said? Categorical perception is associated with which of these options?

Which of these options is correct: (a) discrimination is predictable from identification, (b) improved discrimination at phoneme boundaries,

(c) an abrupt change in identification near the phoneme boundary, (d) all of the above, or (e) none of the above?

Okay, so it seems like C is the winner, but it's not the right answer:

it's D, all of the above. So I guess this is one of the reasons for using a quiz like this,

not just to confirm your understanding, but so that if there is some misunderstanding

and you see that your answer is incorrect,

you can revise your understanding. So this one, the abrupt change in identification, is true.

And I think most of you are getting that. But it's also the case, if you remember that graph,

that if you have a sound pair that straddles the phoneme boundary and you have to discriminate between those sounds,

you'll be really good at discriminating those sounds.

Whereas if that same pair is to one side,

where both sounds in the pair are perceived very clearly as a 'ba', for example,

then it's going to be hard to discriminate them. Okay, so.

Categorical perception is also associated with improved discrimination at phoneme boundaries.

And the first option is also true, which is really kind of another way of saying the same thing, because

discrimination is really linked with identification.

So if your sound pair straddles a phoneme boundary, so the sounds span different phoneme categories,

where you have this change in identification, then your discrimination is good; otherwise it's not going to be.

So discrimination is predictable from identification, and that's another characteristic of categorical perception.
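Here is a small illustrative sketch of how those properties hang together, assuming a steep (logistic) identification function and a simplified label-based prediction of discrimination; the numbers are made up, not real data:

```python
# Toy sketch of categorical perception (illustrative numbers, not real data).
# A steep logistic identification function gives (i) an abrupt change in
# labelling near the phoneme boundary and (ii) better discrimination for pairs
# that straddle the boundary, if discrimination is driven by the labels.
import math

BOUNDARY = 30.0   # stimulus value where labelling flips (e.g. ms of VOT)
SLOPE = 0.5       # steepness of the identification function

def p_label_A(x: float) -> float:
    """Probability of labelling stimulus x as category A."""
    return 1.0 / (1.0 + math.exp(SLOPE * (x - BOUNDARY)))

def predicted_discrimination(x1: float, x2: float) -> float:
    """Simplified label-based prediction: chance (0.5) plus half the
    difference in identification probabilities."""
    return 0.5 + 0.5 * abs(p_label_A(x1) - p_label_A(x2))

print(predicted_discrimination(10, 20))  # both clearly A    -> near 0.5 (chance)
print(predicted_discrimination(25, 35))  # straddles boundary -> well above chance
```

A pair that straddles the boundary gives a large difference in labelling, and hence good predicted discrimination; a pair on one side gives almost none.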

All right. Okay, let's do attendance before I forget.

Okay, so the pin is 8049. Okay.

So, one last bit to go through: can we describe how spoken words are recognised?

Okay. So the cohort model is another old model that's very influential to this day.

The idea behind the cohort model is that when you're processing speech, you've got a sort of internal dictionary:

a memory for all the words that you know, and of what phoneme sequence each word involves.

And so this is a depiction of that. These are just some example words; obviously there are many more.

An adult knows about 40,000 words.

So let's think about what happens as speech is coming in. Say you present just the beginning of a word and then stop,

and imagine that you could probe inside the listener's mind and see what's happening.

So at this point you've just heard /k/. Now, according to the cohort model, you've got activation

that spreads amongst all the words that you know that start with /k/.

And then, as the speech unfolds in time, you've now heard a bit more, /ka/.

You can see what's happening: these little nodes here, these words, are competing for recognition.

You can see that's what's happening, is that these little nodes here, these words are competing for recognition.

So as the speech is unfolding in time. The words are no longer consistent with the speech that you're hearing.

Drop outs of the competition. And you're just left with, you know, a fewer number of words that match the speech segments.

Until you get to the point where only one word is consistent with the speech that you're hearing.

This points is referred to as the uniqueness points. Uniqueness points is when only one word becomes consistent with the speech inputs.

This is an important point.

According to the cohort model, the words that you're hearing recognise are the of the unique points even before the whole word has been heard.

Maybe you think this is unlikely or implausible, but you can imagine the alternative:

that you're hearing a word and you wait until you've heard the full word, and then you make a decision about what you're hearing.

That would be the alternative to this. But this is saying no: the speech processing system uses what it can to recognise words as soon as possible.

So here it can already rule out all the other words.

The listener can rule out all the other words that they know, and so they don't need to wait until the whole word has been heard.

They're already recognising the word at the uniqueness point.

So just to summarise what I've just said. This is the cohort model:

words are activated immediately upon minimal input, even before the full word has been heard.

You've got this activation of multiple words, multiple representations of words.

These representations compete for recognition, and this is termed lexical competition.
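Here is a minimal sketch of the cohort idea with a toy lexicon (letters stand in for phonemes, and the words are just examples, not the lecture's stimuli):

```python
# Minimal sketch of the cohort model with a toy lexicon (letters stand in for
# phonemes; illustrative only). As the input arrives, words inconsistent with
# it drop out of the cohort; the uniqueness point is where only one candidate
# remains, which can be before the end of the word.

LEXICON = ["captain", "capital", "captive", "cat", "cathedral"]

def cohort(prefix: str) -> list[str]:
    """Words still consistent with the input heard so far."""
    return [w for w in LEXICON if w.startswith(prefix)]

def uniqueness_point(word: str, lexicon: list[str]) -> int:
    """Number of segments needed before 'word' is the only candidate left."""
    for i in range(1, len(word) + 1):
        if [w for w in lexicon if w.startswith(word[:i])] == [word]:
            return i
    return len(word)

print(cohort("ca"))                          # all five words still active
print(cohort("capt"))                        # ['captain', 'captive']
print(uniqueness_point("captain", LEXICON))  # 5 -> unique at 'capta', before word offset
```

Note that 'captain' becomes unique at 'capta', before the end of the word, which is the uniqueness point idea.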

So here's some old evidence that's consistent with the cohort model. It comes from what's called the shadowing task.

In a shadowing task,

the participant... sorry, they're not reading text;

they're repeating back some speech. So they hear the speech and they repeat it back as quickly as they can,

while the experimenter records their responses.

So they're shadowing the speech as they hear it.

And this graph here is showing how quickly they can produce the speech relative to the speech that they're hearing.

So this is the response latency: the lag between what they're hearing and when they make their response.

This is a histogram, collating the data over multiple listeners and multiple trials,

and looking at where the mean is. It turns out that the average response latency is around 250 milliseconds.

And given that in this experiment the average duration of the words was 375 milliseconds,

this implies that listeners were recognising the words even before they heard the ends of the words,

which is very much consistent with the cohort model. It's exactly the idea:

the speech processing system recognises words based on minimal input, even before words have been heard in full.

So that's the cohort model. Now let's think about the case where you've learned some new words.

'Blog' at one point in time was a new word, when it was introduced.

You can think of other words like that, relatively new words.

And so you can think about what would happen in the speech processing system once you've learned some new words:

how would that affect how you recognise other familiar words that you've known for a long time?

So let's say, for example, you've learned a new made-up word that starts the same way as 'cathedral'.

I want you to tell me what would happen.

So actually, let's just phrase it exactly as the poll says.

So let's go to another poll.

Okay, so I want you to tell me: how would learning a new word like that affect your recognition of a related existing word like 'cathedral'?

Just to make this clear, we're looking at how quickly you process 'cathedral',

but after you've learned the new word. Would it speed up recognition or slow down recognition of 'cathedral'?

Okay, so you're saying B, slow down recognition, and that's correct.

That's the correct answer. So I'm glad that's intuitive to you.

Let me explain and depict this visually. So once you've learned the new word, what's going to happen?

Well, say you've heard these phonemes, just the beginning of 'cathedral' up to this vowel here.

Before you learned the new word, this was the uniqueness point, because 'cathedral' was the only word consistent with that input.

So that was the uniqueness point at this point in time, here.

But now that you've learned the new word, it is also consistent with

this speech.

And so this is no longer the uniqueness point; the uniqueness point is going to shift later in time after you've learned the new word.

Okay. So after hearing that beginning, both of these words would be activated,

and it's only when you get to a later point in time,

specifically this vowel here, that you would have unique activation of 'cathedral'.

So the uniqueness point shifts later in time, and this would slow down recognition.

And indeed this has been shown experimentally. So learning new words slows down recognition of existing words.

And this is another finding that would be consistent with the cohort model.

So you can explain this slowing down of existing-word recognition when you learn new words

by considering that you've got multiple representations of words that are competing for recognition,

and it's only at the uniqueness point that a word is recognised.
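The same sort of toy sketch shows the uniqueness point shifting once an overlapping novel word is added to the lexicon (here 'cathedruke' is just a hypothetical made-up word, not necessarily the one used in the actual experiments):

```python
# Toy illustration (not the actual experimental stimuli) of how learning a new
# overlapping word shifts the uniqueness point of an existing word.

def uniqueness_point(word: str, lexicon: list[str]) -> int:
    for i in range(1, len(word) + 1):
        if [w for w in lexicon if w.startswith(word[:i])] == [word]:
            return i
    return len(word)

old_lexicon = ["cat", "cathedral", "captain"]
new_lexicon = old_lexicon + ["cathedruke"]   # hypothetical newly learned word

print(uniqueness_point("cathedral", old_lexicon))  # 4 -> 'cath' is already unique
print(uniqueness_point("cathedral", new_lexicon))  # 8 -> must wait until 'cathedra'
```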

Okay. So the cohort model is still very influential,

but as with all models, there are always pros and cons.

And here is a shortcoming of the model: it's an example of a verbal model, which makes it difficult to evaluate.

There's no computer implementation of this model; it was just described verbally.

A better way of making sure what a model or theory predicts is to implement it as a computer program, a so-called computational model.

This is a very important part of cognitive psychology, for this reason: when you have a computational model,

you can be much more sure what the model is actually saying and what it predicts will happen.

Whereas with a verbal model, it's just a bit fuzzy, isn't it?

When you describe a model in words, it's not very specific. So let me give you an example of a computational model of speech perception.

Another influential one, called TRACE. So this is TRACE here.

TRACE is a computational model; it's a neural network model.

You've got these different stages of processing, so you've got this low level acoustic stage.

You've got this intermediate phoneme stage and you've got this high level word stage.

Now if you peek inside this box here, you can see these different units.

These actually correspond to this stage here.

The figure isn't showing the units within this box, just what's happening within this box and within this box.

So within this box are examples of the sort of different neurone-like units that are inside it.

You've got a neurone-like unit that will respond if there's an /a/ in the input,

you've got another one that will activate if there's a /k/, and you get the idea.

And then here at the word stage, you've got different neurone-like units that will respond if specific words are present in the speech,

like 'cats', 'birds' and 'bats'. And you can see that this is a hierarchical system.

You probably learned about that sort of thing in visual processing.

So you've got this low-level stage of acoustic features, say formants and things

like that, which activate the phonemes that are consistent with those acoustic features,

and those in turn activate the words that are consistent with those phonemes.

So if you've got the right phonemes in the input, then that will activate this unit here.

Now, a really important part of TRACE.

First of all, this is what these black arrows mean:

you've got these within-layer inhibitory connections, and this is a mechanism for instantiating the lexical competition I talked about earlier.

So if you have activation of 'bads', this is going to inhibit activation of 'cats' and 'bats'.

It's going to inhibit other words. So it implements,

as a model, very concretely, what was described in the cohort model:

you have this initial activation that's spread over lots of words as the speech is being heard,

and then activation narrows as the words compete with each other.

And then a really key part of this model is that processing doesn't happen just bottom-up.

You don't only get processing that goes in the bottom-up direction, from low to high levels,

which is what these arrows here show. You've also got processing that goes in the other direction.

So if you have an expectation or context (we talked about context effects), say you're expecting 'green needle',

then those word units will, through these top-down connections, bias or enhance the activity of the phonemes that make up 'green needle'.
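Here is a highly simplified interactive-activation sketch in the spirit of TRACE (not the published implementation; the lexicon, parameters and update rule are made up, and activations aren't normalised, so only the relative ordering matters):

```python
# Highly simplified interactive-activation sketch in the spirit of TRACE
# (not the published implementation; lexicon and parameters are made up).
# Word units get bottom-up support from active phoneme units, compete through
# within-layer inhibition, and feed activation back down to their phonemes.

LEXICON = {"cat": ["k", "a", "t"], "bat": ["b", "a", "t"], "bad": ["b", "a", "d"]}
PHONEMES = sorted({p for ps in LEXICON.values() for p in ps})

def step(phon_act, word_act, bottom_up, feedback=0.3, inhibition=0.2, decay=0.1):
    """One update cycle: bottom-up support, lateral inhibition, top-down feedback."""
    new_word = {}
    for w, ps in LEXICON.items():
        support = sum(phon_act[p] for p in ps) / len(ps)
        rivals = sum(word_act[v] for v in LEXICON if v != w)
        new_word[w] = max(0.0, (1 - decay) * word_act[w] + support - inhibition * rivals)
    new_phon = {}
    for p in PHONEMES:
        top_down = sum(new_word[w] for w, ps in LEXICON.items() if p in ps)
        new_phon[p] = max(0.0, (1 - decay) * phon_act[p] + bottom_up.get(p, 0.0)
                          + feedback * top_down)
    return new_phon, new_word

phon = {p: 0.0 for p in PHONEMES}
words = {w: 0.0 for w in LEXICON}
ambiguous_input = {"b": 0.5, "k": 0.5, "a": 1.0, "t": 1.0}  # unclear b/k onset
for _ in range(5):
    phon, words = step(phon, words, ambiguous_input)
print(words)  # 'cat' and 'bat' end up well ahead of 'bad'
```

With an ambiguous 'b'/'k' onset, 'cat' and 'bat' both outcompete 'bad' through the within-layer inhibition, and the top-down feedback boosts the phonemes of whichever words are winning.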

So here's a bit of evidence from eye tracking in support of the TRACE model.

With eye tracking,

as a participant you might see a display like this, and you can see that you've got these different objects here,

like a speaker, a beaker, a beetle, etc., and you'll hear an instruction like "put the beaker on the triangle".

And as you're listening to that instruction, you can track people's eye movements and see where they're looking on the screen.

This is really useful, because if you think of a behavioural measure involving, say, "what word did you hear?"

or "how quickly did you hear the word?",

that is just a very crude response, and it doesn't really allow you to understand what happened

that led to that final behavioural response. But with eye tracking,

you can see, with really good temporal resolution, processing as it's happening in real time.

So here's a bit of eye tracking data. What you're seeing here is fixation probability:

basically how often participants were looking at these different objects, relative to the onset of the critical target word.

So this would be "put the beaker...",

and this is relative to the start of 'beaker'

in this example.

And so you can then see, relative to the start of 'beaker', where they're looking on the screen.

And you can see that at the beginning of 'beaker', before you've really heard much,

participants aren't really looking at any one particular object.

But then as you hear more of the word, so you've started to hear the 'bee...',

you can see they start looking at the beaker, but they're also looking at

objects that are denoted by words sharing the initial sounds.

So, for example, 'beetle' starts with the same sounds as 'beaker'.

But as more of the speech unfolds in time and you've heard the full 'beaker',

then you can rule out the other words, and participants no longer look at those other objects.

Instead, they're really looking only at the beaker. So this is a way to see, in real time, what's going on in the word recognition system.

And as I said before, the really nice thing about computational models is that you can be sure what they predict.

So you can do a simulation with a computational model:

you can present stimuli to the model and look at those neurone-like units to see what's happening.

For example, at the word level, look at the activity in the different word units corresponding to 'beaker' and 'beetle'.

Then you can see what's happening in the model and compare that with the human listener behaviour.

I hope you can agree that what you're seeing in the model

looks very much like what you see in the eye tracking data.

So this looks to be really nice evidence that the TRACE model

is doing a good job of capturing speech processing and word recognition.

Another nice thing about TRACE is that it has a very nice,

intuitive explanation for the impact of prior knowledge or context on speech perception.

I told you in the last lecture about the Ganong effect, showing that if you present ambiguous sounds

in a word context, then you'll be biased towards hearing words that you know, like 'kiss'.

So this is showing that lexical information, higher-level word

information, is influencing the processing of lower-level phoneme information.

So this would be TRACE's explanation for this sort of effect. Say you've got a sound that's ambiguous between /g/ and /k/.

Well, if you've got contextual information favouring 'gift',

that's going to bias activity towards the /g/

interpretation over the /k/ interpretation.

Whereas if you've got information favouring 'kiss', you're going to have biasing of activity not in the /g/ unit but instead in the /k/ unit.

And so what's happening in TRACE is that information here is feeding back in a top-down fashion to influence lower-level phoneme representations.

But it should be noted that although this is

a very intuitive explanation for the Ganong effect, it has been challenged,

and there are other ways to explain context effects like the Ganong effect without invoking these top-down mechanisms.

In fact, this has been a very heated debate, if you can imagine that,

and it's been really hard to

get to the bottom of with behavioural measures.

But I would say that if you look at the neuroscience evidence, which I'm not going to go into

in today's lecture, I think it does favour this TRACE-style top-down

model over alternative bottom-up models.

Okay. So we've got through all these learning outcomes. Hopefully you now can, if you're asked to,

discuss the motor theory of speech perception; describe the neural basis of speech perception,

talking about the dual streams model specifically;

and describe how spoken words are recognised, by talking about the cohort model and the TRACE model.

So these are the key points to take away.

So not all components of the motor theory are supported by the evidence.

But the idea that speech perception partly involves motor representations is now widely accepted.

And you can see that, for example, in that dual streams model where you've got this dorsal stream that links perception with production,

that's very much a flavour of the motor theory.

So even though there are some other elements of the motor theory that people don't believe in anymore,

it's still an influence in the field, in terms of these more modern theories like the dual streams model.

In the dual streams model,

in addition to the dorsal stream that links perception with production, you've got this ventral stream for doing word recognition.

Another key point to take away: as speech unfolds over time,

words are activated and compete for recognition. The cohort and TRACE models were designed to explain these processes.

TRACE also provides a straightforward explanation for context effects, based on this top-down feedback mechanism.

But it should be said that, you know, there are other alternative models and it has been a subject of a very long debate.

Okay, so thank you very much. We should leave the room now, but as I said last time,

I'm happy to take questions over email, drop-in hours, or the discussion thread on Canvas.

I'll also make an announcement about a little quiz you can do on Canvas.

Thank you very much.