speech and music perception
speech perception is easily the most important form of auditory perception.
important forms of auditory perception not involving words include music perception and identifying the nature and sources of environmental sounds
the relationship between speech perception and auditory perception is controversial
humans may have special speech-perception mechanisms: the ‘speech is special’ approach
alternatively the same general mechanisms may process speech and non-speech sounds
Brandt et al claimed controversially that we can ‘describe language as a special type of music’
one source of support is that music and language perception both involve the goal of ‘grouping acoustic features together to form meaningful objects and streams’
a second source of support is the speech-to-song illusion: if you listen repeatedly to the same looped recording of speech, it often starts to sound like singing once you stop attending to its meaning. Brain areas associated with music perception were more activated by repeated speech perceived as song than by repeated speech not perceived as song.
Tierney also studied the speech-to-song illusion: ratings of the musicality of spoken phrases increased when the phrases were repeated, and listeners became more responsive to their musical structure
Further evidence on the relationship between speech and music perception:
categorical perception
categorical perception occurs when a speech sound lying between two phonemes is nevertheless perceived as one phoneme or the other, rather than as something in between
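To make this concrete, categorical perception can be thought of as mapping a continuous acoustic dimension, such as voice onset time (VOT), onto discrete phoneme labels, so that differences within a category are largely ignored. Below is a minimal illustrative Python sketch (not from the source); the 25 ms boundary and the /b/–/p/ labels are invented for illustration.

```python
# Toy illustration of categorical perception along a voice onset time (VOT) continuum.
# The 25 ms boundary and the /b/-/p/ labels are illustrative assumptions, not data.

def categorise_phoneme(vot_ms: float, boundary_ms: float = 25.0) -> str:
    """Map a continuous VOT value onto a discrete phoneme category."""
    return "/b/" if vot_ms < boundary_ms else "/p/"

# Sounds at 5 ms and 20 ms differ acoustically but are perceived as the same phoneme,
# whereas 20 ms and 30 ms straddle the boundary and are perceived as different phonemes.
for vot in (5, 20, 30, 45):
    print(f"VOT = {vot:>2} ms -> heard as {categorise_phoneme(vot)}")
```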
is categorical perception unique to speech perception?
Raizada and Poldrack presented listeners with two auditory stimuli and asked them to decide whether they represented the same phoneme. There was evidence of categorical perception. The differences in brain activation associated with the two stimuli were amplified when they were on opposite sides of the boundary between the two phonemes.
There is often only limited evidence for categorical perception with speech sounds. It is less evident with vowels than consonants and listeners are often sensitive to variations within a given perceptual category
Bidelman and Walker reviewed findings indicating categorical perception is also present in music. However, it is stronger for speech than music (especially among non-musician listeners). These findings suggest categorical perception occurs mostly with familiar stimuli.
Weidma presented various pitch contours embedded in linguistic or melodic phrases. There was evidence of categorical perception in both the language and music contexts. However, identical pitch contours were categorised differently depending on whether they were perceived as language or music. Thus, there are both similarities and differences in categorical perception in speech and music
Do music and speech perception involve the same brain areas?
the relationship between music and speech perception can be studied by comparing the brain areas activated with each form of perception.
some neuroimaging research has reported mostly non-overlapping brain regions are involved in music and speech perception.
however, Slevc and Okada argued that this is NOT the case when relatively complex tasks are used. They found complex music and speech perception both involved cognitive control (relying on prefrontal cortex areas), which is used to detect and resolve the conflicts that occur when expectations are violated and interpretations must be revised
Lacroix conducted a meta-analytic review. Passive music and speech listening were both associated with activation in large areas of the superior temporal gyrus. However, the precise areas differed between music and speech perception. In addition, Broca’s area was more activated during speech perception than music perception. Lacroix concluded ‘our findings of spatially distinct regions for music and speech clearly suggest the recruitment of distinct brain networks for speech and music’
research on brain-damaged patients has also revealed important differences between speech and music perception. Some patients have intact speech perception but impaired music perception whereas others have intact music perception but impaired speech perception.
in conclusion: there are important similarities between music and speech perception, however, they differ with respect to underlying brain areas and cognitive processes.
Processing stages
main processes involved in speech:
initially, listeners often have to select out the speech signal of interest from several other irrelevant auditory inputs
after that, decoding involves extracting discrete elements (like phonemes or other basic speech sounds) from the speech signal.
there is a controversy as to whether decoding involves identifying phonemes (small unit of sound) or syllables (speech units based on a vowel sound often plus one or more consonants)
there is an important distinction between phonemes and allophones (variant forms of a given phoneme). For example, the words pit and spit both contain the phoneme /p/, but there are slight differences in the way it is pronounced in the two words. Thus there are two allophones of /p/ but only one phoneme: allophones are context-dependent whereas phonemes are context-independent
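To make the context-dependence concrete, here is a small illustrative sketch (the aspiration detail is a standard phonetics example, not from the notes): a single phoneme /p/ maps onto different allophones depending on its context.

```python
# Illustrative only: one phoneme, several context-dependent allophones.
# The aspirated/unaspirated distinction for English /p/ is a standard phonetics example.

allophones_of_p = {
    "word-initial (pit)": "[pʰ]  (aspirated)",
    "after /s/ (spit)":   "[p]   (unaspirated)",
}

for context, allophone in allophones_of_p.items():
    print(f"/p/ in {context} -> {allophone}")
```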
there has been controversy as to whether phonemes or allophones are the basic units in spoken word recognition.
the main processes involved in speech perception and comprehension
auditory input
select speech from acoustic background and transform to abstract representation
listeners have to select out the speech signal of interest from several other irrelevant auditory inputs
word recognition:
activation of lexical candidates
competition
retrieval of lexical information
problems in word identification include the fact that all English words are formed from only about 35 phonemes; as a result, most spoken words resemble each other at the phonemic level and are hard to distinguish. Distinguishing them becomes easier if listeners make use of allophones rather than phonemes
utterance interpretation
syntactic analysis
thematic processing
This stage involves constructing a coherent meaning for each sentence based on information about individual words and their order within the sentence
integration into discourse model
this and the previous stage emphasise speech comprehension
this stage involves integrating the meaning of the current sentence with preceding speech to construct an overall model of the speaker’s message
listening to speech
understanding speech can be difficult for several reasons:
speech perception depends on several aspects of the speech signal
it depends on whether speech is heard under optimal or adverse conditions. Mattys et al defined an adverse condition as ‘any factor leading to a decrease in speech intelligibility on a given task relative to the level of intelligibility when the same task is performed in optimal listening conditions’
Mattys et al identified two major types of adverse conditions:
energetic masking: distracting sounds cause the intelligibility of target words to be degraded. This masking mostly affects bottom-up processing and is a serious problem in everyday life
informational masking: cognitive load makes speech perception harder. informational masking mainly affects top-down processing
Alain et al found listeners use different processes depending on why speech perception is difficult. They conducted a meta-analysis of three types of studies: speech in noise; degraded speech; and complexity of the linguistic input. Their finding was that patterns of brain activation varied across these three types of studies
problems with the speech signal
segmentation, which involves separating out or distinguishing phonemes (units of sound) and words from the pattern of speech sounds. Most speech has few periods of silence, as you have probably noticed when listening to a person speaking an unfamiliar language. This makes it hard to decide when one word ends and another begins
coarticulation: a speaker’s pronunciation of a phoneme depends on the preceding and following phonemes. This is problematic because it increases the variability of the speech signal; however, it can also provide a useful cue because it allows listeners to predict the next phoneme to some extent
speakers differ in several ways, such as dialect and speaking rate, and yet we generally cope well with such variability. Kreingwatana trained Dutch and Australian-English listeners to discriminate two Dutch vowels spoken by a single speaker. Both groups successfully categorised the same vowels when spoken by a speaker of the opposite sex; however, both groups performed poorly and required feedback when the vowels were spoken by someone with a different accent. Thus adapting to a different-sex speaker is relatively ‘automatic’, whereas adapting to a different accent requires active processing of additional information
expectations are also important: in one study, some listeners expected to hear two speakers with similar voices whereas others expected to hear only one speaker. Those expecting two speakers showed worse listening performance
language is spoken at about 10 phonemes (basic speech sounds) per second and much acoustic info is lost within 50 ms. As a consequence, ‘if linguistic info is not processed rapidly, that info is permanently lost’
non-native speakers often produce speech errors; listeners cope by using top-down processes to infer what non-native speakers are trying to say
coping with listening problems
segmentation:
dividing the speech signal into its constituent words is crucial for listeners. Segmentation involves using several cues, some are acoustic-phonetic, whereas others depend on the listener’s knowledge and the immediate context.
segmentation is influenced by constraints on what words are possible. Listeners found it hard to identify the word apple in fapple because the leftover sound f could not possibly be an English word
evidence indicating segmentation can be based on possible-word constraints has been obtained in several languages. However, it does not apply to Russian, a language which has some single-consonant words lacking a vowel
stress is an important acoustic cue. In English, the initial syllable of most content words (like nouns and verbs) is typically stressed. Strings of words without stress on the first syllable are often misperceived
speaker variability
Cai proposed a model to explain how listeners cope with variability. They assumed listeners use info provided by the speech signal to infer characteristics of the speaker, and this influences how speech is perceived. Cai tested the model using words that have different meanings when heard in an American or British accent; for example, bonnet means a type of hat in American English but the front covering of a car in British English. As predicted, British listeners were more likely to interpret such words as having the American meaning when spoken in an American rather than a British accent.
the crucial condition involved presenting these words in a neutral accent. These words were presented in a context of other words spoken in an American or British accent. As predicted, the neutral words were more likely to be interpreted with their American meaning when the context consisted of words spoken in an American accent. Thus, listeners’ speaker model biased their interpretations.
McGurk effect
listeners often make extensive use of lip-reading when listening to speech. McGurk and MacDonald demonstrated the McGurk effect: they prepared a videotape of someone saying ‘ga’ repeatedly, while the sound channel played a voice saying ‘ba’ repeatedly in synchronisation with the lip movements indicating ‘ga’. Listeners reported hearing ‘da’, a blending of the visual and auditory information
on average, the McGurk effect is strongest when the auditory input lags 100 ms behind the visual input. This probably happens because lip movements can be used predictively to anticipate the next sound to be produced. Listeners showed the effect even when they were aware of a temporal mismatch between the visual and auditory input (one started before the other)
The McGurk effect was stronger when the crucial word formed by blending auditory and visual input was presented in a semantically congruent sentence
hierarchical approach to speech segmentation
three levels/ tiers
tier 1: we prefer to use lexical cues
tier 2: when lexical information is impoverished, we use segmental cues such as coarticulation and allophony (one phoneme may be associated with two or more similar sounds or allophones)
tier 3: we resort to metrical prosody cues when it is hard to use Tier 1 or 2 cues. One reason we often avoid using stress cues is that stress information can be misleading when a word’s initial syllable is not stressed
Mattys found that coarticulation (tier 2) was more useful than stress (tier 3) for identifying word boundaries when the speech signal was intact. In contrast, when the speech signal was impoverished, making it hard to use Tier 1 or 2 cues, stress was more useful than coarticulation
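A minimal sketch of the tier logic described above (the function name and decision rules are invented for illustration): listeners fall back on lower tiers only when higher-tier cues are unavailable or unreliable.

```python
# Toy decision sketch of the hierarchical cue-weighting idea.
# The cue labels follow the tiers above; the decision rules are illustrative assumptions.

def choose_segmentation_cue(lexical_info_available: bool, signal_intact: bool) -> str:
    if lexical_info_available:
        return "Tier 1: lexical cues (word knowledge, context)"
    if signal_intact:
        return "Tier 2: segmental cues (coarticulation, allophony)"
    return "Tier 3: metrical prosody cues (stress)"

print(choose_segmentation_cue(lexical_info_available=True,  signal_intact=True))
print(choose_segmentation_cue(lexical_info_available=False, signal_intact=True))
print(choose_segmentation_cue(lexical_info_available=False, signal_intact=False))
```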
Context effects
it is indisputable that context typically influences spoken word recognition, however, it is hard to clarify when and how context exerts its influence.
Harley identified two extreme positions
according to the interactionist account, contextual information influences processing at an early stage and may influence word perception.
in contrast, the autonomous account claims context has its effect late in processing. According to this account: ‘context cannot have an effect prior to word recognition; it can only contribute to the evaluation and integration of the output of lexical processing, not its generation’
phonemic restoration effect
Warren and Warren obtained strong evidence that sentence context can influence phoneme perception in the phonemic restoration effect. Listeners heard a sentence with a missing phoneme that had been replaced with a meaningless sound (a cough). The sentences used were as follows
the sentences were of the form: ‘it was found that the *eel was on the axle’ (or shoe, orange, or table)
perception of the crucial element (*eel) was influenced by the sentence in which it appeared: listeners heard ‘wheel’ when the sentence ended with axle, ‘heel’ with shoe, ‘peel’ with orange and ‘meal’ with table. The crucial auditory stimulus was always the same, so all that differed was the contextual information.
what causes the phonemic restoration effect?
there may be a fairly direct effect on speech processing, with the missing phoneme being processed almost as if it were present. Alternatively, there may be an indirect effect with listeners guessing the identity of the missing phoneme after basic speech processing has occurred.
Ganong effect
the finding that perception of an ambiguous phoneme is biased towards a sound that produces a word rather than a non-word
In order to understand the processes underlying the Ganong effect, it is important to ascertain when lexical (word-based) processing influences phonemic processing.
Kingston et al obtained clear evidence on the issue: listeners categorised phonemes by choosing between two visually presented options (one completing a word and the other not). Listeners directed their eye movements to the word-completing option almost immediately. This finding strongly suggests there is a remarkably rapid merging of phonemic and lexical processing, and seems inconsistent with the notion that phonemic processing is completed prior to the use of word-based processing
theories of speech perception
orthographic influences
chiarello et al (2018) studied spoken word identification under difficult conditions (multi-speaker babble). The researchers computed the proportion of similar sounding words (phonological neighbours) also spelled similarly (orthographic neighbours) for each spoken word. Word identification rates were lower for words having many orthographic neighbours as well as phonological neighbours. Thus word identification was influenced by orthography.
How does orthography influence speech perception? One possibility is that hearing a word leads fairly ‘automatically’ to activation of its orthographic codes, which then influence lexical access. Alternatively, a spoken word’s orthography may influence its processing only after lexical access. This issue has been addressed using ERPs.
Pattamadilok et al asked listeners to decide whether spoken words had a given final syllable. Orthographic info influenced ERPs at 175-250 ms, suggesting orthography affects early processing prior to lexical access.
similarly, a study of the Korean language found that orthographic information influenced the P200 component of the ERP in a spoken word recognition task.
Motor theory
Liberman proposed listeners mimic the speaker’s articulatory movements. It was claimed that this motor signal provides much less variable and inconsistent info about the speaker’s words than does the speech signal and so facilitates speech perception
much research has assumed there is a single motor speech system, which is a drastic oversimplification
listeners sometimes make more use of speech-production processes when the speech input is unclear and provides insufficient auditory information. For example, Nuttall et al found listeners had greater activation in the motor cortex when speech perception was made harder; however, they used rather artificial speech stimuli.
evidence from brain-damaged patients might clarify the role of motor processes in speech perception. If patients whose motor cortex is destroyed can still perceive speech, we might conclude motor processes are unnecessary for speech perception, but this is too simplistic: ‘speech perception deteriorates with a wide range of damage to speech-production systems caused by stroke, focal excision for epilepsy, cerebral palsy and Parkinson’s’
according to motor theories, the context effect should be much greater when the target is a word because (as stated by Uddin et al) ‘it is not possible to make neural predictions via motor systems for environmental sounds that do not have clear speech representations’. However, the context effect was as great with environmental sounds as with words. Thus, listeners make predictions at the level of conceptual meaning.
what are the limitations of motor theories?
Uddin et al’s findings suggest listeners do not simply predict the sounds that will be presented. Instead, most theories of speech perception should be modified to include a larger contribution from general cognitive processes that take conceptual meaning into account
the available evidence suggests ‘multiple speech production-related networks and sub-networks dynamically self-organise to constrain interpretations of indeterminate acoustic patterns as listening context requires’. No current theory explains these complexities
many brain areas are involved in speech perception but not speech production; thus motor theories would need further development to provide comprehensive accounts of speech perception.
when speech input is clear, comprehension can be achieved with minimal involvement of speech-production processes. That may limit the applicability of motor theories to speech perception in typical conditions
TRACE model
assumptions the TRACE model makes
there are individual processing units or nodes at three different levels: features (e.g., voicing, manner of production), phonemes, and words
Feature nodes are connected to phoneme nodes, and phoneme nodes are connected to word nodes.
Connections between levels operate in both directions and are always facilitatory.
There are connections among units or nodes at the same level; these connections are inhibitory.
Nodes influence each other in proportion to their activation levels and the strengths of their interconnections.
As excitation and inhibition spread among nodes, a pattern of activation or trace develops.
All activated words are involved in a competitive process in which these words inhibit each other. The word with the strongest activation wins the competition.
“Words are recognised incrementally by slowly ramping up the activation of the correct words at the phoneme and word levels”
the TRACE model assumes that bottom-up processing and top-down processing interact. Bottom-up activation proceeds upwards from the feature level to the phoneme level and on to the word level. In contrast, top-down activation proceeds in the opposite direction from the word level to the phoneme level and on to the feature level
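To make the interactive-activation idea concrete, here is a heavily simplified sketch (a toy two-word lexicon and hand-picked weights, not the actual TRACE implementation): bottom-up input excites matching words, words inhibit each other within their level, and a ‘winner’ emerges as activation is repeatedly updated.

```python
# Heavily simplified interactive-activation sketch inspired by TRACE.
# The lexicon, weights and update rule are toy assumptions, not the model's real parameters.

LEXICON = {"cat": ["k", "a", "t"], "cap": ["k", "a", "p"]}

def recognise(heard_phonemes, steps=10, excite=0.2, inhibit=0.15):
    word_act = {w: 0.0 for w in LEXICON}          # word-level activations
    for _ in range(steps):
        # Bottom-up facilitation: each matching phoneme excites the word node.
        for word, phonemes in LEXICON.items():
            matches = sum(p in heard_phonemes for p in phonemes)
            word_act[word] += excite * matches
        # Within-level inhibition: each word is suppressed by its competitors' activation.
        total = sum(word_act.values())
        for word in word_act:
            word_act[word] = max(0.0, word_act[word] - inhibit * (total - word_act[word]))
    return word_act

# 'k' and 'a' support both candidates; the final 't' tips the competition towards "cat".
print(recognise(["k", "a", "t"]))
```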
How does the TRACE model explain categorical speech perception?
The TRACE model explains categorical speech perception by assuming the boundary between phonemes becomes sharper because of mutual inhibition between phoneme units. These inhibitory processes produce a “winner takes all” situation with one phoneme becoming increasingly more activated than other phonemes, thus producing categorical perception.

High-frequency words (those encountered frequently) are generally recognised faster than low-frequency ones (Harley, 2013). It would be consistent with the TRACE model’s approach to assume this finding occurs because high-frequency words have higher resting activation levels. If so, word frequency should influence even early stages of word processing.
Dufour et al. (2013) obtained supporting evidence: word frequency influenced event-related potentials as early as 350 ms after word onset during spoken word recognition.

We turn now to problematical findings for the model. It assumes top-down influences originate at the word level. Thus, top-down effects (e.g., produced by relevant context) should benefit target identification more when the target is a word (e.g., sheep) rather than an environmental sound (e.g., a sheep bleating). However, context effects are as great with environmental sounds as with words, suggesting top-down processing activates general conceptual meanings rather than specific words.
cohort model
The cohort model focuses on the processes involved during spoken word recognition. It differs from the TRACE model in focusing more on bottom-up processes and less on top-down ones. Several versions have been proposed, starting with Marslen-Wilson and Tyler (1980). Here are the main assumptions of the original version:
Early in the auditory presentation of a word, all words conforming to the sound sequence heard so far become active: this is the word-initial cohort. There is competition among these words to be selected.
Words within the cohort are eliminated if they cease to match further information from the presented word or because they are inconsistent with the semantic or other context. For example, crocodile and crockery might both belong to the initial cohort with the latter word being excluded when the sound /d/ is heard.
Processing continues until information from the word itself and contextual information permit elimination of all but one of the cohort words. The uniqueness point is the point at which only one word is consistent with the acoustic signal.
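A minimal sketch of the cohort idea under a toy lexicon (the five words, and the use of letters in place of phonemes, are illustrative simplifications): the word-initial cohort is formed from the first sound and is then pruned as each further segment arrives, until the uniqueness point is reached.

```python
# Toy sketch of the original cohort model's elimination process.
# The five-word lexicon is an illustrative assumption, and letters stand in for phonemes.

LEXICON = ["crocodile", "crockery", "cross", "crocus", "apple"]

def cohort_trace(spoken_word: str):
    for n in range(1, len(spoken_word) + 1):
        heard = spoken_word[:n]                       # input heard so far
        cohort = [w for w in LEXICON if w.startswith(heard)]
        print(f"heard '{heard}': cohort = {cohort}")
        if len(cohort) == 1:
            print(f"uniqueness point reached after '{heard}'")
            break

cohort_trace("crocodile")
```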
three stages identified within the cohort model
access stage during which a word cohort is activated
selection stage during which one word is chosen from the cohort
integration stage during which the word’s semantic and syntactic (grammatical) properties are integrated within the sentence
Gaskell and Marslen-Wilson (2002) proposed another variant of the cohort model. Its central assumption was that there is “continuous integration” of information from the speech input and context. If the speech input is degraded or the context is strongly predictive, top-down processes relating to prediction of the next word are likely to dominate within this continuous integration. In contrast, bottom-up processes triggered by the speech signal are dominant within continuous integration if the speech signal is unambiguous and there is no constraining context.
Lecture:
processing that runs from sensory input up to semantic understanding is bottom-up processing
processing that runs from semantic understanding down to sensory input is top-down processing
accessing the mental lexicon
accessing a word involves developing and comparing the relevant phonological, syntactic, semantic and orthographic representations
challenges to lexical access
we have a continuous speech stream: there is not a pause between each word; the words flow together into a continuous stream of sound
homonyms: words that sound the same but have completely different meanings
homophones: words with different spellings and meanings but the same sound (aisle vs isle)
coarticulation: the way we produce sounds requires dexterity; depending on the context and the surrounding word/phoneme, the same phoneme will sound different when said after a different phoneme/word
different accents
invariance problem- there are problems in defining fixed acoustic properties for phonemes, syllables and words
evaluation- cohort model
The cohort model has several strengths.
First, the assumption that accurate perception of a spoken word is typically accompanied by some processing of several competitor words is generally correct.
Second, the processing of spoken words is sequential and changes considerably during the course of their presentation.
Third, the uniqueness point is of great importance in spoken word recognition.
Fourth, context effects often (but not always) occur during the integration stage following word identification as predicted by the model.
Fifth, the revised versions of the model are superior to the original version. For example, the assumption that membership of the word cohort is a matter of degree rather than all-or-none is more in line with the evidence.
What are the model’s limitations?
First, context sometimes influences word processing earlier than the integration stage. This is especially the case when the context is strongly predictive or the speech input is degraded. However, Gaskell and Marslen-Wilson’s (2002) more flexible approach based on continuous integration can accommodate these (and many other) findings.
Second, the revised cohort model de-emphasises the role of word meaning in spoken word recognition. One aspect of word meaning is imageability (ease of forming an image of a word’s referent). When there are many words in the word cohort, high-imageability words are easier to recognise than low-imageability ones and they are associated with greater activation in brain areas involved in speech perception. Thus, word selection depends on semantic factors as well as phonological ones.
Third, mechanisms involved in spoken word recognition may differ from those emphasised within the model. More specifically, predictive coding and enhanced processing of speech features inconsistent with prediction may be more important than assumed within the cohort model.
cognitive neuropsychology
theoretical framework proposed by Ellis and Young; there are five components
the auditory system extracts phonemes or other sounds from the speech wave
the auditory input lexicon contains info about spoken words known to the listener but not about their meaning
word meanings are stored in the semantic system
the speech output lexicon provides the spoken form of word
the phoneme response buffer provides distinctive speech sounds
the framework’s most striking assumption is that three different routes can be used when saying spoken words.
auditory analysis system
Maffei et al. (2017) studied a female patient (FO) with pure word deafness. She had a selective impairment in auditory language processing but intact processing of environmental sounds and music (e.g., identifying which musical instrument was being played). She also had intact speech, reading and writing. Unsurprisingly, FO had damage to regions of a brain network dedicated to speech sound processing. Slevc et al. (2011) argued that speech perception differs from the perception of most non-speech sounds because listeners must cope with rapid stimulus changes. They found NL, a patient with pure word deafness, had great difficulties discriminating sounds (speech or non-speech) differing in rapid temporal changes. Thus, the rapid stimulus changes in spoken words may partially explain why patients with pure word deafness have severe speech-perception problems.
Three-route framework:
Ellis and Young’s (1988) framework specifies three routes that can be used when individuals process and repeat words they have just heard (see Figure 9.12). All three routes involve the auditory analysis system and the phoneme response buffer. Route 1 also involves the other three components (auditory input lexicon; semantic system; speech output lexicon). Route 2 involves two additional components (auditory input lexicon; speech output lexicon), and Route 3 involves an additional rule-based system converting acoustic information into words that can be spoken. According to the three-route framework, Routes 1 and 2 are used with familiar words, whereas Route 3 can also be used with unfamiliar words and non-words
BB, a female patient with word meaning deafness, could distinguish between words and non-words. She was severely impaired in identifying pictures matching spoken words but not when identifying pictures matching written words (Bormann & Weiller, 2012). Thus, BB could not access the meanings of spoken words although her semantic processing ability was intact.

Patients using only Route 3 could repeat spoken words and non-words but would have very little comprehension of the words. Patients with transcortical sensory aphasia exhibit this pattern. For example, Kim et al. (2009) studied a male patient. He repeated spoken words but had severely impaired auditory and reading comprehension. These findings suggested he had damage within the semantic system. Kwon et al. (2017) studied two patients with transcortical sensory aphasia. Their impaired auditory comprehension appeared to be due to greatly decreased functional connectivity between language centres in the brain.
distinguishing the speech stream (level 1 content); lexical access is based on:
categorical perception
ability to distinguish between sounds on a continuum based on voice onset time
Va vs Fa
certain sounds we make are voiced and some are unvoiced: when saying ‘va’ the voicing begins early, whereas for ‘fa’ it begins rather later
perceptual learning
adjust categorical perception based on sounds we hear
we pick up on the sounds that are important to us and adjust our categories accordingly
top-down processing
drawing on our knowledge of what should be said or heard, we use this to fill in the gaps, for example if someone forgets a word or coughs during a word
spreading activation
helps predict a sound that might be coming up via activation of items related to the acoustic input
lexical characteristics affect speed of lexical access
word length; longer words are slower to process
neighbourhood density; words with many neighbours are processed more slowly
frequency; the more frequently a word is accessed in the lexicon, the quicker it can be accessed
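As a purely illustrative sketch (invented weights, not a fitted model), the three factors above could be combined into a rough predictor of relative recognition difficulty: longer words and denser neighbourhoods add to the cost, while higher frequency reduces it.

```python
import math

# Illustrative only: combines the three lexical factors above using invented weights.
def relative_access_difficulty(length: int, neighbourhood_density: int,
                               frequency_per_million: float) -> float:
    return (
        0.5 * length                                 # longer words -> slower
        + 0.3 * neighbourhood_density                # more neighbours -> slower
        - 1.0 * math.log1p(frequency_per_million)    # more frequent -> faster
    )

# A short, frequent word with few neighbours comes out much easier (lower score)
# than a long, rare word with many neighbours.
print(relative_access_difficulty(length=3, neighbourhood_density=2,  frequency_per_million=500))
print(relative_access_difficulty(length=9, neighbourhood_density=20, frequency_per_million=1))
```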
models of speech comprehension
Marslen-Wilson; The cohort model:
predicts that we access words in the lexicon via activation of all words sharing the initial features; words that stop matching the features are gradually deactivated until one word is left that matches what the person said (using phonological knowledge)
Elman & McClelland; the TRACE model:
predicts that features activate phonemes that activate words with a gradual increase in activation of words that match all features so that the word with the most activation wins
theories of speech perception
the Cohort Model
lexical activation; all words that start with the same initial sound/phoneme as the input are activated in our brains
lexical activation of the ‘cohort’ that matches the input
as more of the sound/word is heard, we deactivate/deselect candidate words that no longer match the acoustic input
eventually a uniqueness point is reached, at which only one word remains consistent with the acoustic input
neighbourhood effects of this:
neighbours compete with each other for recognition
learning a novel neighbour such as ‘aprikol’, for example, slows down recognition of the word ‘apricot’
frequency effects
words with high frequency have high resting activation states - less additional activation is required to recognise them
evidence in favour of a ‘cohort’
Warren & Marslen-Wilson
participants are presented with fragments of words that gradually reveal the whole word and asked to guess what the word is after each presentation
gating paradigm: multiple different words matching the acoustic input heard so far are activated while the word is being said
this suggests that recognition of a word is a gradual process that starts from word onset and continues until the end of the word
candidate words that no longer fit the acoustic input are eliminated
the Cohort model depends on bottom-up processing
facilitatory signals are sent to words that match the speech input
inhibitory signals are sent to words that do not match the speech input
bottom-up processing has priority
a constraint of the cohort model
cohort model: 3 stages of word recognition
access- acoustic-phonetic information is mapped onto lexical items
selection- candidate words that mismatch the acoustic input are de-selected until one candidate word is chosen
integration- the word’s semantic and syntactic properties are integrated and checked against the sentence; integration is affected by sentence context; early versions of the model suggested context constrained the cohort
priming paradigm:
activating semantically related concepts so they become more active
for example, if you were presented with the word doctor (the prime), spreading activation allows ‘nurse’ (the target) to become active when ‘doctor’ is present
cross-modal priming; the same situation, but the prime word is auditory whereas the target word is visual
priming should still occur when only the first part of the prime word has been heard, for a related prime-target pair
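A minimal sketch of the spreading-activation idea behind priming (the associative network and the 0.5 spread factor are invented for illustration): activating ‘doctor’ passes some activation to its associates, so ‘nurse’ gets a head start over an unrelated word.

```python
# Toy spreading-activation sketch; the links and spread factor are illustrative assumptions.
ASSOCIATES = {
    "doctor": ["nurse", "hospital"],
    "bread":  ["butter"],
}

def prime(word: str, spread: float = 0.5) -> dict:
    activation = {word: 1.0}                       # the prime is fully activated
    for neighbour in ASSOCIATES.get(word, []):     # activation spreads to related items
        activation[neighbour] = spread
    return activation

# After hearing the prime 'doctor', the target 'nurse' is partially active, 'table' is not.
acts = prime("doctor")
print(acts.get("nurse", 0.0), acts.get("table", 0.0))   # 0.5 vs 0.0
```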
the impact of context
biasing: can we use context to get one word activated over another? No - both viable options are still activated and receive the same amount of activation; a difference emerges only once the whole word has been presented
items that match the acoustic input but do not match the sentence context are still activated; they are deactivated only once the word is selected
revised cohort model 1994
context influences the selection/integration of the word into the sentence- the word whose semantic activation fits the context of the sentence will be integrated into the sentence
the men had served for many years under their /cap/- the semantic representation of captain is a better fit to the sentence than the semantic representation of capital and helps to single out ‘captain’ as the appropriate word
TRACE model:
in TRACE words are recognised ‘incrementally’ by slowly ramping up the activation of the correct units at the phoneme and word levels
rather than activating a massive cohort and then reducing it, we gradually add words that match the sounds being heard
lexical competition via inhibition: candidate words inhibit one another, limiting how much activation competitors receive, and the one with the most activation is put forward
implemented computational model based on connectionist principles
processing units (nodes) correspond to mental representations of: features (voicing, manner of production), phonemes, and words
the TRACE model allows for more extensive bottom-up processing: each level is connected to the next via facilitatory connections, and activation spreads up from features to lexical items
connections between nodes within each level are inhibitory, while facilitatory connections between levels also travel down from the lexical level to the phoneme and feature levels
in top-down processing, activated words increase the activation of their phonemes and features; in bottom-up processing, features activate the relevant phonemes, activated phonemes inhibit their competitors and activate words, and activated words inhibit their competitors
radical activation model: ‘any consistency between input and representation may result in some degree of activation’
nodes influence each other according to their activation levels & strengths of connections
a pattern of activation (a ‘trace’) develops from the interplay of facilitation and inhibition
candidate words are activated based on the pattern of activation
bottom up and top down processes
bottom up- activation from feature to word level
top down-activation from word to feature level
evidence for the TRACE model: activation of words in the lexicon
allopenna et al used an eye-tracking study to demonstrate that words with overlapping phonology that do not share the same onset as the speech input (rhyme competitors) are activated during speech perception
visual world paradigm (allopenna et al): participants are presented with a grid containing images of a beaker, a beetle, a speaker and a carriage (pram), and are asked to ‘click on the beaker and place it under the triangle’. Their eye movements are monitored while they complete the task; if words related to ‘beaker’ are active in the lexicon, participants will look towards those items. Findings: participants looked at the beaker, the beetle and the speaker, but looked most at the beaker (which makes sense, as they have to move it). There was also more activation for beetle during the first 400 ms after the word was heard, and no activation for carriage; later in processing (between 400 and 600 ms after the word was heard) there was activation for speaker, showing the rhyming competitor had become active. This goes against the cohort model, providing evidence that words matching later parts of the input are still activated
are words activated based on shared word initial sounds?
the evidence from allopenna et al and others suggests that words overlapping with sounds in any part of the input word (e.g., rhymes) may become activated. The initial cohort of words activated in response to the speech stream is not limited to words with the same onset
facilitatory links between words and phonemes should result in more accurate detection of phonemes in words compared to non-words. Participants asked to detect a /t/ or /k/ in words and non-words should find it easier to identify the /t/ in heighten than in vinten, i.e., in real words rather than made-up words. Such findings demonstrate the effect of top-down processing
Research has also found evidence that questions the superiority of top-down effects: participants were able to accurately detect phonemes in non-words that were word-like, and participants failed to complete ambiguous phonemes with a phoneme that would create a word unless the stimuli were degraded
TRACE model vs Cohort Model
the TRACE model emphasises top-down processing, whereas the Cohort model minimises the impact of top-down processing
the Cohort model predicts that lexical access is biased towards activation of words with shared onsets, whereas the TRACE model accommodates the activation of rhyming competitors
the TRACE model does not provide an account of how context might affect speech perception
the evidence also suggests that there is a tendency to activate words that start with the same sounds