Exhaustive Study Notes on Assessing Pronunciation

Introduction to Pronunciation and the Salience of Accents

Accents represent one of the most perceptually salient components of spoken language.
Research indicates that linguistically untrained listeners can differentiate between native and non-native speakers even under non-optimal conditions, such as:
* Speech played backwards (Munro, Derwing, & Burgess, 2010).
* Speech in a language the listener does not understand (Major, 2007).
Historical Context: One of the earliest documented language tests is the biblical Shibboleth test (Book of Judges). Warring tribes determined identity based on the pronunciation of "shibboleth" ("sheave of wheat"). Members were evaluated on whether they used a /ʃ/ or /s/ sound at the syllable onset; a "wrong" pronunciation resulted in fatal consequences (Spolsky, 1995).
Modern Ethical Concerns: Speech analysis by "experts" is used to determine the legitimacy of asylum seekers based on perceived group identity (Fraser, 2009). Such tests are not foolproof and raise concerns regarding fairness, as it is often unclear whether the speech signal itself or listener linguistic stereotyping causes unfavorable responses (Kang & Rubin, 2009).

The Shift from Nativeness to the Intelligibility Principle

Foreign accents receive disproportionate attention due to their perceptual salience.
Traditionally, the native speaker was the "gold standard" of language knowledge (Levis, 2005). However, applied linguists now view eradicating foreign accents as an unsuitable goal for L2 instruction for several reasons:
* Absolute native-like attainment is unrealistic for most adult L2 learners.
* Accent and identity are deeply intertwined (Gatbonton & Trofimovich, 2008).
* L2 speakers do not need native accents for social, academic, or professional integration (Derwing & Munro, 2009).
* The global spread of English as a lingua franca makes native-speaker norms inappropriate in many EFL settings (Jenkins, 2002).
Consensus View: Oral communication should focus on being understandable (the message) rather than accent reduction. An L2 accent does not necessarily prevent speech from being perfectly understandable; intervention is only necessary when the accent explicitly impedes listener understanding (Derwing & Munro, 2009).

Historical Perspectives and the Problem of "Neglect"

J.R. Firth (1957) famously stated: "you shall know a word by the company it keeps."
In L2 research, the term "pronunciation" has frequently kept company with the term "neglect." This stems from communicative proponents devaluing pronunciation as extraneous to communicative competence (Celce-Murcia, Brinton, Goodwin, & Griner, 2010).
Counter-Arguments: Morley (1991) argued that intelligible pronunciation is essential to communicative competence and that ignoring it is an "abrogation of professional responsibility" because poor pronunciation is disadvantageous socially and professionally.
Fossilization: Adult L2 learners with "fossilized" pronunciation can still benefit from explicit instruction, which can be embedded in communicative activities (Trofimovich & Gatbonton, 2006).

Robert Lado and the Foundations of Pronunciation Assessment

Pronunciation assessment has been marginalized since Robert Lado’s seminal book, Language Testing (1961).
Lado’s Treatment: He provided separate chapters on testing perception and production of individual sounds, stress, and intonation.
Outdated Concepts: Lado viewed language as a "system of habits" and believed problems occurred only where differences existed between the L1 and the target language phonetic inventories.
Modern Nuance: Accurate perception and production of L2 segments (vowels/consonants) are mediated by the learner’s perception of how different a sound is from their existing L1 categories (Flege, Schirru, & MacKay, 2003). Factors like phonetic environment and lexical frequency also contribute to performance.

The Validity Crisis of Written Pronunciation Tests

Lado proposed objective paper-and-pencil tests (e.g., multiple-choice) to scale testing for large numbers of students.
The National Centre Test in Japan (standardized university admission test) still uses these items:
* Segmental items: Selecting a word with a different sound (e.g., "boot", "goose", "proof", "wool"; the sound /ʊ/ in "wool" differs from the /u/ in the other words).
* Word stress items: Selecting words with identical primary stress patterns (e.g., "fortunately" and "elevator" both have stress on the first syllable).
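Right/wrong items of this kind are usually checked for internal consistency with the Kuder-Richardson 20 coefficient: KR-20 = (k / (k - 1)) × (1 - Σ p_i q_i / σ²), where k is the number of items, p_i is the proportion of test takers answering item i correctly, q_i = 1 - p_i, and σ² is the variance of total scores. The following is a minimal sketch, not a reconstruction of any published analysis: the function name `kr20`, the toy response matrices, and the use of population variance are all assumptions made for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """Return KR-20 for a matrix of dichotomous (0/1) item scores.

    responses: list of rows, one row per test taker, one 0/1 score per item.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    n_takers = len(responses)

    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_takers
        sum_pq += p * (1 - p)

    # Assumption: population variance of total scores (some texts use n - 1).
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")

    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)


# Hypothetical data: items that rank test takers consistently.
consistent = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(consistent))  # 0.75

# Hypothetical data: items that mostly disagree with each other.
inconsistent = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 1]]
print(kr20(inconsistent) < 0)  # True
```

The second example shows that KR-20 can fall below zero when items pull in opposite directions, which is how a negative coefficient can arise for a set of written items.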
Empirical Refutation: Buck (1989) found that these tests do not work:
* Internal consistency coefficients (KR-20) were unacceptably low (-.89 to .54).
* Correlation between written items and oral production was low (.25 to .50).
* Correlation with extemporaneous speech was even lower (.18 to .43).
There is no empirical evidence supporting written tests as a valid measure of oral pronunciation.

Theoretical Frameworks and Definitions: Accentedness vs. Comprehensibility

Bachman’s (1990) Framework: Included "phonology/graphology" but paired them oddly (pronunciation vs. handwriting legibility). Bachman and Palmer (1982) actually omitted this variable, viewing it as a "channel" rather than a component because communication only breaks down below a critical level.
Levis’ (2005) Two Principles:
* Nativeness Principle: Focuses on reducing L1 traces. Aligns with "Accentedness" (perceived difference from native norms).
* Intelligibility Principle: Focuses on being understandable. Endorsed by most researchers.
Narrow Definitions (Derwing & Munro, 1997):
* Intelligibility: The actual amount of speech understood (measured by orthographic transcription accuracy).
* Comprehensibility: The perceived ease of understanding (measured on a rating scale).
* Distinction: The difference between them is operational (instrumentation) rather than purely theoretical.
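The operational difference between the two narrow definitions can be made concrete with a small sketch. It assumes one simple scoring rule (the share of target words that also appear in the listener's transcription, ignoring word order) and a hypothetical numeric rating scale; studies differ in how they score transcriptions and anchor scales, so the function names, the rule, and the example data are illustrative only.

```python
import re
from collections import Counter
from statistics import mean


def _words(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def intelligibility(target, transcription):
    """Proportion of target words recovered in a listener's transcription.

    Assumption: order-insensitive exact word matching; real studies may
    score by position or tolerate spelling variants.
    """
    target_counts = Counter(_words(target))
    heard_counts = Counter(_words(transcription))
    if not target_counts:
        raise ValueError("target utterance contains no words")
    matched = sum((target_counts & heard_counts).values())
    return matched / sum(target_counts.values())


def comprehensibility(ratings):
    """Mean of listeners' ease-of-understanding ratings on a numeric scale."""
    return mean(ratings)


# Hypothetical utterance and listener transcription.
score = intelligibility("The boy ran to the store", "the boy went to the store")
print(round(score, 2))  # 0.83

# Hypothetical ratings from three listeners on the same scale.
print(comprehensibility([2, 3, 4]))  # 3
```

The first measure is derived from what listeners wrote down, while the second is derived from what they reported feeling, which is why the two can diverge for the same speaker.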

Shortcomings of Current Proficiency Scales (IELTS, TOEFL, ACTFL, CEFR)

Omission: CEFR excluded pronunciation descriptors due to high misfit values (North, 2000).
Inconsistency: ACTFL Oral Proficiency Guidelines mention pronunciation at levels 1, 3, 4, and 5, but skip it at level 2 (novice mid).
Vagueness:
* IELTS (Band 4): Mentions "limited range" and "frequent lapses" leading to "some difficulty" without specifying error types.
* TOEFL iBT (Level 2): Mentions "problems with pronunciation, intonation, or pacing" requiring "significant listener effort."
* Terminology: In IELTS, "pronunciation" is used holistically; in TOEFL iBT, its juxtaposition with "intonation" implies it refers only to segmentals.
Conflation: Cambridge ESOL Common Scale for Speaking and Morley’s Speech Intelligibility Index equate "easily understood" with "native-like" or "nonexistent accent," ignoring evidence that accented speech can be perfectly comprehensible.

Linguistic Factors Influencing Understanding

International Teaching Assistants (ITAs): Pronunciation is often a scapegoat for broader communication barriers like acculturation issues or listener discrimination (Kang & Rubin, 2009).
Lingua Franca Core (Jenkins, 2002): Proposed a core set of features for global English, though critics note the database is small and inclusion criteria are unclear.
Suprasegmentals vs. Segmentals:
* Prosodic features (stress and timing) have a direct effect on intelligibility (Hahn, 2004).
* Segmental errors are less detrimental unless they have a high "functional load" (the frequency of the contrast in distinguishing words) (Munro & Derwing, 2006).
* Accuracy in perception, production, and orthographic influence (sound-symbol correspondence) are all critical diagnostic areas.
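One rough way to picture functional load is to count how many word pairs in a lexicon are kept apart only by a given sound contrast. The sketch below does this for a tiny hypothetical lexicon of phoneme tuples; the function name `minimal_pair_count` and the word list are assumptions, and published functional load measures are more elaborate (they can weight by word frequency and position, for example).

```python
from itertools import combinations


def minimal_pair_count(lexicon, sound_a, sound_b):
    """Count word pairs that differ only by the sound_a/sound_b contrast.

    lexicon: iterable of words, each a tuple of phoneme symbols.
    """
    count = 0
    for word1, word2 in combinations(lexicon, 2):
        if len(word1) != len(word2):
            continue
        # Collect the positions where the two words differ.
        diffs = [(x, y) for x, y in zip(word1, word2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {sound_a, sound_b}:
            count += 1
    return count


# Hypothetical toy lexicon: light, right, lock, rock, thin, fin, tin.
toy_lexicon = [
    ("l", "aɪ", "t"), ("r", "aɪ", "t"),
    ("l", "ɒ", "k"), ("r", "ɒ", "k"),
    ("θ", "ɪ", "n"), ("f", "ɪ", "n"), ("t", "ɪ", "n"),
]

print(minimal_pair_count(toy_lexicon, "l", "r"))  # 2
print(minimal_pair_count(toy_lexicon, "θ", "f"))  # 1
```

In this toy lexicon the /l/-/r/ contrast separates more word pairs than /θ/-/f/, so on this crude count a substitution involving /l/ and /r/ would be expected to cause more confusion. The numbers describe the toy data only, not English as a whole.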

The Isaacs and Trofimovich (2012) "Deconstruction" Study

Participants: 40 Francophone learners of English.
Task: Picture narrative task.
Measures: 19 speech measures (segmental, suprasegmental, temporal, lexicogrammatical, and discourse-level).
Techniques: Auditory (listener perception of pitch patterns) and Instrumental (Praat software pitch tracking).
Findings on Comprehensibility Levels:
* Low-level learners: Differentiated by lexical richness and fluency.
* High-level learners: Differentiated by grammatical and discourse measures.
* All levels: Differentiated by word stress.

Rater Characteristics and Bias

Cognitive Variables: Isaacs and Trofimovich (2010, 2011) tested phonological memory, attention control, and musical aptitude.
* Phonological memory and attention control showed no significant effect (reassuring for validity).
* Musical aptitude: Musical raters were more severe in judgments and more sensitive to pitch (melodic) phenomena.
Sociolinguistic Variables: Native speakers' perceptions are mediated by attitudes toward the partner's L1 (Lindemann, 2002).
Familiarity: Rater familiarity with an accent sometimes leads to higher scores, though research findings are inconsistent due to varying definitions of "familiarity" (Carey, Mannell, & Dunn, 2011).

Automated Scoring and Technological Innovations

Mechanism: Speech recognition algorithms are trained on pooled human ratings so that individual rater idiosyncrasies average out (as in Pearson’s Versant English Test/PhonePass).
Validity vs. Reliability: Automated scores correlate highly with human ratings (Bernstein, Van Moere, & Cheng, 2010) and with scores on tests like TOEFL and IELTS.
Limitations:
* Machines do not attend to the same properties as humans; humans often perceive stressed syllables as higher in pitch than spectral analysis reveals (Crystal, 2008).
* Automated systems struggle with spontaneous, communicative speech and work best with constrained tasks (e.g., sentence unscrambling).
* Machines focus on pronunciation accuracy (vowels/consonants) rather than the communicative impact of errors.
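The idea of pooling human ratings so that rater idiosyncrasies average out can be illustrated with a toy calculation. This is only a sketch of the averaging step, with hypothetical rater names, speakers, and scores; it does not describe how any commercial scoring system is actually trained.

```python
from statistics import mean


def pool_ratings(ratings):
    """Average each speaker's scores across all raters who scored them.

    ratings: dict mapping rater -> dict mapping speaker -> score.
    """
    speakers = sorted({s for scores in ratings.values() for s in scores})
    return {
        s: mean(scores[s] for scores in ratings.values() if s in scores)
        for s in speakers
    }


def rater_offsets(ratings):
    """Mean difference between each rater's scores and the pooled scores.

    Negative values mean the rater scored lower (more severely) than the pool.
    """
    pooled = pool_ratings(ratings)
    return {
        rater: mean(score - pooled[s] for s, score in scores.items())
        for rater, scores in ratings.items()
    }


# Hypothetical ratings: rater_a is consistently harsher, rater_b more lenient.
ratings = {
    "rater_a": {"speaker_1": 6, "speaker_2": 4},
    "rater_b": {"speaker_1": 8, "speaker_2": 6},
    "rater_c": {"speaker_1": 7, "speaker_2": 5},
}

print(pool_ratings(ratings))   # {'speaker_1': 7, 'speaker_2': 5}
print(rater_offsets(ratings))  # {'rater_a': -1, 'rater_b': 1, 'rater_c': 0}
```

The pooled score for each speaker sits between the harsh and lenient raters, so a model fitted to pooled scores is less influenced by any single rater's severity.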

Summary of Essential Challenges and Future Directions

The Role of Pronunciation: Its place in theoretical models of communicative competence must be made explicit.
Task Design: Research should move from monologic tasks (speaking into a microphone) to collaborative/paired interactional tasks involving both native and non-native interlocutors.
Construct Definition: Intelligibility must be disentangled from accentedness in proficiency scales.
Practitioner Support: Teachers need precise information on error types that cause communication breakdown.
Reinvigoration: The field needs to reject the view that pronunciation is incidental and move past mechanical drills into communicatively oriented assessment.