Common Examiner Scoring Errors on Academic Achievement Measures

Abstract

  • The study explores scoring errors across three well-known achievement tests:

    • Kaufman Test of Educational Achievement–Second Edition (KTEA-2)

    • Woodcock–Johnson Tests of Achievement–Third Edition (WJ-III)

    • Wechsler Individual Achievement Test–Third Edition (WIAT-III)

  • Sample size: 114 protocols evaluated.

  • Focus: Frequency and types of scoring errors made by novice examiners.

  • WIAT-III had the most scoring elements, making it the most vulnerable to errors.

  • More errors were found in composites requiring greater examiner inference and interpretation.

  • Findings discuss implications for assessment fidelity and training practices.

Keywords

  • Assessment fidelity

  • Level B assessment

  • Achievement tests

  • Scoring errors

Introduction

  • Assessment Fidelity: A critical element of effective academic intervention, crucial for response-to-intervention (RTI) approaches.

  • Previous research:

    • Limited focus on examiner errors in achievement assessments; most studies examined cognitive measures.

    • Highlighted the need for trained examiners and continual improvement in scoring practices for both novice and experienced examiners.

  • Identification of Specific Learning Disorders (SLDs) has shifted toward documenting specific academic skill deficits.

Types of Scoring Errors

  • Categories of Scoring Errors:

    • Administration errors

    • Scoring errors

    • Clerical errors

  • Error rates reported for cognitive assessments suggest a substantial likelihood that examinees' functioning will be misclassified on the basis of test results.

Sources of Errors

  • Errors occur due to:

    • Multitasking during test administration (recording responses, maintaining rapport, following standardized instructions)

    • Inaccurate recording techniques, miscalculation of scores, and subjective judgment errors in scoring.

  • Higher error frequency observed in subtests requiring greater inference, such as comprehension and vocabulary in cognitive measures.

Study Objectives

  • To provide preliminary data on scoring errors evident on three achievement measures (WIAT-III, KTEA-2, WJ-III).

  • To investigate differences in error frequency between measures.

  • To analyze error-proneness within specific subtests (e.g., written expression vs. mathematics).

Methodology

Participants

  • 114 certified teachers from a mid-sized Canadian university.

  • 87 female, 27 male participants enrolled in a Level B assessment course.

  • No prior experience in administering standardized tests before the study.

Procedure

  • Each participant administered the achievement measures to child or adolescent volunteers after receiving detailed feedback.

  • Checklists developed by researchers were utilized to monitor scoring accuracy, with significant detail on scoring components.

  • Final checklist variations:

    • WIAT-III: 151 possible examiner errors.

    • KTEA-2: 70 possible errors.

    • WJ-III: 59 possible errors.

  • Protocols were anonymized, and scoring was done by graduate students trained in assessment.

  • Inter-rater reliability was extremely high at .99.
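The checklist procedure above can be sketched in code. This is an illustrative reconstruction, not the authors' analysis: it assumes each checklist item is flagged as an error (1) or not (0) by two independent raters, and summarizes agreement as a Pearson correlation of per-protocol error totals, one common choice for count data of this kind. All names and data here are hypothetical.

```python
# Illustrative sketch (not the study's code): inter-rater reliability for
# checklist-based error scoring. Each rater flags every checklist item
# (1 = error, 0 = no error) for each protocol.

def error_totals(flags):
    """Sum flagged errors per protocol; `flags` is a list of 0/1 lists."""
    return [sum(protocol) for protocol in flags]

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical flags for three protocols scored by two raters:
rater1 = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0]]
rater2 = [[1, 0, 1, 0], [0, 0, 1, 0], [1, 1, 1, 0]]
r = pearson_r(error_totals(rater1), error_totals(rater2))
```

A correlation of item-level flags (rather than protocol totals) would be a stricter agreement index; the reported .99 is consistent with either approach but the source does not specify which statistic was used.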

Results

Overall Error Frequency

  • Mean total errors: 27.81 per protocol.

  • Error-free protocols: Only 2 out of 114.

Types and Frequencies of Errors

  • Four main error types identified:

    1. Incorrect start points

    2. Incorrect use of basal rule

    3. Incorrect adherence to discontinue rule

    4. Marking errors (e.g., incorrect scoring assignments)

  • Marking errors were found to be the most prevalent.

Comparisons Between Measures

  • Significant differences in error frequency were found across the three measures.

  • Proportions of errors reviewed between composites (math, reading, writing, oral language).

  • Statistical Results:

    • Significant interaction between composite and measure concerning scoring errors.

    • The writing composite showed the highest error rates, significantly more than the other areas across all measures.
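Because the three checklists contain very different numbers of possible errors (WIAT-III: 151, KTEA-2: 70, WJ-III: 59), raw error counts are not directly comparable across measures; comparisons of the kind above require proportions. A minimal sketch of that normalization, using hypothetical observed counts:

```python
# Illustrative sketch (assumed analysis, not the authors' code): convert raw
# error counts to proportions of each measure's possible errors so that
# measures with longer checklists are not penalized for having more items.

POSSIBLE_ERRORS = {"WIAT-III": 151, "KTEA-2": 70, "WJ-III": 59}

def error_proportion(measure, observed_errors):
    """Observed errors as a fraction of the measure's possible errors."""
    return observed_errors / POSSIBLE_ERRORS[measure]

# Hypothetical observed error counts for one protocol per measure:
for measure, errors in [("WIAT-III", 30), ("KTEA-2", 14), ("WJ-III", 12)]:
    print(f"{measure}: {error_proportion(measure, errors):.3f}")
```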

Within Measures: Subtest Analyses

KTEA-2

  • Significant differences found across reading and oral language subtests.

  • Errors notably higher in Written Expression than in Spelling.

WJ-III

  • Differences noted among reading subtests, especially Passage Comprehension.

WIAT-III

  • Scoring errors were high across the writing subtests, particularly Written Expression.

Discussion

  • The vital role of academic achievement testing in allocating special education services necessitates high scoring fidelity.

  • Importance of precision in scoring highlighted by links to SLD diagnostic criteria.

  • The study provides foundational insights into the frequency and types of scoring errors in achievement assessments.

Implications for Practice

  • Findings underscore the need for rigorous training and regular feedback cycles in achievement assessment practices to remedy common errors.

  • Scoring templates are suggested for math calculations to minimize raw score marking errors.

Conclusion

  • This study contributes to understanding how scoring errors among educators in training can affect educational assessments.

  • It highlights the pressing need for structured training interventions to improve scoring accuracy, particularly in complex measures like WIAT-III and KTEA-2.

Limitations

  • Study relied on specific achievement measures in use at the time, which may limit generalizability.

  • Sample consisted of nonclinical populations due to the training environment, potentially overlooking real-world special education contexts.

Acknowledgments

  • Acknowledgment of assistance in data collection and scoring.

References

  1. Alfonso, V. C., Johnson, A., Patinella, L., & Rader, D. E. (1998). Common WISC-III examiner errors. Psychology in the Schools.

  2. American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.).

  3. Belk, M. S., LoBello, S. G., Ray, G. E., & Zachar, P. (2002). WISC-III administration errors. Journal of Psychoeducational Assessment.

… (Additional references continue as per original material)