Common Examiner Scoring Errors on Academic Achievement Measures
Abstract
The study explores scoring errors across three well-known achievement tests:
Kaufman Test of Educational Achievement–Second Edition (KTEA-2)
Woodcock–Johnson Tests of Achievement–Third Edition (WJ-III)
Wechsler Individual Achievement Test–Third Edition (WIAT-III)
Sample size: 114 protocols evaluated.
Focus: Frequency and types of scoring errors made by novice examiners.
WIAT-III had the most scoring elements, making it the most vulnerable to errors.
More errors were found in composites requiring greater examiner inference and interpretation.
Findings discuss implications for assessment fidelity and training practices.
Keywords
Assessment fidelity
Level B assessment
Achievement tests
Scoring errors
Introduction
Assessment Fidelity: A critical element of effective academic intervention, crucial for response-to-intervention (RTI) approaches.
Previous research:
Limited focus on examiner errors in achievement assessments; most studies examined cognitive measures.
Highlighted the need for trained examiners and continual improvement in scoring practices for both novice and experienced examiners.
Shift in identification of Specific Learning Disorders (SLDs) towards specific academic skill deficits.
Types of Scoring Errors
Categories of Scoring Errors:
Administration errors
Scoring errors
Clerical errors
Error rates reported for cognitive assessments suggest a substantial risk of misclassifying examinees' functioning on the basis of test results.
Sources of Errors
Errors occur due to:
Multitasking during examination (recording responses, maintaining rapport, following instructions)
Inaccurate recording techniques, miscalculation of scores, and subjective judgment errors in scoring.
Higher error frequency observed in subtests requiring greater inference, such as comprehension and vocabulary in cognitive measures.
Study Objectives
To provide preliminary data on scoring errors evident on three achievement measures (WIAT-III, KTEA-2, WJ-III).
To investigate differences in error frequency between measures.
To analyze error-proneness within specific subtests (e.g., written expression vs. mathematics).
Methodology
Participants
114 certified teachers from a mid-sized Canadian university.
87 female, 27 male participants enrolled in a Level B assessment course.
No prior experience in administering standardized tests before the study.
Procedure
Each participant administered the achievement measures to child or adolescent volunteers after receiving detailed feedback.
Checklists developed by the researchers were used to monitor scoring accuracy, with each scoring component itemized in detail.
Final checklist variations:
WIAT-III: 151 possible examiner errors.
KTEA-2: 70 possible errors.
WJ-III: 59 possible errors.
Protocols were anonymized, and scoring was done by graduate students trained in assessment.
Inter-rater reliability was very high (.99).
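The summary does not state which reliability statistic produced the .99 figure; as an illustration only, the sketch below computes simple percent agreement between two raters who independently code the same protocol against a binary error checklist. The checklist values are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch of inter-rater agreement on a binary error checklist
# (1 = error flagged, 0 = no error). All values below are hypothetical.

def percent_agreement(rater_a, rater_b):
    """Proportion of checklist items on which two raters agree."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Raters must code the same checklist items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

rater_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
rater_b = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
print(round(percent_agreement(rater_a, rater_b), 2))  # 0.9
```

With longer checklists (e.g., the WIAT-III's 151 items), agreement near .99 corresponds to disagreement on only one or two items per protocol.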
Results
Overall Error Frequency
Mean total errors: 27.81 per protocol.
Error-free protocols: Only 2 out of 114.
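The two headline figures above (mean errors per protocol and number of error-free protocols) are straightforward summaries of per-protocol error counts. The sketch below shows how such counts reduce to those statistics; the counts used are made-up placeholders, not the study's raw data.

```python
# Illustrative sketch: reducing per-protocol error counts to the summary
# statistics reported in the Results. The counts are hypothetical.

def summarize_errors(error_counts):
    """Return (mean errors per protocol, number of error-free protocols)."""
    mean_errors = sum(error_counts) / len(error_counts)
    error_free = sum(1 for n in error_counts if n == 0)
    return mean_errors, error_free

counts = [31, 0, 24, 28, 35, 0, 22, 40]  # hypothetical per-protocol totals
mean_errors, error_free = summarize_errors(counts)
print(f"mean={mean_errors:.2f}, error-free={error_free}/{len(counts)}")
```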
Types and Frequencies of Errors
Four main error types identified:
Incorrect start points
Incorrect use of basal rule
Incorrect adherence to discontinue rule
Marking errors (e.g., incorrect scoring assignments)
Marking errors were found to be the most prevalent.
Comparisons Between Measures
Significant differences in error frequency were found across measures.
Proportions of errors were compared across composites (math, reading, writing, oral language).
Statistical Results:
Significant interaction between composite and measure concerning scoring errors.
Errors were highest in the writing composite, significantly exceeding the other areas across all measures.
Within Measures: Subtests Analysis
KTEA-2
Significant differences across reading subtests and oral language.
Errors notably high in Written Expression compared to Spelling.
WJ-III
Differences noted in reading subtests, especially in Passage Comprehension.
WIAT-III
Scoring errors were high across various writing subtests, particularly in Written Expression.
Discussion
The vital role of academic achievement testing in allocating special education services necessitates high scoring fidelity.
Importance of precision in scoring highlighted by links to SLD diagnostic criteria.
The study provides foundational insights into the frequency and types of scoring errors in achievement assessments.
Implications for Practice
Findings underscore the need for rigorous training and regular feedback cycles in achievement assessment practices to remedy common errors.
Scoring templates are suggested for math calculations to minimize raw score marking errors.
Conclusion
This study contributes to understanding how scoring errors among educators in training can affect educational assessments.
It highlights the pressing need for structured training interventions to improve scoring accuracy, particularly in complex measures like WIAT-III and KTEA-2.
Limitations
Study relied on specific achievement measures in use at the time, which may limit generalizability.
Sample consisted of nonclinical populations due to the training environment, potentially overlooking real-world special education contexts.
Acknowledgments
Acknowledgment of assistance in data collection and scoring.