Common Examiner Scoring Errors on Academic Achievement Measures
Abstract
The study explores scoring errors across three well-known achievement tests:
Kaufman Test of Educational Achievement–Second Edition (KTEA-2)
Woodcock–Johnson Tests of Achievement–Third Edition (WJ-III)
Wechsler Individual Achievement Test–Third Edition (WIAT-III)
Sample size: 114 protocols evaluated.
Focus: Frequency and types of scoring errors made by novice examiners.
WIAT-III had the most scoring elements, making it the most vulnerable to errors.
More errors were found in composites requiring greater examiner inference and interpretation.
Findings discuss implications for assessment fidelity and training practices.
Keywords
Assessment fidelity
Level B assessment
Achievement tests
Scoring errors
Introduction
Assessment Fidelity: A critical element of effective academic intervention, crucial for response-to-intervention (RTI) approaches.
Previous research:
Limited focus on examiner errors in achievement assessments; most studies examined cognitive measures.
Highlighted the need for trained examiners and continual improvement in scoring practices for both novice and experienced examiners.
Shift in identification of Specific Learning Disorders (SLDs) towards specific academic skill deficits.
Types of Scoring Errors
Categories of Scoring Errors:
Administration errors
Scoring errors
Clerical errors
Error rates reported for cognitive assessments suggest a substantial risk of misclassifying examinees' functioning on the basis of test results.
Sources of Errors
Errors occur due to:
Multitasking during examination (recording responses, maintaining rapport, following instructions)
Inaccurate recording techniques, miscalculation of scores, and subjective judgment errors in scoring.
Higher error frequency observed in subtests requiring greater inference, such as comprehension and vocabulary in cognitive measures.
Study Objectives
To provide preliminary data on scoring errors evident on three achievement measures (WIAT-III, KTEA-2, WJ-III).
To investigate differences in error frequency between measures.
To analyze error-proneness within specific subtests (e.g., written expression vs. mathematics).
Methodology
Participants
114 certified teachers from a mid-sized Canadian university.
87 female, 27 male participants enrolled in a Level B assessment course.
No prior experience in administering standardized tests before the study.
Procedure
Each participant administered the achievement measures to child or adolescent volunteers after receiving detailed feedback.
Checklists developed by the researchers were used to monitor scoring accuracy, with each scoring component itemized in detail.
Final checklist variations:
WIAT-III: 151 possible examiner errors.
KTEA-2: 70 possible errors.
WJ-III: 59 possible errors.
Protocols were anonymized, and scoring was done by graduate students trained in assessment.
Inter-rater reliability was very high (.99).
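The summary does not state which reliability statistic produced the .99 figure; as an illustration only, the sketch below computes simple percent agreement between two raters who independently code the same protocol against a binary error checklist. The checklist values are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch of inter-rater agreement on a binary error checklist
# (1 = error flagged, 0 = no error). All values below are hypothetical.

def percent_agreement(rater_a, rater_b):
    """Proportion of checklist items on which two raters agree."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Raters must code the same checklist items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

rater_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
rater_b = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
print(round(percent_agreement(rater_a, rater_b), 2))  # 0.9
```

With longer checklists (e.g., the WIAT-III's 151 items), agreement near .99 corresponds to disagreement on only one or two items per protocol.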
Results
Overall Error Frequency
Mean total errors: 27.81 per protocol.
Error-free protocols: Only 2 out of 114.
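The two headline figures above (mean errors per protocol and number of error-free protocols) are straightforward summaries of per-protocol error counts. The sketch below shows how such counts reduce to those statistics; the counts used are made-up placeholders, not the study's raw data.

```python
# Illustrative sketch: reducing per-protocol error counts to the summary
# statistics reported in the Results. The counts are hypothetical.

def summarize_errors(error_counts):
    """Return (mean errors per protocol, number of error-free protocols)."""
    mean_errors = sum(error_counts) / len(error_counts)
    error_free = sum(1 for n in error_counts if n == 0)
    return mean_errors, error_free

counts = [31, 0, 24, 28, 35, 0, 22, 40]  # hypothetical per-protocol totals
mean_errors, error_free = summarize_errors(counts)
print(f"mean={mean_errors:.2f}, error-free={error_free}/{len(counts)}")
```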
Types and Frequencies of Errors
Four main error types identified:
Incorrect start points
Incorrect use of basal rule
Incorrect adherence to discontinue rule
Marking errors (e.g., incorrect scoring assignments)
Marking errors were found to be the most prevalent.
Comparisons Between Measures
Significant differences in error frequency were found across measures.
Proportions of errors were compared across composites (math, reading, writing, oral language).
Statistical Results:
Significant interaction between composite and measure concerning scoring errors.
Errors were highest in the writing composite, significantly exceeding the other areas across all measures.
Within Measures: Subtests Analysis
KTEA-2
Significant differences across reading subtests and oral language.
Errors notably high in Written Expression compared to Spelling.
WJ-III
Differences noted in reading subtests, especially in Passage Comprehension.
WIAT-III
Scoring errors were high across various writing subtests, particularly in Written Expression.
Discussion
The vital role of academic achievement testing in allocating special education services necessitates high scoring fidelity.
Importance of precision in scoring highlighted by links to SLD diagnostic criteria.
The study provides foundational insights into the frequency and types of scoring errors in achievement assessments.
Implications for Practice
Findings underscore the need for rigorous training and regular feedback cycles in achievement assessment practices to remedy common errors.
Scoring templates are suggested for math calculations to minimize raw score marking errors.
Conclusion
This study contributes to understanding how scoring errors among educators in training can affect educational assessments.
It highlights the pressing need for structured training interventions to improve scoring accuracy, particularly in complex measures like WIAT-III and KTEA-2.
Limitations
Study relied on specific achievement measures in use at the time, which may limit generalizability.
Sample consisted of nonclinical populations due to the training environment, potentially overlooking real-world special education contexts.
Acknowledgments
Acknowledgment of assistance in data collection and scoring.