Chapter 10 – Assessment for Education: Comprehensive Notes

The Role of Testing & Assessment in Education

• Educators use tests to answer: How well have students learned? Can they apply knowledge? Are they ready for the next level? What impedes learning? Are teachers effective?
• Multiple layers of testing: teacher-made, psychologist-recommended, district, state, federally mandated.
• Historical growth: No Child Left Behind (NCLB) ➜ Every Student Succeeds Act (ESSA)
• Mandated standards, annual assessments, accountability for all sub-groups (low-income, disabilities, race/ethnicity).
• “Adequate Yearly Progress” (AYP) – states decide penalties/interventions.
• Psychological & educational perspectives: screening, diagnosis, comparison, formative & summative assessment.
• Formative = real-time feedback; few critics.
• Summative = end-point evaluations; controversy when single high-stakes test used.
• Balance between accountability vs. innovation—excessive punishment can stifle teaching.

Case For & Against Standardized Testing

• Critics’ view:
• NCLB/ESSA forced “teaching to the test,” narrow skills, hollow learning.
• Resources diverted from instruction; no test perfect for everyone.
• Supporters’ view:
• Screening: early identification of at-risk students, potential prevention.
• Diagnostic data show weaknesses & strengths—helps allocate resources & challenge advanced students.
• Comparative data across classes/districts enables research on effective pedagogy.
• Philosophical tension likely to persist; policymakers must adjust & experiment.

Common Core State Standards (CCSS)

• Created to unify K-12 proficiency definitions across states; initial focus: English & Math; ~50 million students affected.
• Package = grade-level standards + computer-assisted tests + record-keeping for continuous improvement.
• Federal incentives: $\approx\$400$ million grants to two consortia: PARCC & Smarter Balanced.
• Requirements for new tests: measure growth above/below grade level, fine-grain data, evolve with tech & labor-market needs.
• Controversies:
• Misleading label “standards” vs. comprehensive program controlling curriculum & testing.
• Uniform vision of “college & career readiness” enforced by high-stakes tests; little room for teacher creativity or student exploration.
• Rapid, sight-unseen state adoption driven by >\$4 billion federal grants; raises constitutional/state-control questions.
• Origins murky—Bill & Melinda Gates Foundation $\approx\$2.5$ billion support ➜ debate: philanthropy vs. investment (costly hardware/software updates).
• Questions about authors’ credentials; two validation-panel experts (Milgram & Stotsky) refused endorsement.
• Pedagogical disputes (e.g., cold-reading without context).
• Age-inappropriate standards & test items (e.g., 1st-grade subtraction as unknown-addend; 4th-grade reading passage on marital infidelity).
• Privacy concerns over extensive longitudinal data.

Response to Intervention (RtI) & Multi-Tiered Systems of Support (MTSS)

• Historical backdrop: 1977 discrepancy definition of Specific Learning Disability (SLD) = large gap between IQ & achievement. Problems: “wait-to-fail,” poor predictor of remediation success.
• IDEA 2004 allows process “based on child’s response to scientific, research-based intervention.”
• RtI = data-driven, multilevel prevention framework:
• Tier 1: high-quality classroom instruction for all.
• Tier 2: small-group targeted interventions.
• Tier 3: individualized, intensive interventions.
• Continuous cycle: teach ➜ assess ➜ intervene ➜ reassess.
• MTSS extends RtI beyond academics to behavioral, social-emotional supports.
• Implementation challenges left to states/districts: criteria for tier movement, test selection, roles of teachers vs. specialists, problem-solving vs. standard-protocol models, hybrid approaches.
• Legislation forbids single-measure decisions; promotes integrative assessment (multiple tools & professionals).
• Ethical implication: avoid misidentifying cultural/economic disadvantage as disability; RtI “evens the playing field.”
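
The teach, assess, intervene, reassess cycle above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the cut-offs, the 0-100 score scale, and the use of a simple average across measures are all assumptions, since the notes state that criteria for tier movement are left to states and districts. The only rule taken from the source is that a decision may not rest on a single measure.

```python
from statistics import mean

# Hypothetical cut-offs on a 0-100 scale; real criteria for tier
# movement are set by each state or district.
TIER2_CUTOFF = 25   # below this: small-group targeted intervention
TIER3_CUTOFF = 10   # below this: individualized, intensive intervention


def assign_tier(scores):
    """Return the RtI tier (1, 2, or 3) suggested by several measures.

    `scores` holds results from different tools; a single score is
    rejected because decisions may not rest on one measure.
    """
    if len(scores) < 2:
        raise ValueError("at least two measures are required")
    average = mean(scores)  # assumed way of combining measures
    if average < TIER3_CUTOFF:
        return 3
    if average < TIER2_CUTOFF:
        return 2
    return 1


def reassess(history):
    """Report the suggested tier after each successive assessment round."""
    return [assign_tier(round_scores) for round_scores in history]
```

For example, `reassess([[22, 18], [30, 35]])` returns `[2, 1]`: the student starts in Tier 2 and returns to Tier 1 once the averaged scores rise above the assumed cut-off.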

Dynamic Assessment

• Test-intervention-retest model; rooted in Budoff, Feuerstein’s LPAD, Vygotsky’s “zone of proximal development” (distance between current performance & potential with guidance).
• Assessor actively teaches/prompts during testing—contrast with traditional neutrality.
• Compatible with RtI; offers actionable data for stakeholders.
• Validity varies with specific procedures; diversity of approaches complicates evidence base.
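
One simple way to summarize a test-intervention-retest sequence is a gain score. The sketch below is illustrative only: the `relative_gain` index is a hypothetical helper, not a standardized score from any of the approaches named above, and the notes themselves stress that procedures differ widely.

```python
def gain_score(pretest, posttest):
    """Raw improvement from pretest to retest after the teaching phase."""
    return posttest - pretest


def relative_gain(pretest, posttest, max_score):
    """Share of the available room for improvement that was gained.

    Hypothetical index for illustration, not a standardized score.
    """
    room = max_score - pretest
    if room <= 0:
        return 0.0  # already at ceiling; no room to improve
    return (posttest - pretest) / room
```

A student who moves from 12 to 18 on a 20-point task has a raw gain of 6 and a relative gain of 0.75, meaning three quarters of the possible improvement was realized with guidance.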

Achievement Tests

• Measure learned knowledge after defined instruction. Range: pop quizzes to statewide exams.
• Uses: monitor progress, placement, diagnostics, accountability.
• Categories:
• General Achievement Batteries (e.g., WIAT-III, STEP, SRA CAT).
• May include locator/routing tests; span K–12; norm- and criterion-referenced analyses.
• Must ensure current content, minimized bias, documented reliability/validity.
• Subject-Specific Tests: teacher-made or standardized (e.g., elementary reading, Cooperative Achievement Test, advanced placement, CLEP).
• Item types:
• Fact-based (rote): e.g., the “correlation $.7^2 = .49$” example.
• Conceptual (application): CLEP candidate scenario.
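
The norm-referenced and criterion-referenced analyses mentioned for achievement batteries answer different questions about the same raw score. The sketch below assumes a normally distributed norm group and a hypothetical cut score; actual batteries publish their own norm tables and mastery criteria.

```python
from statistics import NormalDist


def norm_referenced(raw, norm_mean, norm_sd):
    """Percentile rank of a raw score relative to a normal norm group."""
    return 100 * NormalDist(mu=norm_mean, sigma=norm_sd).cdf(raw)


def criterion_referenced(raw, cut_score):
    """True if the score meets a fixed standard, regardless of peers."""
    return raw >= cut_score
```

With an assumed norm mean of 100 and standard deviation of 15, a score of 115 falls at roughly the 84th percentile, while `criterion_referenced(115, 120)` is `False` because the fixed standard was not met.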

Aptitude & Readiness Tests

• Focus on informal learning and potential; used to predict future performance (prognostic).
• Labels vary by level: preschool/elementary = “readiness”; higher = “aptitude.”
• Sample items: non-verbal analogies (o:O :: x:?); context determines whether same item is aptitude vs. achievement.
• Purposes: preschool readiness, grade promotion, college/grad school success, professional licensure.

Preschool Assessments

• Legislation: PL 94-142, PL 99-457, PL 105-17—mandated early identification & services.
• Tools:
• Medical: Apgar rating $0\text{–}10$ (Activity, Pulse, Grimace, Appearance, Respiration).
• Checklists/Rating scales (Conners, BASC-3).
• Developmental & intelligence tests (e.g., WPPSI-IV, SB-5) – predictive correlations with later IQ are modest ($r \approx .3$–$.5$) and higher at the score extremes.
• Other measures: temperament scales, language tests, family environment inventories, specialized tools (Child Sexual Behavior Inventory).
• Testing principles: colorful materials, dual-easel, sample items, brief sessions $\le$ 1 hour; assessors must monitor fatigue/motivation.
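
The Apgar rating listed under medical tools is a simple sum: each of the five signs is rated 0, 1, or 2, giving a total from 0 to 10. A minimal sketch follows; the dictionary keys are chosen here for illustration, and the clinical criteria for assigning each rating are outside the scope of these notes.

```python
APGAR_SIGNS = ("activity", "pulse", "grimace", "appearance", "respiration")


def apgar_total(ratings):
    """Sum the five Apgar signs, each rated 0, 1, or 2, into a 0-10 total."""
    missing = set(APGAR_SIGNS) - set(ratings)
    if missing:
        raise ValueError(f"missing signs: {sorted(missing)}")
    for sign in APGAR_SIGNS:
        if ratings[sign] not in (0, 1, 2):
            raise ValueError(f"{sign} must be 0, 1, or 2")
    return sum(ratings[sign] for sign in APGAR_SIGNS)
```

For instance, ratings of 2, 2, 1, 1, and 2 across the five signs give a total of 8.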

Elementary Readiness

• Metropolitan Readiness Tests (MRT-6): Levels I & II, oral administration, $\sim$ 90 min; subtests: Auditory Memory, Rhyming, Letter Recognition, etc.
• Normed on $\approx30,000$ children; good reliability/validity; practice tests provided.

Secondary & College-Entry

• SAT (Reading, Writing, Math) + Subject Tests; continual revisions; debates on bias and extraneous influences on scores (e.g., race, daylight-saving time changes).
• ACT: curriculum-based; similar predictive validity; may tap creativity.
• Importance varies by institution; serves gatekeeping + diversity goals; balanced with GPA, essays, interviews.

Graduate & Professional

• GRE (Verbal, Quantitative, Analytical Writing) + Subject Tests; meta-analysis shows validity for GPA, faculty ratings; limited for creativity/grants.
• Prep steps: visit ETS, review intro texts, use commercial guides, target gaps.
• MAT: 100 analogies; cost-effective predictor. Example: Pavlov:Classical :: Skinner:Operant.
• Professional exams: MCAT, LSAT, DAT, PCAT, OAT, etc.—reflect evolving definitions of aptitude and societal needs.

Diagnostic Tests

• Used when poor performance observed; pinpoint specific deficits; may contain simpler items than broad achievement tests.
• Reading: Woodcock Reading Mastery Tests-III (WRMT-III) – subtests Letter ID, Word Attack, Phonological Awareness, etc.
• Math: KeyMath-3 Diagnostic System (individually administered) & GMADE (group).

Psychoeducational Test Batteries

• Combine cognitive & achievement measures; enable normative comparisons + skills diagnosis for intervention.
• Basic/Applied/Fluency skills across reading/writing/math (Table of skill types).
• Kaufman Assessment Battery for Children–II Normative Update (KABC-II NU):
• Roots in Luria’s neuropsychology; measures Planning, Attention, Simultaneous, Successive (PASS) + CHC broad abilities.
• Co-normed with KTEA-3; flexible CHC vs. Luria interpretive models; reduced group differences by minimizing verbal load.
• Reviewer concerns: dual-model clarity, subtest specificity; overall sound psychometrics validated in later studies.
• Woodcock-Johnson IV (WJ IV): three batteries (Cognitive, Achievement, Oral Language), ages 2–90+; CHC-based composites (Gf, Gc, GIA).
• Practical application: school psychologists use CHC profiles to link deficits (e.g., low auditory processing) to interventions.

Other Assessment Techniques

• Performance Tasks & Assessment: real-world work samples graded by domain experts (e.g., architecture blueprint).
• Portfolio Assessment: student-selected work illustrating learning (e.g., algebra mileage receipts); benefits = engagement; drawbacks = scoring reliability, large-class impracticality.
• Authentic Assessment: meaningful tasks demonstrating transfer to real life; e.g., chefs filleting fish, psychopathology diagnosis from videos.
• Risks: prior experience confounds, extraneous skills overlap.
• Peer Appraisal Methods:
• “Guess Who?” technique—classmates identify peers fitting descriptors.
• Nominating technique—select peers for activities; yields sociogram maps of group dynamics.
• Instruments for Study Habits, Interests, Attitudes:
• Study Habits Checklist (note taking, time use).
• SSHA – Delay Avoidance, Work Methods, etc.; yields study skills & attitude scores.
• Interest inventories (What I Like to Do) guide instruction.
• Attitude surveys (Quality of School Life Scales) monitor engagement & retention.
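
The nominating technique described under peer appraisal yields data that can be tallied before a sociogram is drawn. The sketch below is one possible tabulation, using made-up student names; it counts how often each student is chosen and finds reciprocal choices, which are the raw material for a sociogram.

```python
from collections import Counter


def tally_nominations(nominations):
    """Count how often each student is chosen by classmates.

    `nominations` maps each nominator to the list of peers they picked.
    """
    counts = Counter()
    for chooser, chosen in nominations.items():
        for peer in chosen:
            if peer != chooser:  # ignore self-nominations
                counts[peer] += 1
    return counts


def mutual_choices(nominations):
    """Return pairs of students who nominated each other."""
    pairs = set()
    for chooser, chosen in nominations.items():
        for peer in chosen:
            if peer != chooser and chooser in nominations.get(peer, []):
                pairs.add(tuple(sorted((chooser, peer))))
    return pairs


# Hypothetical class data for illustration only.
example = {"Ana": ["Ben", "Cy"], "Ben": ["Ana"], "Cy": ["Ben"]}
```

Here `tally_nominations(example)` gives Ben two nominations and Ana and Cy one each, and `mutual_choices(example)` returns the single reciprocal pair `("Ana", "Ben")`.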

Ethical, Philosophical & Practical Implications

• Balance accountability vs. autonomy: excessive high-stakes pressure may erode joy & creativity; no accountability may encourage complacency.
• Test bias & equity: reduce cultural/linguistic bias; use multiple measures; interpret in context of RtI & integrative assessment.
• Privacy & data security: CCSS longitudinal records, portfolio contents.
• Resource allocation: testing consumes time/money; policymakers must weigh trade-offs.

Connections to Foundational Principles

• Vygotsky’s social-constructivist view shapes dynamic assessment & RtI emphasis on scaffolded learning.
• Classical test theory undergirds reliability/validity criteria for all instruments; CHC & PASS frameworks advance cognitive measurement.
• Ethical standards (IDEA: no single measure) reinforce multimethod, multidisciplinary approach.

Just-Think Examples & Metaphors

• Detective metaphor (Sherlock Holmes/Dr. House) for hypothesis testing in RtI.
• “Go-figure curriculum” quip for CCSS—teachers told what to teach/test, must figure out how.
• Hypothetical aptitude item design & student mileage-receipt portfolio illustrate concept-vs-fact items & authentic assessment.

Key Numerical & Statistical References

• Correlation-variance rule: $r^2$ = proportion of variance accounted for, e.g., $.7^2 = .49$.
• Infant IQ-later IQ predictive coefficients $r\approx.3–.5$ (higher at score extremes).
• Apgar scale range $0$–$10$; five variables scored $0$, $1$, or $2$ each.
• Federal incentives: CCSS start-up grants $\approx\$400$ million; state adoption grants $\approx\$4$ billion; Gates Foundation $\approx\$2.5$ billion.
• KABC-II NU & WRMT-III normative samples >3,000 and >3,300 respectively.
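
As a worked illustration of the correlation-variance rule, squaring a correlation gives the proportion of variance accounted for, and the remainder is unexplained:

$$
r = .7 \;\Rightarrow\; r^2 = .7 \times .7 = .49, \qquad 1 - r^2 = .51
$$

Applying the same rule to the infant-to-later IQ coefficients listed above, $r = .3$ gives $r^2 = .09$ and $r = .5$ gives $r^2 = .25$, so even the upper end of that range accounts for only a quarter of the variance in later scores.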

Study Tips (Meta-level)

• Remember distinctions: formative vs. summative; achievement vs. aptitude; diagnostic vs. evaluative; static vs. dynamic; portfolio vs. authentic.
• Link theoretical models (CHC, PASS, Luria, Vygotsky) to specific tests.
• Practice applying formulas, e.g., squaring a correlation to get the proportion of variance accounted for ($r^2$).
• When evaluating any test: consider reliability, validity, bias, normative sample, relevance, practicality.