PED-106-Report-G-1
VALIDITY AND RELIABILITY OF ASSESSMENT
Assessment of Learning 1 | PED 106
VALIDITY IN ASSESSMENT
Definition: Derived from the Latin word validus, meaning strong or powerful.
Quality of being based on truth/reason.
Validity refers to the ability of an assessment to measure what it is intended to measure.
Associated with the accuracy of inferences made by teachers based on student assessments (McMillan, 2007).
CONTENT-RELATED EVIDENCE
Definition: Content-related evidence concerns how well the content of an assessment matches the subject matter it is meant to measure.
Factors to Consider:
Relevance of questions
Balance of coverage
Exclusion of irrelevant content
TYPES OF VALIDITY UNDER CONTENT-RELATED EVIDENCE
Face Validity: Whether a test appears, on the surface, to measure what it claims to measure.
Instructional Validity: Alignment of content/skills assessed with taught material.
TABLE OF SPECIFICATIONS (TOS)
Definition: Defines the content areas and describes the learning outcomes at each cognitive level (Notar et al., 2004).
Purpose: Serves as the blueprint for the structure of the assessment.
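One common way to fill in a table of specifications is to give each topic a share of the items proportional to the time spent teaching it. The sketch below assumes that rule; the topics, hours, item total, and the function name allocate_items are hypothetical and do not come from the report.

```python
def allocate_items(hours_by_topic, total_items):
    """Distribute total_items across topics in proportion to hours taught."""
    total_hours = sum(hours_by_topic.values())
    # Exact (possibly fractional) share of items for each topic.
    exact = {t: total_items * h / total_hours for t, h in hours_by_topic.items()}
    # Start from the rounded-down share, then give the leftover items
    # to the topics with the largest fractional remainders.
    counts = {t: int(share) for t, share in exact.items()}
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts


# Hypothetical teaching hours per topic for a 40-item test.
hours = {"Validity": 6, "Reliability": 5, "Measurement error": 4}
blueprint = allocate_items(hours, total_items=40)
print(blueprint)
```

A fuller blueprint would split each topic's items further by cognitive level; the same proportional rule could be applied within each topic.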
TIME REQUIREMENTS IN ASSESSMENT
Allot enough time for students to complete the assessment; inadequate time limits weaken validity.
CRITERION-RELATED EVIDENCE
Definition: Validity based on how assessment scores correlate with an external criterion.
Types:
Predictive Validity: How well scores predict future performance on the criterion.
Concurrent Validity: How well scores agree with an established measure taken at about the same time.
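Criterion-related evidence is usually summarized as a correlation between assessment scores and the criterion. The following is a minimal sketch of that calculation using the Pearson correlation; the entrance scores and grade point averages are made-up illustration data, and pearson_r is a helper written for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: entrance test scores and the same students'
# first-term GPA (the external criterion, collected later).
entrance_scores = [55, 62, 70, 78, 85, 90]
first_term_gpa = [2.0, 2.4, 2.9, 3.1, 3.6, 3.8]

validity_coefficient = pearson_r(entrance_scores, first_term_gpa)
print(round(validity_coefficient, 3))
```

Because the criterion here is measured after the test, this would count as predictive evidence; with a criterion measured at the same time, the same calculation gives concurrent evidence.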
CONSTRUCT-RELATED EVIDENCE
Definition: Evidence that the assessment measures the intended unobservable trait, or construct (McMillan, 2007).
TYPES OF CONSTRUCT-RELATED EVIDENCE
Theoretical Evidence: Established theories supporting assessment aims.
Logical Evidence: Clear reasoning and practical examples validating tests.
Methods:
Differential Group Study: Compares the test scores of groups expected to differ on the construct.
Intervention Study: Checks whether scores change after instruction that targets the construct.
Statistical Evidence: Uses statistical analysis of scores to support the construct interpretation.
Convergent Validity: Similar results on measures of related traits.
Divergent Validity: Different results on measures of unrelated traits.
METHODS FOR ESTABLISHING CONSTRUCT VALIDITY
Convergent Validity: Scores correlate strongly with other measures of the same or a similar construct.
Divergent Validity: Scores correlate weakly with measures of unrelated constructs.
MESSICK’S UNIFIED CONCEPT OF VALIDITY (1989)
Integrates multiple validity types into a single framework.
Six Aspects of Validity:
Content: Coverage of essential content areas.
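Substantive: Theoretical rationale for the processes students use in responding.
Structural: Alignment of the scoring structure with the structure of the construct.
Generalizability: Application of score interpretations across populations, settings, and tasks.
External: Relationship of scores to other measures (convergent and discriminant evidence).
Consequential: Actual versus intended consequences of score use.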
THREATS TO VALIDITY
Factors affecting validity identified by Miller, Linn & Gronlund (2009):
Unclear test directions
Complicated vocabulary
Ambiguous statements
Inadequate time limits
Inappropriate item difficulty
Poorly constructed items
Test items inappropriate for the outcomes being measured
Tests that are too short (too few items)
Improper item arrangement
Identifiable answer patterns
SUGGESTIONS TO ENHANCE VALIDITY (McMillan, 2007)
Engage third-party evaluators for clarity
Validate different assessment methods against shared outcomes
Prepare a detailed table of specifications
Compare predicted and actual consequences
Offer adequate test completion time
Improve instructions and item clarity
RELIABILITY
Definition: Consistency and reproducibility of assessment results.
Relationship with Validity: Reliability is necessary for validity but does not guarantee it; a test can give consistent results and still measure the wrong thing.
TYPES OF RELIABILITY
Internal Reliability: Consistency across items in a test.
External Reliability: Consistency of results from one use of the test to another.
SOURCES OF RELIABILITY EVIDENCE
Evidence based on stability
Evidence based on equivalent forms
Evidence based on internal consistency
Evidence based on scorer consistency
Evidence based on decision consistency
STABILITY
Refers to test-retest reliability; correlates scores from the same test given to the same students at two points in time.
EQUIVALENCE
Consistency among different versions of a test.
INTERNAL CONSISTENCY
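Measures how well test items correlate with each other.
Split-Half Method: Correlates scores on two halves of the test (for example, odd versus even items).
Cronbach's Alpha: Estimates internal consistency from the item variances and the total-score variance.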
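Both internal consistency methods can be sketched in a few lines. The example below assumes an odd/even split and applies the Spearman-Brown correction to the half-test correlation, a standard step that the report does not mention; the item scores are made-up data (1 = correct, 0 = incorrect), and the function names are chosen for this example.

```python
from math import sqrt
from statistics import pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Odd/even split-half estimate with the Spearman-Brown correction."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Step the half-test correlation up to the full test length.
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(item_scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(item_scores[0])
    item_variances = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: one row per student, one column per item.
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

split_half = split_half_reliability(item_scores)
alpha = cronbach_alpha(item_scores)
print(round(split_half, 2), round(alpha, 2))
```

Alpha avoids the arbitrary choice of how to split the test, which is one reason it is often reported instead of a single split-half value.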
SCORER/RATER CONSISTENCY
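Raters may score the same work differently, which leads to score disagreements.
Contributing factors include poor rater training and bias.
Statistics such as Spearman's rho quantify how consistently raters rank the same work; they measure consistency rather than improve it.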
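As an illustration of how Spearman's rho summarizes rater consistency, the sketch below ranks the scores from two raters and applies the rank-difference formula rho = 1 - 6Σd² / (n(n² - 1)). This shortcut assumes there are no tied scores; the essay scores and function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (lowest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation using the no-ties shortcut formula."""
    if len(set(x)) < len(x) or len(set(y)) < len(y):
        raise ValueError("this shortcut formula assumes no tied scores")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores given by two raters to the same six essays.
rater_a = [88, 75, 92, 60, 70, 81]
rater_b = [85, 72, 90, 65, 74, 78]

rho = spearman_rho(rater_a, rater_b)
print(round(rho, 3))
```

A value near 1 means the raters order the essays almost identically, even if one rater is consistently more lenient than the other.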
DECISION CONSISTENCY
Focuses on whether students receive the same classification (for example, pass or fail) each time, rather than on the scores themselves.
Essential for ensuring consistency in educational evaluations.
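Decision consistency can be illustrated by classifying the same students on two administrations and counting how often the classification matches. The sketch below reports the proportion of matching decisions and Cohen's kappa, a commonly used chance-corrected index that the report does not name; the scores, the cut score of 75, and the function name are assumptions made for this example.

```python
def decision_consistency(scores_1, scores_2, cut_score):
    """Return (proportion of matching decisions, Cohen's kappa)."""
    pass_1 = [s >= cut_score for s in scores_1]
    pass_2 = [s >= cut_score for s in scores_2]
    n = len(pass_1)
    # Proportion of students classified the same way both times.
    p_observed = sum(a == b for a, b in zip(pass_1, pass_2)) / n
    # Agreement expected by chance, from each administration's pass rate.
    rate_1, rate_2 = sum(pass_1) / n, sum(pass_2) / n
    p_chance = rate_1 * rate_2 + (1 - rate_1) * (1 - rate_2)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


# Hypothetical scores for ten students on two forms of a test.
form_a = [80, 72, 90, 60, 77, 74, 85, 68, 79, 73]
form_b = [78, 76, 88, 65, 74, 70, 90, 71, 81, 69]

agreement, kappa = decision_consistency(form_a, form_b, cut_score=75)
print(round(agreement, 2), round(kappa, 2))
```

Students whose scores sit close to the cut score are the ones most likely to change classification, so they account for most of the disagreement.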
MEASUREMENT ERRORS
The difference between a student's observed score and true score.
FACTORS CONTRIBUTING TO MEASUREMENT ERRORS
Examinee-Specific Factors: Conditions of the student, such as fatigue.
Test-Specific Factors: Features of the test, such as unclear instructions and overly difficult questions.
TYPES OF ERRORS
Random Errors: Unpredictable fluctuations.
Systematic Errors: Consistent, directional biases.
CLASSICAL TEST THEORY FORMULA
Formula: X = T + E
X: Observed Score
T: True Score
E: Error Component
Explains that observed scores consist of true ability plus error.
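A small numerical illustration of X = T + E is sketched below. True scores and errors cannot be observed in practice, so the values here are invented, with the errors chosen to be uncorrelated with the true scores. Under that assumption, classical test theory gives Var(X) = Var(T) + Var(E), and reliability can be read as Var(T) / Var(X); both are standard results that the report does not state.

```python
from statistics import pvariance

# Invented true scores and errors for five students; the errors sum to
# zero and are uncorrelated with the true scores.
true_scores = [10, 12, 14, 16, 18]
errors = [1, -1, 0, -1, 1]

# Observed score = true score + error, student by student.
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)
var_error = pvariance(errors)
var_observed = pvariance(observed)

# Share of observed-score variance that comes from true differences.
reliability = var_true / var_observed

print(observed)
print(var_true, var_error, var_observed)
print(round(reliability, 3))
```

The larger the error variance relative to the true-score variance, the lower this ratio becomes.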
STANDARD ERROR OF MEASUREMENT (SEM)
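SEM estimates how much a student's observed scores would vary across repeated administrations of the same test.
Formula: SEM = Sx √(1 - rxx), where Sx is the standard deviation of observed scores and rxx is the reliability coefficient.
Indicates the precision of individual scores (for the same score spread, a lower SEM implies higher reliability).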
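The SEM formula translates directly into code. This is a minimal sketch; the standard deviation of 10 and reliability of 0.91 are made-up values, and the function name is chosen for this example.

```python
from math import sqrt


def standard_error_of_measurement(sd_observed, reliability):
    """SEM = Sx * sqrt(1 - rxx)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * sqrt(1 - reliability)


# Hypothetical test: score standard deviation 10, reliability 0.91.
sem = standard_error_of_measurement(sd_observed=10, reliability=0.91)
print(round(sem, 2))
```

With perfect reliability (rxx = 1) the SEM is 0, and with no reliability (rxx = 0) it equals the standard deviation of the scores.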
CONFIDENCE INTERVALS USING SEM
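68% Confidence Interval: observed score ±1 SEM
95% Confidence Interval: observed score ±2 SEMs
99% Confidence Interval: observed score ±3 SEMs (more precisely, about 99.7%)
Reporting a score as a band rather than a single point shows how much it could shift because of measurement error.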
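The score bands can be computed by adding and subtracting the chosen number of SEMs from the observed score. In this sketch the observed score of 80 and SEM of 3 are hypothetical, and score_band is a helper name introduced for the example.

```python
def score_band(observed_score, sem, n_sems=1):
    """Return the (lower, upper) band of observed_score ± n_sems * sem."""
    margin = n_sems * sem
    return observed_score - margin, observed_score + margin


# Hypothetical student: observed score 80 on a test with SEM = 3.
for n_sems in (1, 2, 3):
    low, high = score_band(80, sem=3, n_sems=n_sems)
    print(f"±{n_sems} SEM: {low} to {high}")
```

Two students whose bands overlap substantially should not be treated as clearly different in ability on the basis of this test alone.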
IMPROVING RELIABILITY
Increase the number of test items or testing duration to reduce random errors.
Minimize systematic errors through careful design and clear scoring criteria.
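The effect of adding items can be estimated with the Spearman-Brown prophecy formula, which the report does not mention but which is the standard tool for this question. The sketch assumes the added items are comparable in quality to the existing ones; the starting reliability of 0.60 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical 20-item test with reliability 0.60, doubled to 40 items.
doubled = spearman_brown(0.60, length_factor=2)
print(round(doubled, 2))
```

The gain shrinks as the test grows, so lengthening helps most when the original test is short.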
OBJECTIVE TESTS vs. PERFORMANCE ASSESSMENTS
IMPROVING RELIABILITY IN ORAL AND PERFORMANCE ASSESSMENTS
Apply strategies of structured tests to enhance oral assessment reliability.
SELF-ASSESSMENTS
Reliable when students learn effective self-evaluation.
Consistency may vary for younger students due to inexperience.
STRATEGIES FOR ENHANCING RELIABILITY (Nitko & Brookhart, 2011)
Lengthening assessments
Broadening scopes
Employing formal procedures
Involving multiple raters
Combining results from different assessments
Differentiating assessment methods based on requirements
CONCLUSION
Validity and reliability ensure accurate, fair, and consistent assessment results.
Validity aligns assessments with intended measures, while reliability secures consistency over time, fostering trust in data and informed decision-making in education.
Presented By: Dulaca, Nica Jaye S.; Dulaca, Rica Lyn S.; Nuyda, Aisha Zakiya V.; Taghoy, Pauline Kate D.