PED-106-Report-G-1

VALIDITY AND RELIABILITY OF ASSESSMENT

Assessment of Learning 1 | PED 106


VALIDITY IN ASSESSMENT

  • Definition: Derived from the Latin word validus, meaning strong or powerful.

    • Quality of being based on truth/reason.

  • Validity refers to the ability of an assessment to measure what it is intended to measure.

    • Associated with the accuracy of inferences made by teachers based on student assessments (McMillan, 2007).


CONTENT-RELATED EVIDENCE

  • Definition: Content-related evidence concerns how well the content of an assessment matches the subject matter it is meant to cover.

  • Factors to Consider:

    • Relevance of questions

    • Balance of Coverage

    • Exclusion of irrelevant content


TYPES OF VALIDITY UNDER CONTENT-RELATED EVIDENCE

  • Face Validity: Whether a test appears, on the surface, to measure what it claims to measure.

  • Instructional Validity: Alignment of content/skills assessed with taught material.


TABLE OF SPECIFICATIONS (TOS)

  • Definition: A chart that defines the content area and describes the learning outcomes at each cognitive level (Notar et al., 2004).

  • Purpose: Serves as the blueprint for the structure of the assessment.
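
One common way to fill in such a blueprint is to give each topic a share of items proportional to the instructional time spent on it. The sketch below assumes that proportional rule, which the report does not state, and uses hypothetical topics and hours; the function name allocate_items is illustrative only.

```python
def allocate_items(hours_by_topic, total_items):
    """Allocate test items to topics in proportion to instructional hours."""
    total_hours = sum(hours_by_topic.values())
    # Exact (possibly fractional) share of items for each topic.
    exact = {t: total_items * h / total_hours for t, h in hours_by_topic.items()}
    # Start from the rounded-down share, then hand out any leftover items
    # to the topics with the largest fractional remainders.
    alloc = {t: int(share) for t, share in exact.items()}
    leftover = total_items - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - alloc[t], reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return alloc


# Hypothetical topics and hours, for illustration only.
hours = {"Fractions": 6, "Decimals": 3, "Percent": 3}
blueprint = allocate_items(hours, total_items=40)
print(blueprint)
```

In a full table of specifications, each topic's items would then be divided further across cognitive levels.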


TIME REQUIREMENTS IN ASSESSMENT

  • Time limits should be considered in assessment design: students need enough time to show what they know, or validity suffers.


CRITERION-RELATED EVIDENCE

  • Definition: Validity based on how assessment scores correlate with an external criterion.

  • Types:

    • Predictive Validity: How well assessment scores predict future performance.

    • Concurrent Validity: How well assessment scores agree with an established measure taken at about the same time.
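
Criterion-related evidence is usually summarized as a correlation between assessment scores and the criterion. The sketch below computes a Pearson correlation for a predictive case; the score lists are hypothetical, and the choice of Pearson's r is an assumption, since the report does not name a specific coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: entrance test scores and the same students' later grades.
entrance_scores = [55, 60, 65, 70, 75, 80]
later_grades = [78, 80, 83, 85, 90, 91]
validity_coefficient = pearson_r(entrance_scores, later_grades)
print(round(validity_coefficient, 2))
```

A coefficient close to 1 would suggest that the test ranks students much as the criterion does; for concurrent evidence the second list would hold scores from an established measure taken at about the same time.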


CONSTRUCT-RELATED EVIDENCE

  • Definition: Evidence that the assessment measures the intended unobservable trait, or construct (McMillan, 2007).


TYPES OF CONSTRUCT-RELATED EVIDENCE

  1. Theoretical Evidence: Established theories supporting assessment aims.

  2. Logical Evidence: Clear reasoning and practical examples validating tests.

    • Methods:

      • Differential Group Study: Compares the test scores of groups expected to differ on the construct.

      • Intervention Study: Checks whether scores change after instruction that targets the construct.

  3. Statistical Evidence: Uses statistical analysis of scores to support the intended interpretation.

    • Convergent Validity: Scores correlate strongly with measures of related traits.

    • Divergent Validity: Scores correlate weakly with measures of unrelated traits.


METHODS FOR ESTABLISHING CONSTRUCT VALIDITY

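  • Convergent Validity: Correlate scores with another measure of the same or a similar construct; similar results are expected.

  • Divergent Validity: Correlate scores with a measure of a different construct; different results are expected.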


MESSICK’S UNIFIED CONCEPT OF VALIDITY (1989)

  • Integrates multiple validity types into a single framework.

  • Six Aspects of Validity:

    1. Content: Coverage of essential content areas.

    2. Substantive: Representation of theoretical constructs.

    3. Structural: Alignment of scoring with constructs.

    4. Generalizability: Application across populations and contexts.

    5. External: Relationship of scores to other measures and criteria.

    6. Consequential: Assessing actual vs. intended effects.


THREATS TO VALIDITY

  • Factors affecting validity identified by Miller, Linn & Gronlund (2009):

    1. Unclear test directions

    2. Complicated vocabulary

    3. Ambiguous statements

    4. Inadequate time limits

    5. Inappropriate item difficulty

    6. Poorly constructed items

    7. Inappropriate test items

    8. Test too short (too few items)

    9. Improper item arrangement

    10. Identifiable answer patterns


SUGGESTIONS TO ENHANCE VALIDITY (McMillan, 2007)

  • Engage third-party evaluators for clarity

  • Validate different assessment methods against shared outcomes

  • Prepare a detailed table of specifications

  • Compare predicted and actual consequences

  • Offer adequate test completion time

  • Improve instructions and item clarity


RELIABILITY

  • Definition: Consistency and reproducibility of assessment results.

  • Relationship with Validity: Reliability is necessary for validity but not sufficient; a test can give consistent scores and still measure the wrong thing.


TYPES OF RELIABILITY

  1. Internal Reliability: Consistency across items in a test.

  2. External Reliability: Consistency of results from one use of the test to another.


SOURCES OF RELIABILITY EVIDENCE

  1. Evidence based on stability

  2. Evidence based on equivalent forms

  3. Evidence based on internal consistency

  4. Evidence based on scorer consistency

  5. Evidence based on decision consistency


STABILITY

  • Refers to test-retest reliability; correlates scores from the same test given to the same group at two points in time.


EQUIVALENCE

  • Consistency among different versions of a test.


INTERNAL CONSISTENCY

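  • Measures how well test items correlate with each other.

    • Split-Half Method: Splits the test into two halves (e.g., odd and even items) and correlates the half scores.

    • Cronbach's Alpha: Estimates internal consistency from the item variances and the total-score variance.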
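
Both methods can be computed directly from a table of item scores. The sketch below assumes dichotomous (0/1) items, an odd-even split, and the Spearman-Brown correction for the half-length test; the data and function names are hypothetical, and none of these specifics come from the report.

```python
from math import sqrt


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    first_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, ...
    second_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    r_half = pearson_r(first_half, second_half)
    # Step the half-test correlation up to the full test length.
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(item_scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: one row per student, one column per item (1 = correct).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(split_half_reliability(scores), 2))
print(round(cronbach_alpha(scores), 2))
```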


SCORER/RATER CONSISTENCY

  • Variability in ratings leads to score disagreements and is influenced by numerous factors:

    • Poor training, bias, etc.

  • Statistics like Spearman's rho estimate how consistently two raters rank the same work.
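
A minimal sketch of Spearman's rho for two raters follows. It uses the rank-difference formula, which assumes no tied scores within a rater; the ratings are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (lowest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the rank-difference formula (no ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ratings of the same five essays by two raters.
rater_a = [88, 92, 75, 60, 81]
rater_b = [85, 90, 78, 65, 70]
rho = spearman_rho(rater_a, rater_b)
print(round(rho, 2))
```

With tied ratings, the usual alternative is to compute a Pearson correlation on average ranks instead.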


DECISION CONSISTENCY

  • Focuses on whether students receive the same classification (e.g., pass or fail) across administrations or forms, rather than the same score.

  • Essential for ensuring consistency in educational evaluations.
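
Decision consistency can be sketched as the proportion of students classified the same way on two occasions. The example below also reports Cohen's kappa, which adjusts that proportion for chance agreement; the kappa statistic, the cut score of 75, and the score lists are assumptions for illustration and do not come from the report.

```python
def classify(scores, cut_score):
    """Label each score 'pass' or 'fail' against a cut score."""
    return ["pass" if s >= cut_score else "fail" for s in scores]


def decision_consistency(first, second):
    """Return (proportion of agreement, Cohen's kappa) for two classifications."""
    n = len(first)
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    labels = set(first) | set(second)
    # Agreement expected by chance, from each occasion's label proportions.
    p_chance = sum(
        (first.count(label) / n) * (second.count(label) / n) for label in labels
    )
    # Undefined when p_chance equals 1 (everyone in one category both times).
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


# Hypothetical scores of the same eight students on two forms of a test.
form_a = [82, 74, 69, 91, 58, 77, 73, 60]
form_b = [79, 76, 72, 88, 61, 70, 78, 55]
agreement, kappa = decision_consistency(classify(form_a, 75), classify(form_b, 75))
print(agreement, kappa)
```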


MEASUREMENT ERRORS

  • The difference between a student's observed score and true score (actual ability).


FACTORS CONTRIBUTING TO MEASUREMENT ERRORS

  1. Examinee-Specific Factors: Related to the student, such as fatigue, etc.

  2. Test-Specific Factors: Includes unclear instructions, difficult questions, etc.


TYPES OF ERRORS

  1. Random Errors: Unpredictable fluctuations.

  2. Systematic Errors: Consistent, directional biases.


CLASSICAL TEST THEORY FORMULA

  • Formula: X = T + E

    • X: Observed Score

    • T: True Score

    • E: Error Component

    • Explains that observed scores consist of true ability plus error.
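
The decomposition can be illustrated with made-up numbers. True scores cannot be observed in practice, so the values below are purely hypothetical; they are chosen so that the errors are uncorrelated with the true scores. The last line uses the standard classical-test-theory definition of reliability as the ratio of true-score variance to observed-score variance, which the report does not state explicitly.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical true scores (T) and errors (E) for four students.
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]

# X = T + E for each student.
observed = [t + e for t, e in zip(true_scores, errors)]

# With uncorrelated errors, observed variance is true variance plus error variance.
print(variance(true_scores), variance(errors), variance(observed))

# Reliability as the share of observed-score variance due to true scores.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 3))
```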


STANDARD ERROR OF MEASUREMENT (SEM)

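  • SEM estimates how much a student's observed score would vary across repeated administrations of the same test.

    • Formula: SEM = Sx √(1 - rxx), where Sx is the standard deviation of observed scores and rxx is the reliability coefficient of the test.

    • Indicates the precision of individual scores (for a given Sx, a lower SEM implies higher reliability).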
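
A worked instance of the formula, using a hypothetical standard deviation of 10 and a hypothetical reliability coefficient of 0.91:

```python
from math import sqrt


def standard_error_of_measurement(sd_observed, reliability):
    """SEM = Sx * sqrt(1 - rxx)."""
    return sd_observed * sqrt(1 - reliability)


# Hypothetical values: Sx = 10 score points, rxx = 0.91.
sem = standard_error_of_measurement(sd_observed=10, reliability=0.91)
print(round(sem, 2))
```

Here the SEM works out to about 3 score points, since the square root of 0.09 is 0.3.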


CONFIDENCE INTERVALS USING SEM

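  • 68% Confidence Interval: observed score ± 1 SEM

  • 95% Confidence Interval: observed score ± 2 SEMs (approximately)

  • 99% Confidence Interval: observed score ± 3 SEMs (approximately)

  • These bands show the range within which a student's true score is likely to fall.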
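
The bands can be computed as follows. This sketch assumes a hypothetical observed score of 80 and an SEM of 3, and it applies the rule-of-thumb multipliers listed above.

```python
def score_band(observed_score, sem, n_sems):
    """Return the (low, high) band of observed score plus or minus n_sems SEMs."""
    return observed_score - n_sems * sem, observed_score + n_sems * sem


# Hypothetical student: observed score of 80 on a test with SEM = 3.
for level, n_sems in [("68%", 1), ("95%", 2), ("99%", 3)]:
    low, high = score_band(80, 3, n_sems)
    print(level, low, high)
```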


IMPROVING RELIABILITY

  • Increase the number of test items or testing duration to reduce random errors.

  • Minimize systematic errors through careful design and clear scoring criteria.
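
The effect of adding items can be estimated with the Spearman-Brown prophecy formula. The report does not mention this formula; it is offered here as one standard way to project the gain, and it assumes the added items are comparable in quality to the existing ones. The starting reliability of 0.60 is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.60, doubled in length.
projected = spearman_brown(0.60, 2)
print(round(projected, 2))
```

Doubling this hypothetical test would raise the projected reliability from 0.60 to 0.75.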


OBJECTIVE TESTS vs. PERFORMANCE ASSESSMENTS

  • Objective tests are generally scored more consistently than performance assessments, whose scores depend on rater judgment.


IMPROVING RELIABILITY IN ORAL AND PERFORMANCE ASSESSMENTS

  • Apply the strategies used for structured tests to oral and performance assessments to make their scores more reliable.


SELF-ASSESSMENTS

  • Reliable when students learn effective self-evaluation.

  • Consistency may vary for younger students due to inexperience.


STRATEGIES FOR ENHANCING RELIABILITY (Nitko & Brookhart, 2011)

  • Lengthening assessments

  • Broadening scopes

  • Employing formal procedures

  • Involving multiple raters

  • Combining results from different assessments

  • Differentiating assessment methods based on requirements.


CONCLUSION

  • Validity and reliability ensure accurate, fair, and consistent assessment results.

  • Validity aligns assessments with intended measures, while reliability secures consistency over time, fostering trust in data and informed decision-making in education.


THANK YOU

Presented By: Dulaca, Nica Jaye S.; Dulaca, Rica Lyn S.; Nuyda, Aisha Zakiya V.; Taghoy, Pauline Kate D.