SI - Understanding Reliability and Validity in Measurements

I. Reliability

Reliability refers to the degree to which repeated measurements of the same phenomenon produce consistent results. It is characterized by:

  • Consistency

  • Repeatability

  • Dependability

  • Reproducibility

Types of Measurement Error
  1. Systematic Errors (Predictable, consistent)

    • Examples: Improper use of landmarks, improperly calibrated tools

  2. Random Errors (Unpredictable, inconsistent)

    • Examples: Examiner fatigue, environmental disruptions

Sources of Error
  • Examiner (e.g., improper technique)

  • Examined (e.g., fatigue, cooperation)

  • Examination (e.g., poorly calibrated equipment)

Types of Reliability
  1. Test-retest Reliability: Consistency of a test over time

  2. Internal Consistency: Correlation among items within a test (Cronbach's alpha)

  3. Intra-rater Reliability: Stability of one rater's measurements over time

  4. Inter-rater Reliability: Agreement between two or more raters

Quantifying Reliability
  • Intraclass Correlation Coefficient (ICC): Ideal for continuous data; reflects both correlation and agreement

    • >0.90 = Excellent reliability

    • 0.75–0.90 = Good reliability

    • <0.75 = Poor to moderate reliability

  • Cronbach's alpha: Measures internal consistency across items

  • Kappa statistic: Chance-corrected agreement for categorical data

  • Bland-Altman plots: Visualize agreement between two measurement methods (a worked sketch of alpha and kappa follows this list)
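
As a concrete illustration, the sketch below computes two of these statistics, Cronbach's alpha and Cohen's kappa, from scratch in Python with NumPy. The score matrix and rater labels are hypothetical data invented for this example; in practice, ICC and Bland-Altman analyses are usually produced with a statistics package.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (subjects x items) score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of subjects' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def cohen_kappa(r1, r2) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    categories = np.union1d(r1, r2)
    po = np.mean(r1 == r2)                      # observed agreement
    pe = sum((r1 == c).mean() * (r2 == c).mean() for c in categories)  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical data: 5 subjects answering a 4-item questionnaire
items = np.array([[4, 5, 4, 4],
                  [2, 3, 2, 3],
                  [5, 5, 4, 5],
                  [3, 3, 3, 2],
                  [4, 4, 5, 4]])
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")

# Hypothetical data: two raters classifying 8 gait patterns
rater_a = ["normal", "antalgic", "normal", "ataxic", "normal", "antalgic", "ataxic", "normal"]
rater_b = ["normal", "antalgic", "antalgic", "ataxic", "normal", "normal", "ataxic", "normal"]
print(f"Cohen's kappa: {cohen_kappa(rater_a, rater_b):.2f}")
```

Note that kappa discounts the agreement two raters would reach by guessing alone, which is why it is preferred over raw percent agreement for categorical ratings.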

II. Validity

Validity refers to whether a measurement accurately reflects what it intends to measure.

Types of Validity
  1. Face Validity: An informal judgment of whether the test appears to measure what it intends to measure

  2. Content Validity: Extent to which a test covers all aspects of the concept

  3. Construct Validity: Measures an abstract concept like intelligence or personality

  4. Criterion-related Validity: Compares a test to an established gold standard

    • Concurrent Validity: Correlates with an established measure administered at the same time

    • Predictive Validity: Predicts future outcomes

  5. Responsiveness: Ability to detect meaningful changes over time

Quantifying Validity
  • Correlation Coefficient (r)

    • r ≥ 0.35 is generally considered acceptable

    • Often converted to r² (the coefficient of determination) for easier interpretation, giving the percentage of variability explained; see the sketch after this list

  • Effect Size (ES): Measures the magnitude of difference between groups or of change over time

    • Large Effect = 0.8

    • Moderate Effect = 0.5

    • Small Effect = 0.2
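
The sketch below works both statistics through on hypothetical data: Pearson's r (with its r² conversion) via SciPy, and Cohen's d, one common effect-size formula, computed from the pooled standard deviation. All scores are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical concurrent-validity data: a new knee score vs. an
# established gold-standard score for the same 8 patients
new_scale = np.array([62, 75, 58, 81, 70, 66, 90, 73])
gold_std  = np.array([60, 78, 55, 85, 72, 63, 94, 70])

r, p = stats.pearsonr(new_scale, gold_std)
print(f"r = {r:.2f}, r^2 = {r**2:.2f} ({r**2:.0%} of variability explained)")

def cohens_d(group1, group2) -> float:
    """Cohen's d: standardized mean difference using the pooled SD."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1))
                        / (n1 + n2 - 2))
    return (g1.mean() - g2.mean()) / pooled_sd

# Hypothetical responsiveness check: scores before vs. after treatment
baseline  = np.array([55, 60, 48, 62, 58, 50])
follow_up = np.array([68, 72, 60, 70, 71, 63])
print(f"Effect size d = {cohens_d(follow_up, baseline):.2f}")  # >= 0.8 would be large
```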

III. Key Differences Between Reliability and Validity

  • Reliability is necessary for validity, but reliability alone does not guarantee validity.

  • An instrument can be reliable (consistent) but still invalid (not accurate).

IV. Minimizing Errors in Measurement

  • Use operational definitions for clarity and replication

  • Conduct proper training for examiners

  • Regularly inspect and calibrate equipment

  • Utilize blinded assessments to reduce bias

  • Separate observation from inference during evaluations

V. Example Application: Knee Outcome Survey

  • Demonstrated high ICC (0.97) indicating excellent reliability

  • Showed strong associations between baseline and follow-up scores, supporting both concurrent and predictive validity

VI. Key Takeaways

  • Reliability ensures consistency; validity ensures accuracy.

  • Employ structured methodologies to enhance both.

  • ICC is highly recommended for reliability in continuous data.

  • Responsiveness is critical for detecting real changes in clinical outcomes.