Measurement Validity and Reliability

Measurement Validity

  • Ensuring measurements accurately reflect the intended constructs is crucial in clinical practice and research.
  • While some measurements (e.g., length using a tape measure) seem straightforward, many health concepts require careful validation.

Types of Measurement Validity

  • Face Validity:
    • Refers to whether a measurement appears to measure the intended concept.
    • Established by pre-testing the instrument with experts or colleagues.
    • They judge if the measure reflects the concept.
  • Content Validity:
    • Extent to which a measure covers all relevant dimensions of the concept.
    • Example: An exam should cover all material taught in a course.
    • If an exam doesn't represent what was covered in the course, it lacks content validity.
  • Criterion Validity:
    • Extent to which a measure correlates with an established measure (a gold standard) of the same concept.
    • Example: Comparing a new subjective health measure against the SF-36.
    • Expectation: Scores on both measures should correlate.
  • Construct Validity:
    • How well a measure aligns with theoretical expectations.
    • Assesses if it measures a theoretical construct.
    • Example: IQ tests have been questioned for measuring only one dimension of intelligence (potential within a particular academic system).
    • Establishing construct validity requires explicitly defining theoretical concepts and relationships.

Measurement Reliability

  • Reliability indicates the consistency, stability, and dependability of a measurement.
  • All measurement instruments possess some degree of measurement error.
  • Higher measurement error leads to lower reliability.
  • Reliability is estimated from a sample representative of a specific population, so true reliability may vary in different populations.

Consistency and Agreement

  • Consistency:
    • Reported via correlation statistics (correlation coefficients) that represent the strength of the association between two measurements.
    • Common coefficients: Pearson’s r and Intraclass Correlation Coefficient (ICC).
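    • Pearson’s r ranges from -1 to 1 (-1 = perfect negative correlation, 0 = no correlation, 1 = perfect positive correlation); reliability coefficients such as the ICC are usually interpreted on a 0 to 1 scale.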
  • Agreement:
    • Ideally reported alongside correlation.
    • Indicates the magnitude of difference between measurements (e.g., measurement A differs from measurement B by a certain amount).
    • Useful for determining if the error size is acceptable.
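
The distinction between consistency and agreement can be illustrated with a short sketch. The data and the names `device_a` and `device_b` are hypothetical, and the 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences, in the style of a Bland-Altman analysis) are one common way to report agreement, not necessarily the method a given study uses.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def agreement(x, y):
    """Mean difference (bias) and 95% limits of agreement between y and x."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical paired readings: device B reads roughly 5 units above device A.
device_a = [10.0, 12.0, 15.0, 18.0, 20.0]
device_b = [15.5, 16.5, 20.0, 23.5, 24.5]

r = pearson_r(device_a, device_b)
bias, limits = agreement(device_a, device_b)
print(f"r = {r:.3f}")
print(f"bias = {bias:.2f}, limits of agreement = ({limits[0]:.2f}, {limits[1]:.2f})")
```

In this made-up data set the correlation is above 0.99, yet the two devices differ by 5 units on average. A high correlation alone would hide that offset, which is why agreement is worth reporting alongside it.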

Factors Affecting Consistency and Agreement

  • Multiple error sources can affect consistency and agreement.
  • Example: When measuring core temperature with an ear thermometer, error can arise from how correctly the probe is inserted, how stable the room temperature is, and how stable the participant's core temperature is.

Types of Reliability

  • Test-Retest Reliability:
    • Evaluates the stability of a measure across two occasions, assuming no change in the measured construct.
    • Example: IQ tests should yield similar results across administrations.
    • IQ_{1} \approx IQ_{2}
  • Intra-rater Reliability:
    • Evaluates a single rater's ability to obtain the same result when measuring the same observation repeatedly.
    • Any variation in measurements is attributed to the tester.
  • Inter-rater Reliability:
    • Evaluates the ability of different raters to obtain the same measurement.
    • Any variation is attributed to the different raters.
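
As a rough sketch of how these reliability types can be quantified, the function below computes two single-measure ICC forms from a subjects-by-raters table using the two-way ANOVA mean squares described by Shrout and Fleiss. The notes do not say which ICC form is intended, so the choice of ICC(2,1) (absolute agreement) and ICC(3,1) (consistency) is an assumption, and the ratings are hypothetical. For test-retest or intra-rater reliability, the columns would be occasions or repeated trials instead of raters.

```python
def icc_single(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a table with one row per subject
    and one column per rater (or occasion)."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    icc_agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_consistency = (msr - mse) / (msr + (k - 1) * mse)
    return icc_agreement, icc_consistency


# Hypothetical scores: rater 2 always scores 2 points higher than rater 1.
ratings = [
    [10.0, 12.0],
    [14.0, 16.0],
    [18.0, 20.0],
    [22.0, 24.0],
]

icc_agreement, icc_consistency = icc_single(ratings)
print(f"ICC(2,1) absolute agreement = {icc_agreement:.3f}")
print(f"ICC(3,1) consistency        = {icc_consistency:.3f}")
```

With this data the consistency form is 1.0 because the raters rank and space the subjects identically, while the absolute-agreement form drops to about 0.93 because of the constant 2-point offset between raters.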

Random and Systematic Error

  • Random Error: Unpredictable and scattered around the true value.
    • Comes from unpredictable factors such as tester fatigue, inattention, or simple mistakes.
    • Example: Inconsistent height measurements due to variable tape measure stretching.
      Height_{measured} = Height_{actual} + \epsilon_{random}
  • Systematic Error: Predictable and directional.
    • Example: Performance may improve with repeated hamstring stretch measurements, so later tests (3 & 4) may have higher values than earlier tests (1 & 2).
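
The difference between the two error types can be seen in a small simulation. This is only a sketch: the true height, the bias of 1.5 cm, and the noise level are hypothetical values, and random error is assumed to be normally distributed around zero.

```python
import random
from statistics import mean


def simulate(true_value, bias, noise_sd, n, seed=0):
    """Simulate n measurements: true value + systematic bias + random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_HEIGHT = 170.0  # cm, hypothetical

# Random error only: readings scatter around the true value.
random_only = simulate(TRUE_HEIGHT, bias=0.0, noise_sd=0.5, n=10_000)

# Random error plus a systematic error of +1.5 cm (e.g., a miscalibrated tape).
with_bias = simulate(TRUE_HEIGHT, bias=1.5, noise_sd=0.5, n=10_000)

print(f"mean error, random only: {mean(random_only) - TRUE_HEIGHT:+.3f} cm")
print(f"mean error, with bias:   {mean(with_bias) - TRUE_HEIGHT:+.3f} cm")
```

Averaging many readings pulls the random-only error toward zero, but the systematic shift stays in the average no matter how many measurements are taken.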

Correlation Coefficients, Confidence Interval, and P-Value

  • Some studies report absolute differences; others report correlation coefficients (r or ICC).
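  • Correlation coefficients range from -1 to 1.
    • 0 = no correlation
    • 1 = perfect positive correlation
    • -1 = perfect negative correlation
  • Negative values (between 0 and -1) indicate a negative correlation: as one measurement increases, the other decreases.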
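
A minimal sketch of how the sign and size of a correlation coefficient behave, using Pearson's r computed by hand on hypothetical data. The variable names `rising`, `falling`, and `flat` are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2 * v + 1 for v in x]     # increases in step with x
falling = [10 - 3 * v for v in x]   # decreases in step with x
flat = [2.0, 1.0, 3.0, 1.0, 2.0]    # no linear relationship with x

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_flat = pearson_r(x, flat)
print(f"rising:  r = {r_rising:+.2f}")
print(f"falling: r = {r_falling:+.2f}")
print(f"flat:    r = {r_flat:+.2f}")
```

The three cases land at the ends and the middle of the scale: a perfect positive correlation, a perfect negative correlation, and no linear correlation.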