Ensuring measurements accurately reflect the intended constructs is crucial in clinical practice and research.
While some measurements (e.g., length using a tape measure) seem straightforward, many health concepts require careful validation.
Types of Measurement Validity
Face Validity:
Refers to whether a measurement appears to measure the intended concept.
Established by pre-testing the instrument with experts or colleagues, who judge whether the measure reflects the intended concept.
Content Validity:
Extent to which a measure covers all relevant dimensions of the concept.
Example: An exam should cover all material taught in a course; an exam that omits topics covered in the course lacks content validity.
Criterion Validity:
The extent to which a measure correlates with an established measure (a gold standard) of the same concept.
Example: Comparing a new subjective health measure against the SF-36.
Expectation: Scores on both measures should correlate.
Construct Validity:
How well a measure aligns with theoretical expectations, i.e., whether it actually measures the theoretical construct of interest.
Example: IQ tests have been questioned for capturing only one dimension of intelligence (aptitude within a particular academic system).
Establishing construct validity requires explicitly defining theoretical concepts and relationships.
Measurement Reliability
Reliability indicates the consistency, stability, and dependability of a measurement.
All measurement instruments possess some degree of measurement error.
Higher measurement error leads to lower reliability.
Reliability is estimated from a sample representative of a specific population, so true reliability may vary in different populations.
Consistency and Agreement
Consistency:
Reported via correlation statistics (correlation coefficients) that represent the strength of the association between two measurements.
Common coefficients: Pearson’s r and Intraclass Correlation Coefficient (ICC).
Reliability coefficients range from 0 (no correlation) to 1 (perfect correlation).
Agreement:
Ideally reported alongside correlation.
Indicates the magnitude of difference between measurements (e.g., measurement A differs from measurement B by a certain amount).
Useful for determining whether the size of the error is acceptable (a worked sketch follows below).
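A minimal sketch of reporting consistency and agreement together, using hypothetical paired measurements; all data values below are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g., the same subjects measured with
# two devices); all values are invented for illustration.
a = np.array([36.5, 36.8, 37.1, 36.9, 37.4, 36.6, 37.0, 36.7])
b = np.array([36.7, 36.9, 37.3, 37.0, 37.5, 36.9, 37.2, 36.8])

# Consistency: strength of the association between the two measurements.
r, p = stats.pearsonr(a, b)

# Agreement: magnitude of the difference between measurements
# (mean difference and 95% limits of agreement, Bland-Altman style).
diff = b - a
mean_diff = diff.mean()
loa = 1.96 * diff.std(ddof=1)

print(f"Pearson r = {r:.2f} (p = {p:.4f})")
print(f"Mean difference = {mean_diff:.2f}; "
      f"95% limits of agreement: {mean_diff - loa:.2f} to {mean_diff + loa:.2f}")
```

A high r with a large mean difference is possible: the two measurements can track each other closely while one runs systematically higher, which is why agreement should be reported alongside correlation.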
Factors Affecting Consistency and Agreement
Multiple error sources can affect consistency and agreement.
Example: Measuring core temperature with an ear thermometer involves factors like correct insertion, room temperature stability, and participant's core temperature stability.
Types of Reliability
Test-Retest Reliability:
Evaluates the stability of a measure across two occasions, assuming no change in the measured construct.
Example: IQ tests should yield similar results across administrations.
$\text{IQ}_1 \approx \text{IQ}_2$
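A minimal sketch of checking test-retest stability, using hypothetical scores from two administrations of the same test (and assuming the construct did not change between sessions):

```python
import numpy as np
from scipy import stats

# Hypothetical scores from two administrations of the same IQ test
# (invented values; assumes no change in the construct between sessions).
iq_1 = np.array([98, 105, 112, 95, 120, 101, 108, 115])
iq_2 = np.array([100, 103, 110, 97, 118, 104, 107, 113])

# Stability across occasions: scores on occasion 1 should track occasion 2.
r, p = stats.pearsonr(iq_1, iq_2)
print(f"Test-retest r = {r:.2f} (p = {p:.4f})")
print(f"Mean retest difference = {(iq_2 - iq_1).mean():.1f} points")
```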
Intra-rater Reliability:
Evaluates a single rater's ability to obtain the same result when repeating the same measurement.
Any variation in measurements is attributed to the tester.
Inter-rater Reliability:
Evaluates the ability of different raters to obtain the same measurement.
Any variation is attributed to differences between the raters (an ICC sketch follows below).
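A minimal sketch of an inter-rater ICC computed by hand from the Shrout and Fleiss two-way random-effects formula, ICC(2,1); the rating matrix below is hypothetical.

```python
import numpy as np

def icc_2_1(y):
    """ICC(2,1): two-way random effects, absolute agreement, single rater
    (Shrout & Fleiss). Rows = subjects, columns = raters."""
    n, k = y.shape
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()   # between raters
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical ratings: 5 subjects each measured by 3 different raters.
ratings = np.array([
    [12.0, 12.5, 11.8],
    [15.2, 15.0, 15.4],
    [ 9.8, 10.1,  9.9],
    [13.4, 13.0, 13.6],
    [11.1, 11.4, 11.0],
])
print(f"Inter-rater ICC(2,1) = {icc_2_1(ratings):.2f}")
```

The same calculation applies to intra-rater reliability if the columns are repeated measurements by one rater rather than different raters.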
Random and Systematic Error
Random Error: Unpredictable and scattered around the true value.
Systematic Error: Predictable and directional.
Example: Improved performance with repeated hamstring stretch measurements (a learning effect); later trials (3 and 4) may show higher values than earlier trials (1 and 2).
Random error comes from unpredictable factors such as tester fatigue, inattention, or simple mistakes.
Example: Inconsistent height measurements due to variable tape measure stretching.
$\text{Height}_{\text{measured}} = \text{Height}_{\text{actual}} + \epsilon_{\text{random}}$
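A minimal simulation contrasting the two error types, assuming a hypothetical true height and invented error magnitudes:

```python
import numpy as np

rng = np.random.default_rng(0)
true_height = 170.0  # cm; hypothetical true value

# Random error: unpredictable, scattered around the true value (mean ~ 0).
random_err = rng.normal(loc=0.0, scale=0.5, size=4)

# Systematic error: predictable and directional, e.g. a learning effect
# that inflates later trials (magnitudes invented for illustration).
systematic_err = np.array([0.0, 0.0, 1.0, 1.0])

measured = true_height + random_err + systematic_err
for i, m in enumerate(measured, start=1):
    print(f"Trial {i}: {m:.1f} cm")

# Random error tends to average out over repeats; systematic error does not.
print(f"Mean of trials: {measured.mean():.1f} cm (true value {true_height} cm)")
```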
Correlation Coefficients, Confidence Interval, and P-Value
Some studies report absolute differences; others report correlation coefficients (r or ICC).
Pearson's r ranges from -1 to 1 (ICC values typically fall between 0 and 1).
0 = no correlation
1 = perfect positive correlation
Negative values (down to -1) indicate a negative (inverse) correlation.
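A minimal sketch of how a correlation coefficient might be reported together with its p-value and a 95% confidence interval (computed here via the Fisher z-transformation); the paired scores are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores from two measures of the same concept.
x = np.array([42, 55, 61, 48, 70, 66, 53, 59, 64, 45])
y = np.array([40, 57, 60, 50, 72, 63, 55, 58, 66, 47])

r, p = stats.pearsonr(x, y)

# 95% confidence interval via the Fisher z-transformation.
n = len(x)
z = np.arctanh(r)                # Fisher z of r
se = 1 / np.sqrt(n - 3)          # standard error of z
z_crit = stats.norm.ppf(0.975)   # two-sided 95% critical value
lo, hi = np.tanh([z - z_crit * se, z + z_crit * se])

print(f"r = {r:.2f}, p = {p:.4f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting the confidence interval alongside r and the p-value shows how precisely the correlation has been estimated, which a p-value alone does not convey.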