Reliability Summary

Reliability

Reliability concerns the consistency, repeatability, reproducibility, and stability of measurements.

Measurement

Measurement assigns numbers to objects or events based on rules. It allows for statistical analysis and clear communication.

Levels of Measurement

  • Nominal: Labels or categories with no inherent order.

  • Ordinal: Rank order of subjects, without known distances between ranks.

  • Interval: Ordered, with known distances between scores but no true zero point.

  • Ratio: Ordered, with known distances between scores and a true zero point.
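The four levels can be sketched as a lookup of which comparisons and arithmetic each one supports; the example variables below are illustrative, not from the source:

```python
# Illustrative sketch (hypothetical example variables): which operations
# each level of measurement supports.
levels = {
    "nominal":  {"example": "blood type",      "supports": {"=", "!="}},
    "ordinal":  {"example": "pain rank 1-5",   "supports": {"=", "!=", "<", ">"}},
    "interval": {"example": "temperature (C)", "supports": {"=", "!=", "<", ">", "+", "-"}},
    "ratio":    {"example": "height (cm)",     "supports": {"=", "!=", "<", ">", "+", "-", "*", "/"}},
}
```

Each level supports everything the previous level does, plus more: ratios (division) only make sense once there is a true zero point.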

Sources of Measurement Error

Inaccurate or inconsistent raters/testers, variation in the phenomenon itself, and confounding situational factors all contribute to measurement error.

Classic Measurement Equation

Observed score = True score + Random error
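The classic equation can be sketched in Python with hypothetical true scores and normally distributed random error; reliability is framed here as the share of observed-score variance attributable to true-score variance:

```python
import random
import statistics

def simulate_observed(true_scores, error_sd, seed=0):
    """Classic model: observed score = true score + random error."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, error_sd) for t in true_scores]

# Hypothetical true scores for six subjects.
true_scores = [10, 12, 14, 16, 18, 20]
observed = simulate_observed(true_scores, error_sd=1.0)

# One common framing: reliability = true-score variance / observed-score variance.
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
```

With zero error the observed scores equal the true scores and reliability is 1; larger random error inflates observed variance and drives the coefficient down.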

Consistency

  • Intrarater reliability: Consistency of scoring by one person at different times.

  • Interrater reliability: Consistency of scoring by multiple people.

  • Intercoder reliability: Consistency in coding qualitative data across multiple coders, commonly quantified with Cohen’s kappa.
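Cohen’s kappa compares observed agreement between two coders with the agreement expected by chance. A from-scratch sketch (the codes below are made up for illustration):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    # Proportion of items the two coders labeled identically.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes assigned by two coders to the same five excerpts.
a = ["theme1", "theme1", "theme2", "theme2", "theme1"]
b = ["theme1", "theme1", "theme2", "theme1", "theme1"]
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance.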

Test-Retest Reliability

Stability of scores when the same test is administered to the same subjects on separate occasions; affected by the stability of the phenomenon, reactivity, and practice effects.

Calculating Test-Retest Reliability

  • Pearson product moment correlation: Measures the association between the two administrations, but is insensitive to systematic shifts in scores.

  • Intraclass correlations (ICCs): Also penalize systematic bias between administrations.
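Both statistics can be computed by hand. The sketch below implements Pearson’s r and one common ICC variant, the one-way random ICC(1,1), chosen for illustration because, unlike r, it is lowered by a systematic shift between test and retest:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between test and retest scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def icc_1_1(scores):
    """One-way random ICC(1,1): scores holds one list of repeated
    measurements per subject; systematic shifts count as error."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    # Between-subjects and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

If every retest score shifts up by one point, Pearson’s r stays at 1.0 while ICC(1,1) drops below 1, which is the sense in which ICCs control for systematic bias.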

Homogeneity

Extent to which test items behave similarly (internal consistency); measured by Cronbach’s alpha (acceptable > 0.70 for new measures, > 0.80 for established measures, > 0.90 for clinical evaluation).
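Cronbach’s alpha can be computed from the individual item variances and the variance of the total score. A minimal sketch (scores are illustrative):

```python
def cronbachs_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    k = len(item_scores)
    n = len(item_scores[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Each respondent's total across all items.
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    item_var_sum = sum(sample_var(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / sample_var(totals))
```

When items move together, total-score variance dwarfs the summed item variances and alpha approaches 1; items that move independently (or in opposition) pull it down.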

Reliability of Physical Measures

Considers both systematic (consistent) and random (inconsistent) errors. The technical error of measurement (TEM) quantifies the random error in repeated physical measurements.
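One common form of TEM for duplicate measurements is sqrt(Σd²/2n), where d is the within-pair difference; a sketch assuming that form:

```python
import math

def tem(trial1, trial2):
    """Technical error of measurement for duplicate measurements:
    TEM = sqrt(sum(d_i ** 2) / (2 * n)), d_i = within-pair difference."""
    n = len(trial1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(trial1, trial2)) / (2 * n))
```

A convenient property: TEM is expressed in the same units as the measurement itself (e.g., cm for height), so it reads directly as a typical measurement-to-measurement discrepancy.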

Improving Reliability

  • Thorough rater training and supervision.

  • Periodic reliability checks and instrument retesting.

  • Adding appropriate items; deleting items that lower the alpha coefficient.

  • Standardizing testing conditions and instructions.
