Using Statistics and Measurement Error To Inform Training

Using Reliability Measures for Decision Making

The smallest worthwhile change (SWC) is commonly used to estimate a worthwhile change in performance.
Based on effect size statistics.
Cohen's thresholds (1988) for effect sizes:
- Small: 0.2
- Moderate: 0.5
- Large: 0.8
Hopkins (2000) adapted these:
- Small worthwhile change: 0.2 x between-subject standard deviation
- Moderate worthwhile change: 0.5 x between-subject standard deviation
- Large worthwhile change: 0.8 x between-subject standard deviation
Calculation: Multiply the effect size threshold by the between-subject standard deviation.
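The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `worthwhile_change`, the use of the sample standard deviation (n - 1), and the jump-height data are all assumptions made for the example.

```python
import statistics

# Effect size thresholds listed in the notes (Cohen, 1988).
THRESHOLDS = {"small": 0.2, "moderate": 0.5, "large": 0.8}


def worthwhile_change(scores, magnitude="small"):
    """Return threshold x between-subject SD for one group of baseline scores."""
    # Assumption: the between-subject SD is the sample SD (n - 1 denominator).
    between_subject_sd = statistics.stdev(scores)
    return THRESHOLDS[magnitude] * between_subject_sd


# Hypothetical baseline jump heights (cm) for five athletes, illustration only.
jump_heights = [30.0, 32.0, 34.0, 36.0, 38.0]

swc = worthwhile_change(jump_heights)                    # small: 0.2 x SD
moderate = worthwhile_change(jump_heights, "moderate")   # moderate: 0.5 x SD
print(f"SWC = {swc:.2f} cm, moderate change = {moderate:.2f} cm")
```

Because the result scales directly with the between-subject standard deviation, a more varied squad produces a larger SWC from the same threshold.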

Strength and conditioning coaches often look for small but worthwhile changes.
Moderate effect sizes may be used for younger athletes with larger potential gains.

Typical error: \frac{\text{standard deviation of test-retest difference scores}}{\sqrt{2}}
If SWC > typical error: Changes exceeding SWC are considered real changes.
If typical error > SWC: Changes reaching SWC may be due to normal variation (biological, technological, or protocol-related).
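The comparison between typical error and SWC might be sketched as follows. It assumes the standard deviation in the typical error formula is that of the test-retest difference scores, which is how typical error is usually computed. The function names, the two-trial data, and the SWC value are hypothetical.

```python
import math
import statistics


def typical_error(trial_1, trial_2):
    """Typical error = SD of test-retest difference scores / sqrt(2)."""
    # Assumption: both trials list the same athletes in the same order.
    diffs = [second - first for first, second in zip(trial_1, trial_2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def interpret_swc(swc, te):
    """Apply the two rules from the notes to one SWC / typical error pair."""
    if swc > te:
        return "changes exceeding SWC can be considered real"
    return "changes reaching SWC may be due to normal variation"


# Hypothetical test-retest jump heights (cm), illustration only.
trial_1 = [30.0, 32.0, 34.0, 36.0, 38.0]
trial_2 = [31.0, 31.0, 35.0, 37.0, 37.0]

te = typical_error(trial_1, trial_2)
swc = 0.63  # hypothetical SWC in cm, e.g. 0.2 x between-subject SD
print(f"typical error = {te:.2f} cm: {interpret_swc(swc, te)}")
```

The notes do not say what to do when the two values are equal; this sketch treats that case conservatively, as possible normal variation.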

The effect size thresholds are arbitrary, so arbitrary changes may be deemed meaningful.
SWC represents a mean worthwhile change for a population, not individual responses.
Individual athletes respond differently to training interventions.
Dependent on the sample distribution (assumes the data are normally distributed).
Precision depends on sample size.
Effect sizes were originally designed for large sample sizes (psychology), while strength and conditioning often deals with smaller samples.
Small sample sizes can inflate the perceived SWC.

Paper examines whether SWC reflects real changes in female soccer players.
Findings: SWC (0.2 x between-participants standard deviation) often missed changes practitioners considered real.

Assess SWC from a reliability standpoint (is measurement error > SWC?).
Consider assumptions of effect sizes (sample size, distribution).
Contextualize the SWC result based on measurement error and individual athletes.
SWC is just a number; consider the practical significance for the specific situation.

The smallest detectable difference (SDD) uses the standard error of measurement to calculate limits of meaningful difference.
Conceptually similar to the limits of agreement approach.
Formula: 1.96 \times \sqrt{2} \times \text{standard error of measurement}
- Note: the transcript says 1.6; the correct value is 1.96.
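A minimal sketch of the formula, assuming the standard error of measurement is already known. The function names and the 2.0 kg input are hypothetical and are not taken from the deadlift study discussed in these notes.

```python
import math


def smallest_detectable_difference(sem, z=1.96):
    """SDD = z x sqrt(2) x standard error of measurement."""
    return z * math.sqrt(2) * sem


def exceeds_sdd(change, sem):
    """True if an observed change (either direction) is larger than the SDD."""
    return abs(change) > smallest_detectable_difference(sem)


# Hypothetical standard error of measurement (kg), illustration only.
sem_kg = 2.0
sdd_kg = smallest_detectable_difference(sem_kg)
print(f"SDD = {sdd_kg:.2f} kg")

# A 5 kg change sits inside the SDD here; a 7.5 kg change sits outside it.
print(exceeds_sdd(5.0, sem_kg), exceeds_sdd(-7.5, sem_kg))
```

The factor 1.96 corresponds to a 95% interval under a normal distribution, and the square root of 2 accounts for measurement error being present in both of the scores being compared.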
Example: 3RM deadlift study with three sessions separated by 48 hours.
Dashed lines on the figure represent the calculated SDD.
In the example, SDD was approximately 6 kg.
All observations in the study fell within the 6 kg SDD.
The 3RM test (standard error of measurement = 2.8 kg) was considered repeatable/reliable.

SDD is generally an arbitrary number based on a formula.
The 6 kg SDD in the example may not be meaningful for all populations.
Consider practitioner experience and athlete-specific factors when interpreting SDD.
Statistical methods are important for determining measurement error, but experience is crucial for determining what a real change in performance means.

Analytical goals: What is the test being used for?
Elite athletes (close to adaptive ceiling): Require highly accurate tests with low measurement error to detect small but meaningful changes.
Lower-level athletes: Higher level of measurement error may be acceptable; motor learning patterns are less developed, and adaptation is rapid.

Do not rely solely on statistics (e.g., SWC).
Consider the context of the athlete(s).
Decide, based on experience and the specific situation, whether the statistical outcome applies.
What level of measurement error (e.g., coefficient of variation, typical error) is acceptable?
Consult with colleagues to determine what constitutes a practical change.
Even with SDD, determine if the calculated difference is meaningful for the specific group of athletes.