Hypothesis Testing & Clinical Statistics

Hypothesis testing and related concepts (quick reference)

Research hypothesis: the investigator’s expectation about the study outcome, based on theory, prior results, and clinical experience.
Hypothesis testing: a formal procedure that uses sample statistics to judge whether the data provide enough evidence to reject the null hypothesis.

Hypotheses and significance

Null hypothesis: $H_0$ = no difference / no effect.
Alternative hypothesis: $H_a$ = there is a difference / an effect.
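Level of significance ($\alpha$): the probability threshold for Type I error, set before analysis. Usual levels: $\alpha = 0.05$ or $\alpha = 0.01$.
p-value: the smallest significance level $\alpha$ at which $H_0$ would be rejected for the observed data; equivalently, the probability of a result at least as extreme as the one observed if $H_0$ is true.
Interpretation: a p-value of $0.03$ means that, if $H_0$ were true, a result at least as extreme as the one observed would occur about 3% of the time. It is not the probability that $H_0$ is true. By convention, p < 0.05 is reported as statistically significant.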

One- vs two-tailed tests

One-tailed test: reject $H_0$ if the statistic falls in one tail (left or right).
Two-tailed test: reject if the statistic falls in either tail; more conservative and typically reported.
Choosing between them: if the direction of the effect is not predicted in advance, use a two-tailed test.
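The difference between one- and two-tailed p-values can be sketched with a standard normal test statistic. This is a minimal illustration, assuming a z-test; the function name `p_value_from_z` and its parameters are hypothetical and not part of the original notes.

```python
from statistics import NormalDist


def p_value_from_z(z, tails=2, direction="greater"):
    """Return the p-value for a z statistic, assuming a standard normal under H0.

    tails=2 gives the two-tailed p-value.
    tails=1 uses `direction` ("greater" or "less") to pick the tail.
    """
    std_normal = NormalDist()            # mean 0, standard deviation 1
    upper = 1.0 - std_normal.cdf(z)      # P(Z >= z)
    lower = std_normal.cdf(z)            # P(Z <= z)
    if tails == 2:
        # Both tails count as "extreme", so double the smaller tail area.
        return 2.0 * min(upper, lower)
    return upper if direction == "greater" else lower


z = 1.96
print(round(p_value_from_z(z, tails=2), 3))  # 0.05
print(round(p_value_from_z(z, tails=1), 3))  # 0.025
```

For the same statistic, the one-tailed p-value is half the two-tailed one, which is why the two-tailed test is the more conservative choice.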

Errors in hypothesis testing

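Type I error: reject $H_0$ when it is true (false positive).
Type II error: fail to reject $H_0$ when it is false (false negative).
Relationship: for a fixed sample size and effect size, lowering $\alpha$ (the risk of Type I error) raises $\beta$ (the risk of Type II error).
Type II error is related to power (power = $1-\beta$); its rate depends on sample size, effect size, and outcome variability, and bias can also mask a true effect.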
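The trade-off between the two error types can be illustrated numerically. The sketch below assumes a two-sided, two-group z-test with a known standard deviation and equal group sizes; the function `approx_power` and the values used (d = 0.5, 64 per group) are hypothetical choices for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test.

    d is the standardized effect size (mean difference / SD).
    Assumes equal group sizes and a normal approximation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)    # critical value for the chosen alpha
    shift = d * sqrt(n_per_group / 2.0)      # expected z statistic if the effect is d
    # Probability that the statistic lands in either rejection region.
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for alpha in (0.05, 0.01):
    power = approx_power(d=0.5, n_per_group=64, alpha=alpha)
    print(f"alpha={alpha}: power={power:.2f}, beta={1 - power:.2f}")
```

With everything else held constant, the stricter alpha gives lower power, that is, a higher Type II error rate.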

Power and sample size

Power = $1-\beta$ : probability of correctly rejecting a false $H_0$ .
If $\beta=0.20$ , power = $0.80$ (80% power).
Power analysis (a priori) is used to estimate required sample size before a study, given:
- desired level of significance $\alpha$ ,
- expected effect size,
- desired power (usually 80–90%),
- variability in the outcome.
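The four inputs above can be combined in a rough a priori sample-size estimate. This is a sketch under stated assumptions: two equal groups, a two-sided test, and the normal approximation $n = 2\left((z_{1-\alpha/2} + z_{1-\beta})/d\right)^2$ per group, where $d$ is the expected mean difference divided by the outcome SD. The function name and example values are hypothetical; dedicated power software based on the t distribution typically gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mean_difference, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Normal-approximation sketch: two-sided test, equal group sizes.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)   # from the significance level
    z_beta = z.inv_cdf(power)                # from the desired power (1 - beta)
    d = mean_difference / sd                 # standardized effect size
    return ceil(2.0 * ((z_alpha + z_beta) / d) ** 2)


# Expected difference of 5 points with an SD of 10 (d = 0.5).
print(n_per_group(5, 10, alpha=0.05, power=0.80))  # 63
print(n_per_group(5, 10, alpha=0.05, power=0.90))  # 85
```

Asking for more power, a stricter alpha, or a smaller detectable effect all push the required sample size up.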

Effect size

Effect size measures the strength of a finding, independent of sample size.
Common measures:
- Cohen’s d (for mean differences): typically interpreted as small = 0.2, medium = 0.5, large = 0.8.
- Other measures: partial eta-squared, odds ratios, etc.
p-value vs effect size: the p-value indicates how compatible the data are with no effect (evidence that an effect exists); the effect size indicates how large the effect is.
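Cohen's d for two independent groups can be computed as the mean difference divided by the pooled standard deviation. The sketch below assumes that pooled-SD form; the function names, the labelling of values below 0.2 as "negligible", and the sample scores are hypothetical and made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def describe_effect(d):
    """Label d using the conventional 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "negligible"   # assumption: below the 'small' benchmark
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


treatment = [12, 14, 15, 17, 18]   # made-up scores
control = [10, 12, 13, 15, 16]     # made-up scores
d = cohens_d(treatment, control)
print(round(d, 2), describe_effect(d))
```

The benchmarks are rules of thumb; what counts as a meaningful effect depends on the outcome and the clinical context.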

Confidence intervals (CIs) and standard error

Standard error of measurement (SEm): the expected spread of observed scores around a person's true score; used to build CIs around a score.
CI formulas (assuming normal approximation):
- 68% CI = Score ± $SEm$
- 95% CI = Score ± $1.96\cdot SEm$
- 99% CI = Score ± $2.58\cdot SEm$
Interpretation: the CI indicates the precision of the estimate; a narrower CI means a more precise estimate.
For mean differences or effects, if the CI includes 0, the result is not statistically significant at that level.
For ratios (e.g., Odds Ratio, Risk Ratio), CI should not include 1 to be statistically significant at the chosen level.
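The CI formulas and the "includes 0 / includes 1" rules can be sketched as follows, assuming the normal approximation used above. The helper names and all numbers are hypothetical. Note that the multiplier is derived from the confidence level, so a level of 0.68 gives a value close to, but not exactly, 1.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Normal-approximation CI: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%, 2.58 for 99%
    margin = z * standard_error
    return (estimate - margin, estimate + margin)


def excludes(interval, null_value):
    """True if the interval does not contain the null value (0 or 1)."""
    low, high = interval
    return not (low <= null_value <= high)


# A score of 50 with SEm = 3 (made-up values).
print(confidence_interval(50, 3, level=0.95))

# Mean difference of 4 with standard error 1.5: compare the CI against 0.
diff_ci = confidence_interval(4, 1.5, level=0.95)
print(diff_ci, "significant:", excludes(diff_ci, 0))

# A reported odds-ratio CI (made-up): compare against 1, not 0.
odds_ratio_ci = (0.8, 2.1)
print("significant:", excludes(odds_ratio_ci, 1))
```

The odds-ratio interval here is taken as given rather than computed, because ratio CIs are usually built on the log scale and then transformed back.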

Randomised controlled trials (RCTs)

Key features:
- Two or more study groups and prospective outcomes.
- Random allocation and allocation concealment.
- Blinding where possible (participants, assessors).
- Intention-to-treat analysis is common to preserve randomization.
Purpose: assess causal effects with minimized bias and higher internal validity; often more costly and less generalizable.
Example (HOME trial): randomized two-arm trial comparing predischarge home visits vs in-hospital consultation for older adults; primary outcome included functional independence measures; planned power and sample size discussed.
Power and sample size in RCTs: ensure adequate power to detect a meaningful difference; underpowered trials risk Type II errors.

Critical appraisal of research

Purpose: assess trustworthiness, applicability, and quality of a study for practice decisions.
Common tools: CASP, PEDro, McMaster’s checklist, among others.
Process focuses on methodological quality and applicability to practice.
In ALHT211 context: CASP RCT checklist is a primary reference; appraisal questions cover bias, precision, and relevance.
Appraisal output: an appraisal summary noting whether the study should influence practice and what resources would be needed to implement.

Clinical significance vs statistical significance

Statistical significance: p-value below the chosen threshold indicates an unlikely result under $H_0$ .
Clinical significance (practical importance): magnitude of effect and relevance to patients/clients, not just statistical significance.
A finding can be statistically significant but not clinically meaningful, or vice versa; consider effect size and relevance to practice.
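A small numeric sketch shows how the two kinds of significance can come apart. It assumes a two-group z-test with equal group sizes, where the test statistic is roughly $d\sqrt{n/2}$; the function name and both scenarios are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(d, n_per_group):
    """Two-tailed p-value for standardized effect d under a z-test approximation."""
    z = d * sqrt(n_per_group / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Tiny effect, very large trial: statistically significant, likely not meaningful.
print(round(two_tailed_p(d=0.05, n_per_group=5000), 3))

# Large effect, very small trial: not statistically significant, possibly meaningful.
print(round(two_tailed_p(d=0.8, n_per_group=8), 3))
```

The first scenario falls below 0.05 and the second does not, even though the second effect is sixteen times larger, which is why effect size and relevance to practice need to be read alongside the p-value.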

Key definitions recap

$H_0$ : no difference / no effect.
$H_a$ : there is a difference / an effect.
$\alpha$ : probability of Type I error (significance level).
p-value: probability of observing data at least as extreme as those observed, if $H_0$ is true.
$\beta$ : probability of Type II error.
Power: $1-\beta$ ; probability of detecting a true effect.
CI: range around an estimate, computed so that over repeated sampling it would contain the true population parameter a specified proportion of the time (the confidence level).
Effect size: magnitude of an effect, independent of sample size.
If the CI for a mean difference excludes 0, or the CI for an OR excludes 1, the result is statistically significant at the chosen level.