Hypothesis Testing & Clinical Statistics

Hypothesis testing and related concepts (quick reference)

  • Research hypothesis: the investigator’s expectation about the study outcome, based on theory, prior results, and clinical experience.

  • Hypothesis testing: a formal procedure that uses sample statistics to decide whether the observed results differ significantly from what would be expected if the null hypothesis were true (i.e., by chance alone).

Hypotheses and significance

  • Null hypothesis ($H_0$): no difference / no effect.

  • Alternative hypothesis ($H_a$): there is a difference / an effect.

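  • Level of significance ($\alpha$): the maximum accepted probability of a Type I error, set before the analysis. Usual levels: $\alpha = 0.05$ or $\alpha = 0.01$.

  • p-value: the smallest significance level $\alpha$ at which $H_0$ would be rejected for the observed data; equivalently, the probability of obtaining a result at least as extreme as the one observed if $H_0$ is true.

  • Interpretation: a p-value of 0.03 means there is a 3% chance of observing a result at least this extreme if $H_0$ is true. It is not the probability that $H_0$ is true. By convention, $p < 0.05$ is reported as statistically significant.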
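The decision rule above can be sketched in a few lines. This is a minimal illustration assuming a z-test, i.e. a test statistic that is standard normal when $H_0$ is true; t, chi-square, and other tests use different reference distributions. The function names and the statistic 2.17 are hypothetical, chosen so that p comes out close to 0.03.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Probability, if H0 is true, of a z statistic at least as extreme as z (either tail)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the p-value is below the significance level chosen in advance."""
    return p_value < alpha


# Hypothetical test statistic, chosen so that p is close to 0.03.
p = two_tailed_p_value(2.17)
print(f"p = {p:.3f}; significant at alpha = 0.05: {is_significant(p)}")
print(f"significant at alpha = 0.01: {is_significant(p, alpha=0.01)}")
```

The same p-value can be significant at one pre-set level (0.05) and not at a stricter one (0.01), which is why $\alpha$ must be fixed before the analysis.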

One- vs two-tailed tests

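  • One-tailed test: reject $H_0$ only if the statistic falls in the one tail (left or right) specified in advance; appropriate only when the direction of the effect is predicted before data collection.

  • Two-tailed test: reject $H_0$ if the statistic falls in either tail; more conservative and the form typically reported.

  • If the direction of the effect cannot be predicted in advance, use a two-tailed test.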
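To see why the two-tailed test is more conservative, the sketch below compares critical values and p-values for the same statistic. It assumes a z-test at $\alpha = 0.05$ and, for the one-tailed case, a predicted increase (right tail); the function names and the statistic 1.80 are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # reference distribution of z under H0


def critical_value(alpha: float, tails: int) -> float:
    """Cut-off for z: a two-tailed test splits alpha across both tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return STD_NORMAL.inv_cdf(1 - alpha / tails)


def p_value(z: float, tails: int) -> float:
    """One-tailed version assumes the predicted direction is an increase (right tail)."""
    if tails == 2:
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    if tails == 1:
        return 1 - STD_NORMAL.cdf(z)
    raise ValueError("tails must be 1 or 2")


z = 1.80  # hypothetical test statistic
for tails in (1, 2):
    print(f"{tails}-tailed: critical z = {critical_value(0.05, tails):.3f}, "
          f"p = {p_value(z, tails):.3f}")
```

With $\alpha = 0.05$ the one-tailed cut-off is about 1.645 and the two-tailed cut-off about 1.960, so z = 1.80 is significant one-tailed but not two-tailed.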

Errors in hypothesis testing

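  • Type I error (probability $\alpha$): rejecting $H_0$ when it is true (false positive).

  • Type II error (probability $\beta$): failing to reject $H_0$ when it is false (false negative).

  • Relationship: for a fixed sample size, decreasing $\alpha$ (the risk of a Type I error) increases $\beta$ (the risk of a Type II error).

  • Type II error is the complement of power (power $= 1 - \beta$); it is influenced mainly by sample size, but also by effect size, variability in the outcome, and bias.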
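The trade-off between $\alpha$ and $\beta$ can be computed directly for a simple case. This sketch assumes a two-sided one-sample z-test with a known SD, where the true standardized effect (Cohen's d) shifts the z statistic by $d\sqrt{n}$; the function name and the values d = 0.5, n = 20 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def type_ii_error(alpha: float, effect_size: float, n: int) -> float:
    """beta for a two-sided one-sample z-test when the true standardized effect is effect_size."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # expected value of the z statistic when Ha is true
    power = Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)  # chance of landing in either rejection tail
    return 1 - power


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> beta = {type_ii_error(alpha, 0.5, 20):.2f}")
```

Under these assumptions, tightening $\alpha$ from 0.05 to 0.01 raises $\beta$ from roughly 0.39 to roughly 0.63; increasing n lowers $\beta$ at any given $\alpha$.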

Power and sample size

  • Power $= 1 - \beta$: the probability of correctly rejecting a false $H_0$.

  • If $\beta = 0.20$, power $= 0.80$ (80% power).

  • Power analysis (a priori) is used to estimate the required sample size before a study, given:

    • desired level of significance $\alpha$,

    • expected effect size,

    • desired power (usually 80–90%),

    • variability in the outcome.
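The four inputs above combine into a sample-size estimate. This is a sketch assuming a two-sided comparison of two group means and the normal-approximation formula $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$ per group, where the effect size d (expected difference divided by the outcome's SD) carries the variability input. The function name is hypothetical, and the formula does not allow for dropouts.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d = expected mean difference / SD of the outcome (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)  # z_{1-beta}
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


print(n_per_group(0.5))              # medium effect, 80% power
print(n_per_group(0.5, power=0.90))  # medium effect, 90% power
print(n_per_group(0.2))              # small effect, 80% power
```

Software based on the t distribution typically returns a slightly larger n; in either case, the smaller the expected effect, the larger the required sample.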

Effect size

  • Effect size measures the strength of a finding, independent of sample size.

  • Common measures:

    • Cohen’s d (for mean differences): typically interpreted as small = 0.2, medium = 0.5, large = 0.8.

    • Other measures: partial eta-squared, odds ratios, etc.

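  • p-value vs effect size: a p-value indicates whether an effect is likely to exist (how surprising the data are under $H_0$); effect size indicates how large the effect is. With a large enough sample, even a trivial effect can be statistically significant.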
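Cohen's d for two groups is the difference in means divided by the pooled standard deviation. The sketch below computes it and applies the conventional labels listed above; the function names and scores are hypothetical, and labelling values below 0.2 as "negligible" is an added assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def describe(d: float) -> str:
    """Cohen's conventional labels; 'negligible' below 0.2 is an assumption."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical outcome scores for two groups.
treatment = [12, 14, 15, 17, 18]
control = [10, 12, 13, 14, 16]
d = cohens_d(treatment, control)
print(f"d = {d:.2f} ({describe(d)})")
```

Note that d uses no sample-size term in its numerator or denominator, which is why it describes magnitude rather than statistical significance.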

Confidence intervals (CIs) and standard error

  • Standard error of measurement (SEm): the spread of an individual's observed scores around their true score; used to construct CIs around an observed score.

  • CI formulas (assuming a normal approximation):

    • 68% CI = $\text{Score} \pm \text{SEm}$

    • 95% CI = $\text{Score} \pm 1.96 \cdot \text{SEm}$

    • 99% CI = $\text{Score} \pm 2.58 \cdot \text{SEm}$

  • Interpretation: a CI indicates the precision of the estimate; a narrower CI means a more precise estimate.

  • For differences (e.g., mean differences), if the CI includes 0, the result is not statistically significant at the corresponding level (a 95% CI corresponds to a two-tailed $\alpha = 0.05$).

  • For ratios (e.g., Odds Ratio, Risk Ratio), the CI must exclude 1 for the result to be statistically significant at the chosen level.
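The CI formulas and the "does the CI contain the no-effect value?" check can be sketched as follows. Deriving SEm from a test's SD and reliability coefficient via $\text{SEm} = SD\sqrt{1 - r}$ (classical test theory) is an assumption not stated above; the function names and all numbers are hypothetical.

```python
from math import sqrt

# Multipliers from the CI formulas (normal approximation).
Z_MULTIPLIER = {68: 1.0, 95: 1.96, 99: 2.58}


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Assumed classical test theory formula: SEm = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_ci(score: float, sem: float, level: int = 95) -> tuple[float, float]:
    """CI around an observed score: Score +/- multiplier * SEm."""
    half_width = Z_MULTIPLIER[level] * sem
    return score - half_width, score + half_width


def excludes_null(ci: tuple[float, float], null_value: float) -> bool:
    """True if the no-effect value (0 for differences, 1 for ratios) lies outside the CI."""
    low, high = ci
    return not (low <= null_value <= high)


sem = sem_from_reliability(sd=10.0, reliability=0.91)  # hypothetical test properties
for level in (68, 95, 99):
    low, high = score_ci(50.0, sem, level)
    print(f"{level}% CI: {low:.2f} to {high:.2f}")

print(excludes_null((-0.4, 3.1), null_value=0))  # mean difference: CI includes 0
print(excludes_null((1.2, 2.5), null_value=1))   # odds ratio: CI excludes 1
```

The interval widens as the confidence level rises, and the significance check only changes its null value (0 or 1) depending on whether the estimate is a difference or a ratio.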

Randomised controlled trials (RCTs)

  • Key features:

    • Two or more study groups, with outcomes measured prospectively.

    • Random allocation and allocation concealment.

    • Blinding where possible (participants, assessors).

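    • Intention-to-treat analysis (participants analysed in the group to which they were randomized, regardless of the treatment actually received) is common, to preserve the benefits of randomization.

  • Purpose: assess causal effects with minimized bias and high internal validity; RCTs are often costly, and their strict conditions can limit generalizability.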

  • Example (HOME trial): a randomized two-arm trial comparing predischarge home visits with in-hospital consultation for older adults; the primary outcome included functional independence measures, and the planned power and sample size were discussed.

  • Power and sample size in RCTs: ensure adequate power to detect a meaningful difference; underpowered trials risk Type II errors.
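Intention-to-treat simply means grouping outcomes by the allocated arm rather than the arm actually received. The sketch below contrasts it with an "as-treated" grouping (a term introduced here for comparison). The `Participant` class, `group_means` function, and all data are hypothetical and are not taken from the HOME trial.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Participant:
    allocated: str   # arm assigned at randomization
    received: str    # arm actually received (may differ, e.g. crossover)
    outcome: float


def group_means(participants: list[Participant], by: str) -> dict[str, float]:
    """Mean outcome per arm, grouping on the 'allocated' or 'received' field."""
    groups: dict[str, list[float]] = {}
    for person in participants:
        groups.setdefault(getattr(person, by), []).append(person.outcome)
    return {arm: mean(values) for arm, values in groups.items()}


# Hypothetical data: one intervention participant crossed over to control.
trial = [
    Participant("intervention", "intervention", 8.0),
    Participant("intervention", "intervention", 7.0),
    Participant("intervention", "control", 3.0),
    Participant("control", "control", 4.0),
    Participant("control", "control", 5.0),
]

print("intention-to-treat:", group_means(trial, "allocated"))
print("as-treated:", group_means(trial, "received"))
```

Here the intention-to-treat difference (6.0 vs 4.5) is smaller than the as-treated one (7.5 vs 4.0); keeping the randomized groups intact avoids letting who crossed over bias the comparison.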

Critical appraisal of research

  • Purpose: assess trustworthiness, applicability, and quality of a study for practice decisions.

  • Common tools: CASP, PEDro, McMaster’s checklist, among others.

  • The process focuses on methodological quality and applicability to practice.

  • In the ALHT211 context, the CASP RCT checklist is a primary reference; its appraisal questions cover bias, precision, and relevance.

  • Appraisal output: an appraisal summary noting whether the study should influence practice and what resources would be needed to implement its findings.

Clinical significance vs statistical significance

  • Statistical significance: a p-value below the chosen threshold ($\alpha$) indicates that the result would be unlikely if $H_0$ were true.

  • Clinical significance (practical importance): magnitude of effect and relevance to patients/clients, not just statistical significance.

  • A finding can be statistically significant but not clinically meaningful, or vice versa; consider effect size and relevance to practice.
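Both directions of the mismatch can be reproduced with a simple two-group calculation. This sketch assumes a z approximation with a known, common SD of 10 points, and uses Cohen's d as a rough stand-in for magnitude; in practice, clinical importance is judged against what matters to patients, not d alone. The function name and scenarios are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_groups(mean_diff: float, sd: float, n_per_group: int) -> tuple[float, float]:
    """Two-tailed p-value (z approximation, common known SD) and Cohen's d."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    d = mean_diff / sd
    return p, d


# Hypothetical scenarios on an outcome with SD = 10 points.
for label, diff, n in [("large trial, 1-point difference", 1.0, 5000),
                       ("small trial, 8-point difference", 8.0, 10)]:
    p, d = compare_groups(diff, 10.0, n)
    print(f"{label}: p = {p:.4f}, d = {d:.2f}")
```

The large trial yields a very small p-value for an effect below the "small" threshold (d = 0.10), while the small trial misses significance (p of about 0.07) despite a large effect (d = 0.80), which is also a reminder that underpowered studies risk Type II errors.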

Key definitions recap

  • $H_0$: no difference / no effect.

  • $H_a$: there is a difference / an effect.

  • $\alpha$: probability of a Type I error (significance level).

  • p-value: probability of observing data at least as extreme as those observed, assuming $H_0$ is true.

  • $\beta$: probability of a Type II error.

  • Power: $1 - \beta$; the probability of detecting a true effect.

  • CI: a range around an estimate, constructed so that, over repeated samples, a specified proportion (e.g., 95%) of such intervals would contain the true population parameter.

  • Effect size: magnitude of an effect, independent of sample size.

  • If the CI for a mean difference excludes 0, or the CI for an OR excludes 1, the result is statistically significant at the chosen level.