Pre and Post Intervention
Parametric Tests
- Statistical tests that make assumptions about population parameters (mean, variance).
- More powerful and precise when assumptions are met.
Assumptions of Parametric Tests
- Normality: Data should follow an approximately normal distribution, checked using quantile-quantile (Q-Q) plots. In regression models, the assumption applies to the residuals.
- Homogeneity of Variance: Variances within groups being compared should be equal. Tested with Levene's test.
- Independence: Observations should be independent; data from different participants shouldn't influence each other.
- If these assumptions aren’t met, non-parametric tests are more appropriate.
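The assumption checks listed above could be run in Stata roughly as follows. This is a minimal sketch that uses Stata's built-in `auto` example data as a stand-in (an assumption, not the lecture data). `swilk` (Shapiro-Wilk) is an extra formal normality check that the list above does not mention, and `robvar` reports Levene's statistic as W0.

```Stata
* Stand-in data shipped with Stata (assumption: not the lecture data set)
sysuse auto, clear

* Normality: visual check with a Q-Q plot, plus a Shapiro-Wilk test
qnorm mpg
swilk mpg

* Homogeneity of variance: W0 in the robvar output is Levene's test
robvar mpg, by(foreign)
```

Independence cannot be tested this way; it follows from how the study was designed and the data were collected.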
Common Parametric Tests in Medical Statistics
- One-sample t-test: Compares the mean of a single group to a known value.
- Independent two-sample t-test: Compares means of two independent groups.
- Paired t-test: Compares means from the same group at different times (e.g., before and after treatment).
- One-way ANOVA: Compares means among three or more independent groups.
- Two-way ANOVA: Compares means among groups split on two independent variables.
- Repeated measures ANOVA: Compares means across three or more time points or conditions measured on the same participants.
- Mixed-design ANOVA: Combines one-way and repeated measures ANOVA, comparing means among groups with both between-subjects and within-subjects factors.
- Pearson correlation: Measures the strength and direction of the relationship between two continuous variables.
- Simple linear regression: Models the relationship between a dependent variable and one independent variable.
- Multiple regression: Extends linear regression to include two or more independent variables.
Non-Parametric Tests
- Statistical methods that do not assume a specific distribution for the data.
- Useful when parametric test assumptions are violated, or when dealing with ordinal data or small sample sizes.
Assumptions of Non-Parametric Tests
- Independence of Observations: Observations should be independent.
- Ordinal or Continuous Data: Typically used for ordinal (ranked) data or continuous data.
- Random Sampling: Observations should be randomly selected from the population.
Common Non-Parametric Tests in Medical Statistics
- Mann-Whitney U Test: Compares the distributions of two independent groups.
- Wilcoxon Signed-Rank Test: Compares two related/matched samples to assess if population mean ranks differ; requires symmetrical distribution of differences.
- Kruskal-Wallis H Test: Extends Mann-Whitney U test to more than two groups; assumes distributions have the same shape.
- Friedman Test: Compares more than two related groups; non-parametric alternative to repeated measures ANOVA; assumes ordinal or continuous scale and same distribution of ranks.
- Chi-Square Test: Assesses association between categorical variables by comparing observed and expected frequencies; requires expected frequency >= 5 in each cell.
- Spearman's Rank Correlation Coefficient: Measures correlation based on how well a monotonic function describes the relationship between two variables.
- Kendall's Tau: Measures the strength and direction of association between two variables using the ranks of the data.
Key Points for Non-Parametric Tests
- Tests based on medians and ranks.
- Median: The value above and below which 50% of the data lie.
- Rank: NP methods use the ranks of values rather than the actual values. For example:
- Actual values: 1, 2, 3, 4, 5, 7, 13, 22, 38, 45
- Ranks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
- Wilcoxon Signed Rank is analogous to a paired t-test.
- Wilcoxon Rank Sum is analogous to an independent t-test.
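The ranking example above can be reproduced in Stata. This is a minimal sketch; the variable names `value` and `rank` are chosen here for illustration.

```Stata
* Values taken from the ranking example above
clear
input value
1
2
3
4
5
7
13
22
38
45
end

* rank() assigns 1 to the smallest value; tied values would share their average rank
egen rank = rank(value)
list value rank, noobs

* The median is stored in r(p50) after summarize, detail
summarize value, detail
display "Median = " r(p50)
```

Note how the outlying values 22, 38 and 45 become ranks 8, 9 and 10, which is why rank-based tests are less sensitive to extreme observations.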
Recent Neuroscience Research from UCL
- Type of study: Observational Natural History Study (Cohort study)
- Population: Inclusion Body Myositis (IBM)
- Available data:
- 30 patients with Inclusion Body Myositis
- All with 1-year follow-up
- Variables/outcome measures collected
- Gender
- Age
- Age at disease onset
- Presence of dysphagia (at baseline and follow-up)
- Use of walking aids (at baseline and follow-up)
- Inclusion Body Myositis Functional Rating Scale (IBMFRS, at baseline and follow-up)
Inclusion Body Myositis (IBM)
- No effective therapy currently available.
- Mean time to using a wheelchair: 15 years
- Rare acquired muscle disease characterized by muscle weakness and atrophy that relentlessly progresses to disability
- Degenerative (eosinophilic inclusions, p62 accumulation)
- Other (atrophy, necrosis, fat replacement)
- Mitochondrial (COXneg/SDHpos, ragged red fibers)
- Inflammatory (endomysial infiltrates, MHC-I up-regulation)
- Severe atrophy of quadriceps and forearm flexors
Inclusion Body Myositis Functional Rating Scale (IBMFRS)
- Physical function scale
- 10 items
- Likert-type scale (0-4)
- Range of total score: 0-40 (10 items, each scored 0-4)
- 0 (worst functional status) to 40 (best functional status)
Manual Muscle Testing (MMT) Grading Scale
- Range: 0 to 5
- 0: None - No visible or palpable contraction
- 1: Trace - Visible or palpable contraction with no motion
- 2: Poor - Full ROM gravity eliminated
- 3: Fair - Full ROM against gravity
- 4: Good - Full ROM against gravity, moderate resistance
- 5: Normal - Full ROM against gravity, maximal resistance
Walking Aids
- 0 represents "No walking aid"
- 1 represents "Stick or Rollator"
- 2 represents "Wheelchair"
Class exercise-1
- Q1. Is patient age a continuous or a discrete variable?
- Q2. Is gender a binary or an ordered categorical variable?
Research Questions
- Research question 1: Has there been a significant change in the IBMFRS score over the 1-year follow-up period?
- Test: paired samples t-test (if parametric data) or Wilcoxon signed-rank test (if non-parametric (NP) data)
- Research question 2: Is there a difference in the IBMFRS score between patients with and without dysphagia?
- Test: independent samples t-test (if parametric data) or Mann-Whitney U test (if NP data)
- Research question 3: Is there any association between patients using walking aids (3 categories) and sex (male & female)?
- Test: chi-square test or Fisher's exact test
Research question 1: Comparing two continuous variables
- Research question 1: Has there been a significant change in the IBMFRS score over the 1-year follow-up period?
- Null hypothesis: There is no difference between the follow-up score and the baseline score.
- Mathematically, if this is true then $\text{mean(diff)} = 0$, so the alternative hypothesis is $\text{mean(diff)} \neq 0$.
- What type of data & tests?
- The IBMFRS was measured at baseline and at follow-up for the same patients, so the data are paired.
- First compute the difference in IBMFRS score (follow-up minus baseline), then check whether this difference is normally distributed.
- If it is normal, choose a parametric test; if not, use a non-parametric test.
Research question 1: Histogram and Q-Q plot of the difference between IBMFRS score at baseline & follow ups
- How to check normality: plot a histogram and a Q-Q plot.
- If normal, use the parametric paired t-test; if not, use the non-parametric Wilcoxon signed-rank test.
- The histogram of the difference (say diff) looks bell-shaped and hence approximately normal, which the Q-Q plot also suggests.
- In a Q-Q plot, if all or most of the points fall on or near the straight line, the data are approximately normal.
- Figures: histogram and Q-Q plot
```Stata
hist IBMFRS_baseline
qnorm IBMFRS_baseline
hist IBMFRS_1year
qnorm IBMFRS_1year
```
- Or try generating a variable called diff:
```Stata
gen diff = IBMFRS_1year - IBMFRS_baseline
hist diff
qnorm diff
```
### Research question 1: Tests for the difference between IBMFRS score at baseline & follow ups
* Choose & run the appropriate test:
* The differences between the IBMFRS scores at baseline and follow-up are approximately normally distributed, so we use the paired t-test.
```Stata
ttest IBMFRS_1year == IBMFRS_baseline
```
- The output is below:
- Mean difference between baseline & follow-up
- 95% confidence interval for the mean difference
- Test statistic: $t = \frac{\text{Mean}}{\text{Std. Err.(Mean)}} = \frac{-2}{0.66436} \approx -3.01$
- Degrees of freedom = number of paired observations - 1 = 30 - 1 = 29
- P-value under the null hypothesis that the mean difference = 0
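The quantities in the paired t-test output can be recomputed by hand from the figures quoted above (mean difference -2, standard error 0.66436, 30 pairs). The sketch below is only a check on the arithmetic, not a replacement for `ttest`; the scalar names are chosen here for illustration. It should give a t statistic of about -3.01, a two-sided p-value below 0.01, and a 95% confidence interval of roughly -3.36 to -0.64.

```Stata
* Figures quoted above: mean difference = -2, standard error = 0.66436, n = 30 pairs
scalar meandiff = -2
scalar se = 0.66436
scalar df = 30 - 1

* Test statistic: mean difference divided by its standard error
scalar t = meandiff / se
display "t = " t

* Two-sided p-value from the t distribution with 29 degrees of freedom
display "p = " 2 * ttail(df, abs(t))

* 95% confidence interval: mean difference +/- critical value x standard error
scalar tcrit = invttail(df, 0.025)
display "95% CI: " meandiff - tcrit * se " to " meandiff + tcrit * se
```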
Research question 1: Tests for the difference between IBMFRS score at baseline & follow-up: conclusion
- On average, the IBMFRS score decreased by 2 units at follow-up compared to baseline.
- The 95% confidence interval excludes 0 and the p-value is < 0.05, indicating that this decrease is statistically significant.
Research question 2 : Statistics – step by step guide- Comparing two continuous variables
- Research question 2: Is there a difference in the IBMFRS score between patients with and without dysphagia?
- Null hypothesis: There is no difference in mean baseline IBMFRS score between patients with and without dysphagia.
- What type of data & tests?
- The IBMFRS was measured in two subgroups of patients (with dysphagia and without dysphagia), so we are comparing two independent (unpaired) samples.
- First check if the IBMFRS score at baseline is normally distributed or not.
Research question 2 : Histogram and Q-Q plot of IBMFRS score at baseline
- How to check normality: plot a histogram and a Q-Q plot.
```Stata
qnorm IBMFRS_baseline
```
- Figure: Q-Q plot
```Stata
hist IBMFRS_baseline, normal
```
- Figure: histogram
- The histogram looks approximately normal and almost all data points in the Q-Q plot are close to the straight line, so we can use the independent two-sample t-test.
Research question 2 : Tests for the difference in the baseline IBMFRS score between patients with and without dysphagia
- Choose & run the appropriate test:
- The IBMFRS score at baseline is approximately normally distributed, so we use the two-sample t-test.
```Stata
ttest IBMFRS_baseline, by(Dysphagia_BL)
```
* The output is below:
* Understand the Stata output and conclude:
* The mean difference in IBMFRS score between patients with and without dysphagia is 3.6, with a 95% confidence interval (CI) from -1.62 to 8.82.
* The 95% CI includes 0 and the p-value is > 0.05, so there is no evidence of a statistically significant difference in IBMFRS score between patients with and without dysphagia.
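The two-sample t-test also assumes equal variances in the two groups, which the steps above do not check explicitly. A minimal sketch of that check follows, using invented scores (NOT the study data) and hypothetical variable names `score` and `dysphagia`.

```Stata
* Invented illustrative scores (assumption: NOT the study data)
clear
input score dysphagia
30 0
27 0
33 0
25 0
29 0
24 1
28 1
21 1
26 1
31 1
end

* Levene's test (W0) for equal variances between the two groups
robvar score, by(dysphagia)

* Standard two-sample t-test, which assumes equal variances
ttest score, by(dysphagia)

* Unequal-variance version (Satterthwaite degrees of freedom)
ttest score, by(dysphagia) unequal
```

If Levene's test suggests unequal variances, the `unequal` version is the safer of the two t-tests.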
### Class exercise -2
* True or False
1. For a group of patients, blood glucose level measured before and after their dinner is an unpaired dataset.
2. Hours of sleep recorded in two groups of patients (such as stroke and dementia) is an unpaired dataset.
### Non-parametric test- example
* Using the same data
### Research question 3: Is there any association between patients using walking aids (3 categories) and sex (male & female)?
* Null hypothesis: There is no association between walking aid use (3 categories) and sex (male & female).
* What type of data?
* Gender: 0 = male, 1 = female
* Recall walking aid use:
* 0 represents “No walking aid”
* 1 represents “Stick or Rollator”
* 2 represents “Wheelchair”
### Research question 3: Statistics – step by step guide- Comparing two proportions
* Set up & run the appropriate tests:
* Try presenting these two variables in a 2 by 2 table.
* Null hypothesis: The proportion of female patients using walking aids is not higher than that of their male counterparts.
* Tabulate gender and dysphagia status:
```Stata
tab Gender Dysphagia_BL
```
Research question 3: Comparing categorical data 2 by 3
- Set up & run the appropriate tests:
- Check the expected frequencies:
```Stata
tab Gender Walkingaid_BL, expected
```
* Decision rule
* If at least one expected frequency is < 5, use Fisher's exact test.
* If all expected frequencies are >= 5, use the chi-square test.
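Each expected frequency is the row total times the column total divided by the overall total, $E_{ij} = (R_i \times C_j) / N$. The sketch below applies the decision rule to an invented 2 by 3 table (NOT the study counts) using Stata's immediate command `tabi`, so no data set is needed.

```Stata
* Invented 2 x 3 counts (assumption: NOT the study data)
* Rows: male, female; columns: no aid, stick or rollator, wheelchair
tabi 8 5 3 \ 6 5 3, expected chi2 exact
```

With these counts the expected frequency for females in the wheelchair column is $14 \times 6 / 30 = 2.8$, which is below 5, so the rule points to Fisher's exact test rather than the chi-square test.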
### Research question 3 : Comparing categorical data : 2 by 3 category
* Run the appropriate test & conclude
```Stata
tab Gender Walkingaid_BL, exact
```
- Conclusion: There is no evidence of a difference between males and females in the proportions using each type of walking aid, because the p-value from Fisher's exact test is 0.85 (> 0.05).
Class exercise-3
- Q1. When should Fisher's exact test be used?
- Ans: When the expected frequency for at least one cell is < 5
- Q2. When should the chi-square test be used?
- Ans: When the expected frequencies for all cells are >= 5
More examples of non-parametric tests: Lumbar Spinal Stenosis (LSS) Data
- What is LSS: Narrowing of the spinal canal, causing compression of the nerve roots.
- What causes LSS: The narrowing results from a combination of degenerative changes (ageing of the spine).
- Who gets this disease: Prevalent in people aged over 50.
- Consequences of LSS: Back pain, leg pain and reduced walking distance.
- Research question: A randomized controlled trial comparing two operations for the treatment of LSS, laminectomy vs. the X-Stop device.
- To answer the question: Which operation results in a better quality of life?
- Methods of operation:
- Laminectomy: The gold standard operation, in which the nerve roots are decompressed.
- X-Stop: A new minimally invasive device inserted between the spinous processes of the vertebrae.
Outcomes of interest / Study objectives
- Which operation leads to a better quality of life – using EQ5D as a primary outcome measure
- Change in EQ5D response:
- Q1: Change (improvement or deterioration) in EQ5D for the same patient before and after surgery (paired data)
- Q2: Any difference in outcome between the two surgeries (laminectomy vs. X-Stop) (unpaired data)
Statistics step by step guide: Q1. Compare EQ5D : Preoperative Vs. Postoperative : (paired sample)
- Step 1: State the statistical hypothesis
- There is no difference between the pre-operative and post-operative EQ5D scores. Each patient's score is measured twice, so the data are paired.
- A mean difference of 0 in EQ5D score before and after the operation is consistent with the null hypothesis.
- Step 2: Assumptions
- Histograms showed that the distributions of EQ5D before the operation and two years after are not normal, so non-parametric tests are preferred.
- Step 3: Decide statistical methods:
- We took the difference between the pre- and post-operative scores and ran a non-parametric test on this difference, the Wilcoxon signed-rank test (an alternative to the paired t-test).
```Stata
* Generate a new variable called diff, the post-operative minus pre-operative score
gen diff = EQ5D24mts - EQ5DPre
* Draw a histogram
hist diff
```
* To check normality, create a Q-Q plot:
```Stata
qnorm diff
```
* Run the chosen statistical test:
```Stata
signrank diff = 0
```
* The output is below:
### Q1. Compare EQ5D : Preoperative Vs. Postoperative : (paired sample)
* For a 95% confidence interval for the difference use the following commands:
* Step 4: Understand the Stata output and conclude
* The test gives a p-value of 0.002, which is evidence of a difference in EQ5D scores before and after the operation.
* The 95% confidence interval for the median difference is 0.104 to 0.352. This interval is fairly narrow and excludes 0, so the median difference is estimated with reasonable precision.
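One possible way to obtain a median difference with a 95% confidence interval in Stata is the `centile` command; whether the lecture used this command is an assumption. The sketch below uses invented scores (NOT the trial data) and hypothetical variable names `pre` and `post`.

```Stata
* Invented illustrative scores (assumption: NOT the trial data)
clear
input pre post
0.52 0.69
0.36 0.62
0.59 0.76
0.16 0.52
0.69 0.66
0.26 0.59
0.52 0.80
0.62 0.73
end

* Post minus pre, matching the direction used above
gen diff = post - pre

* Wilcoxon signed-rank test of the paired scores
signrank post = pre

* Median of the differences with a 95% confidence interval
centile diff, centile(50)
```

`signrank post = pre` is equivalent to testing `diff` against 0, the form used in the lecture command.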
### Statistics step-by-step guide: Q2. Compare EQ5D: Laminectomy Vs. X-stop (independent sample)
* Step 1: State statistical hypothesis
* There is no difference in mean or median EQ5D score at 24 months between two methods of operations.
* Step 2: Justify assumptions
* The histogram & Q-Q plot showed that the distribution of EQ5D at 24 months across all patients is not normal, so non-parametric tests are preferred.
```Stata
* Histogram
hist EQ5D24mts, normal
* Q-Q plot
qnorm EQ5D24mts
```
Q2. Compare EQ5D: Laminectomy Vs. X-stop (independent sample)
- Step 3: Decide & run the appropriate statistical test:
- Most of the data points in the Q-Q plot are away from the straight line, so we should use the non-parametric Mann-Whitney test (an alternative to the two-sample t-test).
```Stata
ranksum EQ5D24mts, by(Operation)
```
- The output is below:
- Step 4: Understand the Stata output and conclude
- The output gives a table showing the two groups, their Obs (number of observations), the observed rank sums, and the rank sums that would be expected if the null hypothesis were true (that is, if there were no difference).
- Tied ranks can be an issue, so below the table there is a variance adjustment to account for ties.
- The output then restates the null hypothesis and gives the z-statistic (0.025) and p-value (0.9797), which provide no evidence of a difference in EQ5D score between the two operations.
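Because `ranksum` reports rank sums rather than group summaries, it can help to look at the group medians alongside the test. A minimal sketch follows, using invented scores (NOT the trial data) and hypothetical variable names `eq5d` and `operation`; the `porder` option is an optional extra not used in the lecture.

```Stata
* Invented illustrative scores (assumption: NOT the trial data)
clear
input eq5d operation
0.62 0
0.69 0
0.52 0
0.76 0
0.59 0
0.66 1
0.52 1
0.73 1
0.59 1
0.80 1
end

* Group sizes, medians, and interquartile ranges give context for the test result
tabstat eq5d, by(operation) statistics(n median iqr)

* Mann-Whitney test; porder adds an estimate of P(score in group 0 > score in group 1)
ranksum eq5d, by(operation) porder
```

A `porder` estimate close to 0.5 is what one would expect when the two groups have similar distributions.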
Take home: What statistical methods should I use to analyse my data?
- Choose appropriate statistical methods/tests
| Parametric tests (variables are from a normal distribution) | Non-parametric tests (variables may not be from a normal distribution) |
|---|---|
| One-sample t-test | Wilcoxon signed-rank test |
| Paired-sample t-test | Wilcoxon signed-rank test (on paired differences) |
| Two independent samples t-test | Mann-Whitney test (note: sometimes called the Wilcoxon rank-sum test) |
| One-way analysis of variance | Kruskal-Wallis test |
| Pearson's correlation | Spearman's rank correlation |
| Repeated measures ANOVA | Friedman test |
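Most rows of the table map onto a pair of Stata commands. The sketch below runs each pair on Stata's built-in `auto` data as a stand-in (an assumption, not the lecture data sets); the reference value of 20 is arbitrary. For paired data the same commands take two variables, as in `ttest a == b` and `signrank a = b`.

```Stata
* Stand-in data shipped with Stata (assumption: not the lecture data sets)
sysuse auto, clear

* One sample: t-test vs. Wilcoxon signed-rank against a reference value of 20
ttest mpg == 20
signrank mpg = 20

* Two independent groups: t-test vs. Mann-Whitney
ttest mpg, by(foreign)
ranksum mpg, by(foreign)

* Three or more independent groups: one-way ANOVA vs. Kruskal-Wallis
oneway mpg rep78
kwallis mpg, by(rep78)

* Correlation: Pearson vs. Spearman
pwcorr mpg weight, sig
spearman mpg weight
```

Repeated measures ANOVA and the Friedman test are omitted because `auto` has no repeated measurements.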
Homework
- The IBM data are uploaded on Moodle.
- Download the data and open them in Stata.
- Practice research questions 1 to 3 discussed in this lecture.
- The LSS data are uploaded on Moodle.
- Download the data and open them in Stata.
- Practice research questions 1 and 2 discussed in this lecture.
Suggested Reading
- An Introduction to Medical Statistics by Martin Bland (4th edition): pages 131-141 and 193-210
- Essential Medical Statistics by B. Kirkwood & J. Sterne: pages 165-175
- Practical Statistics for Medical Research by Douglas Altman: pages 189-198 and 235-260