Pre and Post Intervention

Parametric Tests

Statistical tests that make assumptions about population parameters (mean, variance).
More powerful and precise when assumptions are met.

Assumptions of Parametric Tests

Normality: Data should follow a normal distribution. Checked using quantile-quantile plots (Q-Q plots). In regression models, the assumption applies to the residuals.
Homogeneity of Variance: Variances within groups being compared should be equal. Tested with Levene's test.
Independence: Observations should be independent; data from different participants shouldn't influence each other.
If these assumptions aren’t met, non-parametric tests are more appropriate.
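
The checks named above can be sketched in Stata as follows. This is a minimal illustration on Stata's built-in auto dataset, used here as a stand-in for real study data. The Shapiro-Wilk test (swilk) is an extra check not mentioned in the lecture, and the choice of variables is an assumption made only for illustration.

```Stata
* Stand-in data: Stata's built-in auto dataset (not the lecture data)
sysuse auto, clear

* Normality: Q-Q plot and Shapiro-Wilk test for a continuous variable
qnorm mpg
swilk mpg

* Homogeneity of variance: robvar reports Levene's statistic (W0)
* for the two groups defined by foreign
robvar mpg, by(foreign)
```

Independence cannot be checked with a command; it follows from how the data were collected.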

Common Parametric Tests in Medical Statistics

One-sample t-test: Compares the mean of a single group to a known value.
Independent two-sample t-test: Compares means of two independent groups.
Paired t-test: Compares means from the same group at different times (e.g., before and after treatment).
One-way ANOVA: Compares means among three or more independent groups.
Two-way ANOVA: Compares means among groups split on two independent variables.
Repeated measures ANOVA: Compares means among groups where the same participants are measured multiple times.
Mixed-design ANOVA: Combines one-way and repeated measures ANOVA, comparing means among groups with both between-subjects and within-subjects factors.
Pearson correlation: Measures the strength and direction of the relationship between two continuous variables.
Simple linear regression: Models the relationship between a dependent variable and a single independent variable.
Multiple regression: Extends linear regression to include multiple independent variables.

Non-Parametric Tests

Statistical methods that do not assume a specific distribution for the data.
Useful when parametric test assumptions are violated, or when dealing with ordinal data or small sample sizes.

Assumptions of Non-Parametric Tests

Independence of Observations: Observations should be independent.
Ordinal or Continuous Data: Typically used for ordinal (ranked) data or continuous data.
Random Sampling: Observations should be randomly selected from the population.

Common Non-Parametric Tests in Medical Statistics

Mann-Whitney U Test: Compares the distributions of two independent groups.
Wilcoxon Signed-Rank Test: Compares two related/matched samples to assess if population mean ranks differ; requires symmetrical distribution of differences.
Kruskal-Wallis H Test: Extends Mann-Whitney U test to more than two groups; assumes distributions have the same shape.
Friedman Test: Compares more than two related groups; non-parametric alternative to repeated measures ANOVA; assumes ordinal or continuous scale and same distribution of ranks.
Chi-Square Test: Assesses association between categorical variables by comparing observed and expected frequencies; requires expected frequency >= 5 in each cell.
Spearman's Rank Correlation Coefficient: Measures correlation based on how well a monotonic function describes the relationship between two variables.
Kendall's Tau: Measures the strength and direction of association between two variables using the ranks of the data.

Key Points for Non-Parametric Tests

Tests based on medians and ranks.
Median: The value above and below which 50% of the data lie.
Rank: NP methods use the ranks of values rather than the actual values. For example:
- Actual values: 1, 2, 3, 4, 5, 7, 13, 22, 38, 45
- Ranks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Wilcoxon Signed Rank is analogous to a paired t-test.
Wilcoxon Rank Sum is analogous to an independent t-test.
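
The ranking example above can be reproduced in Stata. This is a small sketch using the ten values listed above; the variable names x and r are hypothetical. By hand, the median of these ten values is (5 + 7)/2 = 6.

```Stata
* Enter the ten example values
clear
input x
1
2
3
4
5
7
13
22
38
45
end

* rank() gives 1 to the smallest value; tied values would share an average rank
egen r = rank(x)
list x r

* The median is stored in r(p50) after summarize with the detail option
summarize x, detail
display r(p50)
```

Note that the large gaps between 13, 22, 38 and 45 disappear once the values are replaced by ranks, which is why rank-based tests are less sensitive to extreme values.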

Recent Neuroscience Research from UCL

Type of study: Observational Natural History Study (Cohort study)
Population: Inclusion Body Myositis (IBM)
Available data:
- 30 patients with Inclusion Body Myositis
- All with 1-year follow-up
- Variables/outcome measures collected
  - Gender
  - Age
  - Age at disease onset
  - Presence of dysphagia (at baseline and follow-up)
  - Use of walking aids (at baseline and follow-up)
  - Inclusion Body Myositis Functional Rating Scale (IBMFRS, at baseline and follow-up)

Inclusion Body Myositis (IBM)

No effective therapy currently available.
Mean time to using a wheelchair: 15 years
Rare acquired muscle disease characterized by muscle weakness and atrophy that relentlessly progresses to disability
Degenerative (eosinophilic inclusions, p62 accumulation)
Other (atrophy, necrosis, fat replacement)
Mitochondrial (COXneg/SDHpos, ragged red fibers)
Inflammatory (endomysial infiltrates, MHC-I up-regulation)
Severe atrophy of quadriceps and forearm flexors

Inclusion Body Myositis Functional Rating Scale (IBMFRS)

Physical function scale
10 items
Likert-type scale (0-4)
Range of total score (0-40)
- 0 (worst functional status) to 40 (best functional status)

Manual Muscle Testing (MMT) Grading Scale

Range: 0 to 5
- 0: None - No visible or palpable contraction
- 1: Trace - Visible or palpable contraction with no motion
- 2: Poor - Full ROM with gravity eliminated
- 3: Fair - Full ROM against gravity
- 4: Good - Full ROM against gravity, moderate resistance
- 5: Normal - Full ROM against gravity, maximal resistance

Walking Aids

0 represents "No walking aid"
1 represents "Stick or Rollator"
2 represents "Wheelchair"

Class exercise-1

Q1. Is patient age a continuous or a discrete variable?
Q2. Is gender a binary or an ordered nominal variable?

Research Questions

Paired samples t-test (if parametric data) / Wilcoxon signed-rank test (if non-parametric (NP) data)
- Research question 1: Has there been a significant change in the IBMFRS score over the 1-year follow-up period?
Independent samples t-test (if parametric data) / Mann–Whitney U test (if NP data)
- Research question 2: Is there a difference in the IBMFRS score between patients with and without dysphagia?
Chi-square test/Fisher exact test
- Research question 3: Is there any association between patients using walking aids (3 categories) and sex (male & female)?

Research question 1: Comparing two continuous variables

Research question 1: Has there been a significant change in the IBMFRS score over the 1-year follow-up period?
- Study hypothesis: The null hypothesis is that there is no difference between the follow-up score and the baseline score.
  - Mathematically, if this is true then $\text{mean}(\text{diff}) = 0$. So the alternative hypothesis is $\text{mean}(\text{diff}) \neq 0$.
- What type of data & tests?
  - The IBMFRS was measured at baseline and at follow-up for the same patients. Hence the data are paired.
  - First obtain the difference: IBMFRS score at follow-up minus IBMFRS score at baseline. Then check whether this difference comes from a normal distribution.
  - If it is normal, choose a parametric test; if not, use a non-parametric test.

Research question 1: Histogram and Q-Q plot of the difference between IBMFRS score at baseline & follow-up

How to check normality: plot a histogram and a Q-Q plot.
- If the data are approximately normal, use the parametric paired t-test; if not, use the non-parametric Wilcoxon signed-rank test.
- The histogram of the difference (say diff) looks bell-shaped, hence approximately normal, which the Q-Q plot also suggests.
- In a Q-Q plot, if all or most of the points fall on or near the straight line, the data are approximately normal.
- To plot each score separately:

```Stata
hist IBMFRS_baseline
qnorm IBMFRS_baseline
hist IBMFRS_1year
qnorm IBMFRS_1year
```

- Or try generating a variable called diff:

```Stata
gen diff = IBMFRS_1year - IBMFRS_baseline
hist diff
qnorm diff
```

### Research question 1: Tests for the difference between IBMFRS score at baseline & follow-up

*   Choose & run the appropriate test:
    *   The differences between the IBMFRS scores at baseline & follow-up are approximately normally distributed, so we use the paired t-test.

```Stata
ttest IBMFRS_1year == IBMFRS_baseline
```

The output reports:
- Mean difference between baseline & follow-up
- 95% confidence interval for the mean difference
- Test statistic, $t = \frac{\text{Mean}}{\text{Std. Err.}(\text{Mean})} = \frac{-2}{0.66436} \approx -3.01$
- Degrees of freedom = number of paired observations - 1 = 30 - 1 = 29
- P-value under the null hypothesis that the mean difference = 0
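
The arithmetic behind the output items listed above can be reproduced with Stata scalars. This is a minimal sketch that uses only the mean difference (-2), the standard error (0.66436) and the number of pairs (30) quoted above; the scalar names are hypothetical.

```Stata
* Summary values taken from the output described above
scalar mean_diff = -2
scalar se_diff = 0.66436
scalar df_t = 30 - 1

* t statistic: mean difference divided by its standard error (about -3.01)
scalar t_stat = mean_diff / se_diff
display "t = " t_stat

* Two-sided p-value from the t distribution with 29 degrees of freedom
scalar p_value = 2 * ttail(df_t, abs(t_stat))
display "p = " p_value

* 95% confidence interval: mean difference +/- critical value * standard error
scalar t_crit = invttail(df_t, 0.025)
scalar ci_low = mean_diff - t_crit * se_diff
scalar ci_high = mean_diff + t_crit * se_diff
display "95% CI: " ci_low " to " ci_high
```

If both limits of the interval are below 0, the interval excludes 0, which agrees with a p-value below 0.05.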

Research question 1: Tests for the difference between IBMFRS score at baseline & follow-up: conclusion

Overall, the IBMFRS score decreased by 2 units on average at follow-up compared to baseline.
The 95% confidence interval excludes 0 and the p-value is < 0.05, indicating that this difference is statistically significant.

Research question 2: Statistics step-by-step guide: Comparing two continuous variables

Research question 2: Is there a difference in the IBMFRS score between patients with and without dysphagia?
- Study hypothesis: The null hypothesis is that there is no difference in mean baseline IBMFRS score between patients with and without dysphagia.
- What type of data & tests?
  - The IBMFRS was measured in two subgroups of patients (with dysphagia and without dysphagia). Hence we compare two independent (unpaired) samples.
  - First check whether the IBMFRS score at baseline is normally distributed.

Research question 2: Histogram and Q-Q plot of IBMFRS score at baseline

How to check normality: plot a histogram and a Q-Q plot.
- Q-Q plot:

```Stata
qnorm IBMFRS_baseline
```

- Histogram:

```Stata
hist IBMFRS_baseline, normal
```

- The histogram looks approximately normal and almost all data points in the Q-Q plot are close to the straight line, so we can use the independent two-sample t-test.

Research question 2: Tests for the difference in the baseline IBMFRS score between patients with and without dysphagia

Choose & run the appropriate test:
- The IBMFRS score at baseline is approximately normally distributed, so we use the two-sample t-test.

```Stata
ttest IBMFRS_baseline, by(Dysphagia_BL)
```

*   Understand the Stata output and conclude:
    *   The mean difference in IBMFRS score between patients with and without dysphagia is 3.6, with a 95% confidence interval (CI) from -1.62 to 8.82.
    *   The 95% CI includes 0 and the p-value is > 0.05, so there is no statistically significant difference in IBMFRS score between patients with and without dysphagia.
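
The two-sample t-test also relies on the homogeneity of variance assumption listed earlier. The sketch below shows one way this could be checked before the test, using Stata's built-in auto dataset as a stand-in; the variables mpg and foreign are assumptions that play the roles of the IBMFRS score and the dysphagia indicator. The unequal option (Welch's version of the test) is an addition not covered in the lecture.

```Stata
* Stand-in data: Stata's built-in auto dataset (not the IBM data)
sysuse auto, clear

* Levene's test for equal variances between the two groups (W0 in the output)
robvar mpg, by(foreign)

* Standard two-sample t-test, which assumes equal variances
ttest mpg, by(foreign)

* Version that does not assume equal variances
ttest mpg, by(foreign) unequal
```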

### Class exercise-2

*   True or False
    1.  Blood glucose level measured before and after dinner in the same group of patients is an unpaired dataset.
    2.  Hours of sleep recorded in two groups of patients (such as stroke and dementia patients) is an unpaired dataset.

### Non-parametric test- example

*   Using the same data

### Research question 3: Is there any association between patients using walking aids (3 categories) and sex (male & female)?

*   Null hypothesis: There is no association between patients using walking aids (3 categories) and sex (male & female).
*   What type of data?
    *   Gender: 0 = male, 1 = female
    *   Recall walking aid use:
        *   0 represents “No walking aid”
        *   1 represents “Stick or Rollator”
        *   2 represents “Wheelchair”

### Research question 3: Statistics step-by-step guide: Comparing two proportions

*   Set up & run the appropriate tests:
    *   Try presenting these two variables in a 2 by 2 table:
        *   Null hypothesis: The proportion of female patients using walking aids is not higher than that of their male counterparts.
    *   Tabulate gender and dysphagia status:

```Stata
tab Gender Dysphagia_BL
```

Research question 3: Comparing categorical data 2 by 3

Set up & run the appropriate tests:
- Check the expected frequencies:

```Stata
tab Gender Walkingaid_BL, expected
```

*   Decision rule
    *   If at least one of the expected frequencies is < 5, use Fisher's exact test.
    *   If all the expected frequencies are >= 5, use the chi-square test.
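
The decision rule above can be applied from a single tabulation. This is a minimal sketch on Stata's built-in auto dataset, used as a stand-in; the variables rep78 and foreign are assumptions that take the place of the walking aid and gender variables.

```Stata
* Stand-in data: Stata's built-in auto dataset (not the IBM data)
sysuse auto, clear

* expected: print the expected frequency under each observed count
* chi2:     Pearson chi-square test
* exact:    Fisher's exact test
tabulate rep78 foreign, expected chi2 exact
```

Inspect the expected frequencies first, then report the test that the decision rule selects.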

### Research question 3: Comparing categorical data: 2 by 3 categories

*   Run the appropriate test & conclude:

```Stata
tab Gender Walkingaid_BL, exact
```

Conclusion: There is no evidence of a difference between males and females in the proportion using walking aids, because the p-value from Fisher's exact test is 0.85 (> 0.05).

Class exercise-3

Q1. When to use Fisher's exact test?
- Ans: When the expected frequency for at least one cell is < 5
Q2. When to use the chi-square test?
- Ans: When the expected frequencies for all cells are >= 5

More examples of non-parametric tests: Lumbar Spinal Stenosis (LSS) Data

What is LSS: Narrowing of the spinal canal causing compression of the nerve roots.
What causes LSS: The narrowing results from a combination of degenerative changes (ageing of the spine).
Who gets this disease: Prevalent in people aged over 50.
Consequences of LSS: Leads to symptoms of back pain, leg pain and reduced walking distance.
Research question: Randomized controlled trial comparing two operations for treatment of LSS: Laminectomy vs X-Stop device
- To answer the question: Which operation results in a better quality of life?
Methods of operation:
- Laminectomy: The gold standard operation, in which the nerve roots are decompressed.
- X-Stop: A new minimally invasive device inserted between the spinous processes of the vertebrae.

Outcomes of interest / Study objectives

Which operation leads to a better quality of life – using EQ5D as a primary outcome measure
Change in EQ5D response:
- Q1: Change (improvement or deterioration) in EQ5D for the same patient before and after surgery (paired data)
- Q2: Any difference in outcome between the two surgeries (L vs X) (unpaired data)

Statistics step-by-step guide: Q1. Compare EQ5D: Preoperative vs. Postoperative (paired sample)

Step 1: State statistical hypothesis
- Null hypothesis: there is no difference between the pre-operative & post-operative EQ5D scores. The same patient's score is measured twice, so the data are paired.
- If the mean difference in EQ5D score before and after the operation is 0, this supports the null hypothesis.
Step 2: Assumptions
- Histograms showed that the distributions of EQ5D before the operation and two years after are not normal, so non-parametric tests are preferred.

Step 3: Decide statistical methods:
- We took the difference between the pre- and post-operative scores and ran a non-parametric test on this difference: the Wilcoxon signed-rank test (an alternative to the paired t-test).
- Generate the difference and draw a histogram:

```Stata
* Generate a new variable called diff, which represents the post minus pre score
gen diff = EQ5D24mts - EQ5DPre
* Draw a histogram
hist diff
```

*   To check normality, create a Q-Q plot:

```Stata
qnorm diff
```

*   Run the chosen statistical test:

```Stata
signrank diff = 0
```

*   For a 95% confidence interval for the difference use the following commands:
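
The commands from the original slide are not reproduced in these notes. One Stata command that reports a confidence interval for a median is centile; the sketch below is an assumption about how such an interval could be obtained and is not necessarily what was used in the lecture. It runs on Stata's built-in bpwide dataset (fictional blood pressure readings before and after an intervention) as a stand-in for the LSS data.

```Stata
* Stand-in paired data: Stata's built-in bpwide dataset (not the LSS data)
sysuse bpwide, clear

* Difference between the two paired measurements (after minus before)
gen diff = bp_after - bp_before

* Wilcoxon signed-rank test of the null hypothesis that the median difference is 0
signrank diff = 0

* Median of the difference with a 95% confidence interval (the default level)
centile diff, centile(50)
```

With the LSS data, diff would instead be the variable generated from EQ5D24mts and EQ5DPre in the earlier step.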

*   Step 4: Understand the Stata output and conclude
    *   The test gives a p-value of 0.002, suggesting that there is enough evidence of a difference in EQ5D scores before and after the operation.
    *   The 95% confidence interval around the median of the difference is 0.104 to 0.352. This interval is narrow and excludes 0, which indicates that the median is estimated with good precision.

### Statistics step-by-step guide: Q2. Compare EQ5D: Laminectomy Vs. X-stop (independent sample)

*   Step 1: State statistical hypothesis
    *   Null hypothesis: there is no difference in mean or median EQ5D score at 24 months between the two methods of operation.
*   Step 2: Justify assumptions
    *   The histogram & Q-Q plot showed that the distribution of EQ5D at 24 months for all patients is not normal, so non-parametric tests are preferred.

```Stata
* Histogram
hist EQ5D24mts, normal
* Q-Q plot
qnorm EQ5D24mts
```

*   Step 3: Decide & run the appropriate statistical test
    *   Most of the data points in the Q-Q plot are away from the straight line, so we should use the non-parametric Mann-Whitney test (an alternative to the two-sample t-test).

```Stata
ranksum EQ5D24mts, by(Operation)
```

Step 4: Understand the Stata output and conclude
- The output gives a table displaying the two groups, their Obs (number of observations), the observed rank sums and the rank sums that would be expected under the null hypothesis (if there were no difference).
- Tied ranks can be an issue, so below the table there is a variance adjustment to account for ties.
- The output then restates the null hypothesis and gives the z-statistic (0.025) and p-value (0.9797), which suggests that there is no evidence of a difference in EQ5D score between the two operations.
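
Because the rank-sum output reports rank sums rather than the scores themselves, it can help to put group medians next to the test. This is a minimal sketch on Stata's built-in auto dataset as a stand-in; mpg and foreign are assumptions that play the roles of the EQ5D score and the operation group.

```Stata
* Stand-in data: Stata's built-in auto dataset (not the LSS data)
sysuse auto, clear

* Group size, median and interquartile range for each group
tabstat mpg, by(foreign) statistics(n median iqr)

* Mann-Whitney (Wilcoxon rank-sum) test comparing the two groups
ranksum mpg, by(foreign)
```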

Take home: What statistical methods should I use to analyse my data?

Choose appropriate statistical methods/tests

| Parametric tests (variables are from a normal distribution) | Non-parametric tests (variables may not be from a normal distribution) |
|---|---|
| Single-sample t-test | Wilcoxon signed-rank test |
| Paired-sample t-test | Paired Wilcoxon signed-rank test |
| Two independent samples t-test | Mann-Whitney test (note: sometimes called the Wilcoxon rank-sum test) |
| One-way analysis of variance | Kruskal-Wallis test |
| Pearson's correlation | Spearman's rank correlation |
| Repeated measures ANOVA | Friedman test |
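
Several of the pairings in the table map onto built-in Stata commands. The sketch below illustrates them on Stata's built-in auto dataset as a stand-in; the variables and the comparison value 20 are assumptions chosen only for illustration. The Friedman test is left out because it is not covered by the built-in commands used here.

```Stata
* Stand-in data: Stata's built-in auto dataset (not the lecture data)
sysuse auto, clear

* One sample: t-test and Wilcoxon signed-rank test against the value 20
ttest mpg == 20
signrank mpg = 20

* Two independent groups: t-test and Mann-Whitney test
ttest mpg, by(foreign)
ranksum mpg, by(foreign)

* Three or more groups: one-way ANOVA and Kruskal-Wallis test
oneway mpg rep78
kwallis mpg, by(rep78)

* Correlation: Pearson and Spearman
pwcorr mpg weight, sig
spearman mpg weight
```

For paired data, the corresponding commands are the ttest and signrank forms shown in the research questions above.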

Homework

The IBM data are uploaded on Moodle
- Download the data and open them in Stata
- Practice research questions 1 to 3 discussed in this lecture.
The LSS data are uploaded on Moodle
- Download the data and open them in Stata
- Practice research questions 1 and 2 discussed in this lecture.
