Choosing the Suitable Statistical Test
Non-Experimental (Observational) vs. Experimental Studies
This lecture covers choosing the appropriate statistical test for data analysis.
Non-experimental (observational) studies are contrasted with experimental studies.
Steps of Statistical Test Selection
After descriptive statistics, the process moves on to analytical (inferential) statistics.
A key analytical skill is selecting the correct statistical test.
The goal is to answer the research question and decide whether to reject or fail to reject the null hypothesis.
Research Question and Hypothesis Testing
Research starts with an idea, leading to a research question.
A null hypothesis and an alternative hypothesis are formulated.
The sample data are analyzed with a statistical test to obtain a p-value.
If the p-value is less than a predefined alpha (α), the null hypothesis is rejected; otherwise, we fail to reject the null hypothesis.
Choosing the proper statistical test is therefore crucial: a test that does not fit the data can produce a misleading p-value.
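Once the p-value is known, the decision rule is mechanical. Below is a minimal Python sketch. It assumes the conventional α = 0.05 as the default, and the function name `decide` is illustrative only. A p-value exactly equal to α does not lead to rejection, matching the "less than" rule above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule against a predefined alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Reject H0 only when p < alpha; otherwise we fail to reject (never "accept") H0.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.003))
print(decide(0.20))
```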
Five Steps for Choosing a Statistical Test
1. Bivariate vs. Multivariable Analysis
Question 1: Is it a bivariate or multivariable analysis?
Bivariate analysis: studies the relationship between two variables.
Examples:
Age and height.
Type of treatment and complication.
Sex and smoking.
Smoking and coffee consumption.
Multivariable analysis (regression modeling/analysis):
Studies the effect of multiple variables on an outcome variable.
Examples:
Effect of smoking, sex, and coffee consumption on blood pressure.
Effect of smoking, sex, and coffee consumption on having a heart attack.
Note: Regression can be used for bivariate analysis if examining the effect of only one variable on the outcome.
2. Difference vs. Correlation (Bivariate Analysis)
Question 2: Are we studying a difference or a correlation (if bivariate)?
Difference: studying the difference between two or more groups or conditions.
Examples:
The difference between males and females regarding coffee consumption.
The difference in body weight before and after being on a specific diet.
Correlation: studying the association between two variables.
Examples:
The association between age and weight.
The association between coffee consumption and the number of sleeping hours.
3. Independent vs. Paired Data (Bivariate Analysis)
Question 3: Are we working with independent or paired data (if bivariate)?
Independent (unpaired) data: observations in each sample are unrelated.
No relationship between subjects in each sample.
Subjects in one group cannot be in the other group.
No subject or group can influence the other.
Dependent (paired) data: each observation in one sample is linked to a specific observation in the other. Paired samples include:
Pre-test/post-test samples (a variable measured before and after an intervention).
Cross-over trials.
Matched samples.
Repeated measurements: the same variable measured twice or more on the same individual.
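Analyzing paired data as if it were independent discards the pairing. Below is a minimal Python sketch that computes the t statistic both ways using the standard library. The data are hypothetical weights (kg) for five people before and after a diet. Only the t statistics are computed, not p-values, because the standard library has no t distribution. The function names are illustrative.

```python
import math
import statistics

def student_t_independent(a, b):
    """Student's t statistic for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(before, after):
    """Paired t statistic: a one-sample t on the within-subject differences."""
    diffs = [x - y for x, y in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical body weights (kg) of the same five people before and after a diet.
before = [80, 92, 75, 101, 88]
after = [78, 89, 74, 97, 85]

print(f"treated as independent: t = {student_t_independent(before, after):.2f}")
print(f"treated as paired:      t = {paired_t(before, after):.2f}")
```

In both cases the mean difference is 2.6 kg. The paired statistic is much larger, however, because each person serves as their own control, which removes the large between-person variation in weight.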
4. Type of Outcome and Normality of Distribution
Question 4: What type of variable is the outcome, and, if it is numerical, is it normally distributed?
The type of data variable is crucial for choosing the suitable test.
Types of Data:
Categorical: No unit of measurement.
Nominal: No order (e.g., colors, types of treatment).
Ordinal: Ordered (e.g., pain scale, satisfaction levels).
Numerical: Has a unit of measurement.
Discrete: Counted/integer (e.g., number of children).
Continuous: Measured/decimals (e.g., height, weight).
Time-to-event (survival) data: the time until an event occurs (e.g., time to death or relapse).
Normality of Distribution:
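Before using parametric tests (e.g., t-test, ANOVA, Pearson's correlation), determine whether a numerical variable is normally distributed.
A histogram gives a quick visual check: a roughly symmetric, bell-shaped histogram is consistent with a normal distribution.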
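A histogram does not require a plotting library. Below is a minimal Python sketch that prints a text histogram of integer-valued data. The heights are made up, and the function name `text_histogram` is illustrative. A single-peaked, roughly symmetric shape is consistent with normality, while a long tail on one side points toward a non-parametric test. A formal check (e.g., a Shapiro-Wilk test) would need a statistics package.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Print a text histogram of integer data: one row per bin, one '*' per value."""
    # Map each value to the lower edge of its bin, e.g. 163 -> 160 when bin_width is 5.
    counts = Counter((v // bin_width) * bin_width for v in values)
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start}-{start + bin_width - 1}"
        print(f"{label:>9} | {'*' * counts[start]}")  # empty bins print no stars
    return counts

# Hypothetical heights in cm (made-up values, not from the lecture).
heights = [158, 162, 163, 165, 166, 167, 168, 168, 169, 171, 172, 173, 176, 181]
text_histogram(heights, bin_width=5)
```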
5. Number of Groups/Conditions
Question 5: Are we comparing two groups/conditions or more than two?
Examples:
Comparing two groups: diseased vs. not diseased.
Comparing three groups: normal, osteopenia, osteoporosis.
Comparing two conditions: pre-test vs. post-test.
Comparing three conditions: before, during, after the operation.
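With more than two independent groups, a single test compares all groups at once. Running several pairwise t-tests instead would inflate the chance of a false-positive result. Below is a minimal Python sketch that computes the one-way ANOVA F statistic (between-group mean square divided by within-group mean square) using the standard library. The bone-density values (g/cm²) are made up, and the function name is illustrative. No p-value is computed, because the standard library has no F distribution.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for two or more independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical bone mineral density values for the three diagnostic groups.
normal = [1.10, 1.05, 1.12, 1.08]
osteopenia = [0.95, 0.92, 0.98, 0.94]
osteoporosis = [0.80, 0.78, 0.83, 0.81]
print(f"F = {one_way_anova_f(normal, osteopenia, osteoporosis):.1f}")
```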
Guide for Choosing Common Statistical Tests
A table provides guidance based on the answers to the five questions.
Q1: Bivariate/Multivariable.
Q2: Difference/Correlation.
Q3: Independent/Paired.
Q4: Type of outcome (and Normality).
Q5: No. of groups (conditions).
Statistical tests based on the above questions:
Bivariate difference, independent (unpaired) data:
Continuous (Normal), 2 groups: Student's t-test
Continuous (Normal), >2 groups: One-way ANOVA
Continuous (Non-normal)/Ordinal, 2 groups: Mann-Whitney U test
Continuous (Non-normal)/Ordinal, >2 groups: Kruskal-Wallis H test
Nominal, 2 groups: Chi-square test / Fisher's exact test
Nominal, >2 groups: Chi-square test
Time to event (survival): Log-rank test (Kaplan-Meier plot)
Bivariate difference, dependent (paired) data:
Continuous (Normal), 2 conditions: Paired t-test
Continuous (Normal), >2 conditions: Repeated-measures ANOVA
Continuous (Non-normal)/Ordinal, 2 conditions: Wilcoxon signed-rank test
Continuous (Non-normal)/Ordinal, >2 conditions: Friedman test
Nominal, 2 conditions: McNemar's test
Bivariate correlation:
Continuous (Normal): Pearson's correlation
Continuous (Non-normal)/Ordinal: Spearman's correlation
Nominal (2 levels): Kappa (agreement)
Multivariable (regression), by type of outcome:
Continuous: Linear Regression
Ordinal: Ordered Logistic Regression
Nominal (2 levels): Binary Logistic Regression
Nominal (>2 levels): Multinomial Logistic Regression
Time to event (survival): Cox Regression
Count variable: Poisson regression
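The guide above is essentially a lookup keyed on the answers to the five questions. Below is a minimal Python sketch that encodes it as dictionaries. Several names are hypothetical choices for illustration: the function `choose_test`, its parameters, and the outcome labels. The bivariate labels are "normal", "non-normal", "ordinal", "nominal", and "survival"; the regression labels are "continuous", "ordinal", "binary", "multinomial", "survival", and "count". The agreement (kappa) row is left out. Any combination the guide does not list raises an error instead of guessing.

```python
# Difference tables: key = (outcome type, more than two groups/conditions?).
INDEPENDENT = {
    ("normal", False): "Student's t-test",
    ("normal", True): "One-way ANOVA",
    ("non-normal", False): "Mann-Whitney U test",
    ("non-normal", True): "Kruskal-Wallis H test",
    ("nominal", False): "Chi-square test / Fisher's exact test",
    ("nominal", True): "Chi-square test",
    ("survival", False): "Log-rank test (Kaplan-Meier plot)",
    ("survival", True): "Log-rank test (Kaplan-Meier plot)",
}
PAIRED = {
    ("normal", False): "Paired t-test",
    ("normal", True): "Repeated-measures ANOVA",
    ("non-normal", False): "Wilcoxon signed-rank test",
    ("non-normal", True): "Friedman test",
    ("nominal", False): "McNemar's test",
}
CORRELATION = {
    "normal": "Pearson's correlation",
    "non-normal": "Spearman's correlation",
}
# Multivariable analysis: the regression model depends only on the outcome type.
REGRESSION = {
    "continuous": "Linear regression",
    "ordinal": "Ordered logistic regression",
    "binary": "Binary logistic regression",
    "multinomial": "Multinomial logistic regression",
    "survival": "Cox regression",
    "count": "Poisson regression",
}


def choose_test(multivariable: bool, outcome: str, correlation: bool = False,
                paired: bool = False, groups: int = 2) -> str:
    """Return the test the guide suggests for the answers to Q1-Q5."""
    if multivariable:  # Q1: regression, chosen by outcome type alone
        table, key = REGRESSION, outcome
    else:
        if outcome == "ordinal":  # ordinal data shares the non-normal row
            outcome = "non-normal"
        if correlation:  # Q2: correlation rather than difference
            table, key = CORRELATION, outcome
        else:
            if groups < 2:
                raise ValueError("a difference needs at least two groups or conditions")
            # Q3 picks the table; Q4 (outcome) and Q5 (number of groups) form the key.
            table, key = (PAIRED if paired else INDEPENDENT), (outcome, groups > 2)
    if key not in table:
        raise ValueError(f"the guide lists no test for {key!r}")
    return table[key]


print(choose_test(False, "normal", paired=True))
```

For example, comparing body weight before and after a diet (paired, two conditions, normally distributed) gives "Paired t-test". Three bone-density groups with a non-normal outcome, `choose_test(False, "non-normal", groups=3)`, give "Kruskal-Wallis H test".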
Conclusion
The lecture concludes with acknowledgments to the source material.