Null Hypothesis Testing and Statistical Inference
Null Hypothesis Tests
- Null hypothesis tests are used to draw conclusions about a population from sample data.
- The choice of test depends on:
- Number of groups being assessed.
- Underlying distribution of the data.
Distribution of Data
- Many common null hypothesis tests require data to be normally distributed or approximately normally distributed.
- A normal distribution is a bell curve where two standard deviations either side of the mean account for approximately 95% of observations.
- Some advanced tests require the variance of the data sets to be homogeneous, meaning the spread of test scores is similar across the groups being compared.
- If data is not normally distributed (e.g., skewed data), the reliability and validity of inferences can be impacted.
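The coverage figures for a normal distribution can be checked directly. The following is a minimal sketch using only Python's standard library; the helper name `coverage_within` is made up for illustration.

```python
from statistics import NormalDist


def coverage_within(k: float) -> float:
    """Proportion of a normal distribution lying within k SDs of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area between -k and +k under the standard normal curve.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```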
Comparing Two Groups: T-tests
- T-tests are commonly used for basic-level statistical analysis when comparing two groups.
- Types of t-tests:
- Independent t-test:
- Used to assess the difference in test scores between two independent groups that are separated by a categorical (non-quantitative) variable.
- Example: comparing selected vs. non-selected athletes based on competition status.
- Dependent t-test (paired t-test):
- Used to determine whether a difference exists between two dependent groups.
- Observations are contributed from the same participant in each group or testing time point.
- Example: assessing jump height in the same participants over two different testing sessions (T1 and T2).
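The difference between the two t-tests can be sketched by computing both statistics on the same numbers. This is an illustrative sketch only: the jump heights are invented, the function names are hypothetical, and the independent version assumes the classic pooled-variance (Student's) form, which presumes equal variances. Only the t statistic is computed, not the p-value.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def paired_t(first, second):
    """Paired t statistic: a one-sample t-test on within-participant differences."""
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical jump heights (cm) for five participants at T1 and T2.
t1 = [40.0, 42.0, 38.0, 45.0, 41.0]
t2 = [42.0, 43.0, 39.0, 47.0, 43.0]

print(f"paired t:      {paired_t(t1, t2):.2f}")
print(f"independent t: {independent_t(t1, t2):.2f}")
```

With these made-up numbers the paired statistic is about 6.53, while treating the two sessions as independent groups gives about -0.93. The paired test removes between-participant variability, which is why the choice of test has to match the design.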
Multiple Time Points: F-tests (ANOVA)
- When there are more than two groups or multiple testing time points, such as in longitudinal designs, F-tests (ANOVA) are used.
- Types of ANOVA:
- One-way ANOVA:
- Used for independent groups with one grouping factor.
- Example: assessing differences in bench press 1RM between three competition levels in Rugby League.
- Factorial ANOVA:
- Used when there is more than one grouping factor.
- Example: grouping based on competition level and selection status within that level.
- One-way repeated measures ANOVA:
- Used for multiple dependent measures, i.e., repeated measurements on the same participant across more than two occasions.
- Factorial (two-way) repeated measures ANOVA:
- Used for two or more groups (e.g., selected vs. not selected) over multiple measurement times.
- Example: testing at the start, middle, and end of the season.
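The F statistic behind a one-way ANOVA on independent groups is the ratio of between-group to within-group variance. The sketch below computes it by hand with the standard library; the bench press values are invented and `one_way_anova_f` is a hypothetical helper name. It returns the F statistic only, not a p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA on independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical bench press 1RM (kg) at three competition levels.
level_1 = [100.0, 105.0, 110.0]
level_2 = [110.0, 115.0, 120.0]
level_3 = [120.0, 125.0, 130.0]

print(one_way_anova_f(level_1, level_2, level_3))  # 12.0
```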
Interpreting ANOVA Results & Post Hoc Comparisons
- F-tests (ANOVA) indicate if there is a difference between groups but do not specify where that difference lies.
- A main effect for a factor (e.g., competition level) indicates a statistical difference exists somewhere among the groups.
- Post hoc comparisons are used to determine exactly where the differences lie through pairwise comparisons.
- Performing multiple comparisons increases the risk of a Type I error (rejecting the null hypothesis when it's true).
- Corrections for Type I error:
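- Bonferroni correction: Takes the significance threshold (alpha) and divides it by the number of comparisons being made.
- Holm (sequentially rejective Bonferroni): Ranks the p-values from smallest to largest and tests each against a progressively less strict threshold, which reduces the conservatism of the Bonferroni correction.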
- Tukey test and Scheffé test.
- The researcher decides which correction is most appropriate for the research design.
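The inflation of Type I error and the two Bonferroni-style corrections can be sketched in a few lines. This is an illustrative sketch with invented p-values and hypothetical function names; the family-wise error formula assumes the comparisons are independent.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent comparisons."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Reject each null whose p-value is at or below alpha / number of comparisons."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: test sorted p-values against alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values from three pairwise post hoc comparisons.
p_values = [0.01, 0.02, 0.04]

print(f"{familywise_error_rate(0.05, 3):.3f}")  # 0.143
print(bonferroni_reject(p_values))              # [True, False, False]
print(holm_reject(p_values))                    # [True, True, True]
```

In this made-up case Bonferroni declares only one comparison significant, while Holm declares all three, which shows how the sequential procedure is less conservative.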
Non-Parametric Tests
- Used when data does not fit a normal distribution.
- Alternatives to t-tests:
- Two-sample Mann-Whitney U test: For assessing two independent groups.
- Wilcoxon Signed Rank test: For paired or dependent data at two time points.
- Alternative to one-way ANOVA: Kruskal-Wallis test (for more than two groups).
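- For designs with more than one factor (factorial designs):
- Aligned Rank Transformation: Allows interaction effects to be assessed by first aligning the data for each effect and then ranking it before a factorial ANOVA is run.
- For one group measured at more than two time points in a longitudinal within-participant design (the alternative to one-way repeated measures ANOVA): Friedman test.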
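Most of the non-parametric tests above work on ranks rather than raw scores, which is what makes them less sensitive to skewed data. The following sketch computes the Mann-Whitney U statistic by hand to show the idea; the data and function names are invented for illustration, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic (the smaller of the two U values)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical scores; the second comparison contains an extreme outlier.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))    # 0.0
print(mann_whitney_u([1, 2, 3], [4, 5, 100]))  # 0.0
```

The outlier of 100 does not change U, because only the ordering of the scores matters.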
Limitations of Null Hypothesis Tests
- Null hypothesis tests indicate only whether a difference exists between groups or time points.
- They are somewhat binary: based on the statistical outcome, a difference either is or is not declared.
- Statistical outcomes don't always align with practically relevant outcomes.
- Small changes in performance can have large effects on competition outcomes, which may not be detected statistically without very large data sets.
- Hypothesis tests assess performance change or differences between groups on a group level, not at an individual athlete level.
- They don't provide information about the magnitude of the difference.
- Tests are heavily impacted by the distribution of the data. T-tests are sensitive to non-normally distributed data.
- ANOVA models are somewhat robust to violations of distribution assumptions but are still impacted by large deviations.
- The size of the sample impacts the accuracy of statistical outcomes.
- Small samples may not be representative of the population, which affects the inferences that can be drawn from the tests and especially increases the risk of Type II errors (failing to reject the null hypothesis when it is false).
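The dependence of the statistical outcome on sample size can be sketched by holding a small difference fixed and varying the number of participants. This is a simplified illustration under stated assumptions: two equal-sized independent groups with the same standard deviation, invented values, and a hypothetical helper name.

```python
from math import sqrt


def t_for_fixed_difference(mean_diff: float, sd: float, n_per_group: int) -> float:
    """t statistic for two equal-sized independent groups sharing the same SD."""
    standard_error = sd * sqrt(2 / n_per_group)
    return mean_diff / standard_error


# Hypothetical 1 cm difference in jump height with a 5 cm SD in both groups.
for n in (10, 100, 1000):
    print(f"n = {n:4d} per group: t = {t_for_fixed_difference(1.0, 5.0, n):.2f}")
# n =   10 per group: t = 0.45
# n =  100 per group: t = 1.41
# n = 1000 per group: t = 4.47
```

The difference between the groups is identical in every row; only the sample size changes, yet the t statistic moves from well below to well above the value of roughly 2 that typically marks significance at the 0.05 level.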