Null Hypothesis Testing and Statistical Inference

Null Hypothesis Tests

  • Null hypothesis tests are used to draw conclusions about a population based on sample data.
  • The choice of test depends on:
    • Number of groups being assessed.
    • Underlying distribution of the data.

Distribution of Data

  • Many common null hypothesis tests require data to be normally distributed or approximately normally distributed.
  • A normal distribution is a bell-shaped curve in which about 95% of observations fall within two standard deviations of the mean.
  • Some advanced tests also require homogeneity of variance, meaning the spread of scores is similar across the groups or data sets being compared.
  • If data is not normally distributed (e.g., skewed data), the reliability and validity of inferences can be impacted.
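
As a quick numerical check of how much of a normal distribution lies within a given number of standard deviations of the mean, the sketch below uses NormalDist from Python's standard-library statistics module. The helper name coverage_within is an assumption introduced here for illustration.

```python
from statistics import NormalDist


def coverage_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```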

Comparing Two Groups: T-tests

  • T-tests are commonly used for basic-level statistical analysis when comparing two groups.
  • Types of t-tests:
    • Independent t-test:
      • Used to assess the difference in test scores between two independent groups that are separated by a categorical (non-quantitative) variable.
      • Example: comparing selected vs. non-selected athletes based on competition status.
    • Dependent t-test (paired t-test):
      • Used to determine whether a difference exists between two dependent groups.
      • Observations are contributed from the same participant in each group or testing time point.
      • Example: assessing jump height in the same participants over two different testing sessions (T1 and T2).
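
The two t-tests above can be sketched with the standard library alone. This is a minimal illustration, not a full implementation: the jump heights (in cm) are hypothetical, the function names independent_t and paired_t are assumptions, and the independent version assumes equal variances (pooled). The code returns only the t statistic and degrees of freedom; a statistics library such as SciPy (scipy.stats.ttest_ind, scipy.stats.ttest_rel) would also return the p-value.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Student's t statistic for two independent groups, using a pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(t1, t2):
    """Paired t statistic: a one-sample t-test on the within-participant differences."""
    diffs = [after - before for before, after in zip(t1, t2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical jump heights (cm): two different groups of athletes.
selected = [42, 45, 47, 44, 46]
non_selected = [40, 41, 43, 39, 42]
t_ind, df_ind = independent_t(selected, non_selected)
print(f"independent: t = {t_ind:.2f}, df = {df_ind}")  # independent: t = 3.41, df = 8

# Hypothetical jump heights (cm): the same five athletes at T1 and T2.
session_t1 = [40, 41, 43, 39, 42]
session_t2 = [42, 42, 46, 40, 45]
t_pair, df_pair = paired_t(session_t1, session_t2)
print(f"paired: t = {t_pair:.2f}, df = {df_pair}")  # paired: t = 4.47, df = 4
```

The paired version works on each participant's difference score, which is why it needs the two lists to be in the same participant order.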

Multiple Time Points: F-tests (ANOVA)

  • When there are more than two groups or multiple testing time points, such as in longitudinal designs, F-tests (ANOVA) are used.
  • Types of ANOVA:
    • One-way ANOVA:
      • Used for independent groups with one grouping factor.
      • Example: assessing differences in bench press 1RM between three competition levels in Rugby League.
    • Factorial ANOVA:
      • Used when there is more than one grouping factor.
      • Example: grouping based on competition level and selection status within that level.
    • One-way repeated measures ANOVA:
      • Used for multiple dependent measures, i.e., repeated measurements on the same participant across more than two occasions.
    • Factorial (two-way) repeated measures ANOVA:
      • Used for two or more groups (e.g., selected vs. not selected) measured at multiple time points.
      • Example: comparing selected and non-selected players at the start, middle, and end of the season.
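
For the simplest case, a one-way ANOVA on independent groups, the F statistic is the ratio of between-group to within-group variability. The sketch below computes it by hand; the bench press values (kg), the group labels, and the function name one_way_anova_f are hypothetical. A library routine such as scipy.stats.f_oneway would also report the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA on independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical bench press 1RM (kg) for three competition levels.
level_a = [90, 95, 100]
level_b = [105, 110, 115]
level_c = [120, 125, 130]

f_stat, df_b, df_w = one_way_anova_f(level_a, level_b, level_c)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(2, 6) = 27.0
```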

Interpreting ANOVA Results & Post Hoc Comparisons

  • F-tests (ANOVA) indicate if there is a difference between groups but do not specify where that difference lies.
  • A main effect for a factor (e.g., competition level) indicates a statistical difference exists somewhere among the groups.
  • Post hoc comparisons are used to determine exactly where the differences lie through pairwise comparisons.
  • Performing multiple comparisons increases the risk of a Type I error (rejecting the null hypothesis when it's true).
  • Corrections for Type I error:
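    • Bonferroni correction: Divides the significance level (alpha) by the number of comparisons being made.
    • Holm (sequentially rejective Bonferroni): Ranks the p-values from smallest to largest and tests each against a progressively less strict threshold, which reduces the conservatism of the standard Bonferroni correction.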
    • Tukey test and Scheffé test.
    • The researcher decides which correction is most appropriate for the research design.
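
The difference between the Bonferroni and Holm (sequentially rejective Bonferroni) procedures can be seen in a small sketch. The three p-values are hypothetical results from three pairwise comparisons, and the function names are assumptions introduced for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each null hypothesis whose p-value is at or below alpha / number of comparisons."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: test the smallest p-value against alpha / m,
    the next against alpha / (m - 1), and so on; stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also retained
    return reject


# Hypothetical p-values from three pairwise post hoc comparisons.
p_values = [0.010, 0.020, 0.030]

print(bonferroni_reject(p_values))  # [True, False, False]
print(holm_reject(p_values))        # [True, True, True]
```

With these values Bonferroni holds every comparison to 0.05 / 3, whereas Holm relaxes the threshold after each rejection, so it finds more differences while still controlling the overall Type I error rate.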

Non-Parametric Tests

  • Used when data does not fit a normal distribution.
  • Alternatives to t-tests:
    • Two-sample Mann-Whitney U test: For assessing two independent groups.
    • Wilcoxon Signed Rank test: For paired or dependent data at two time points.
  • Alternative to one-way ANOVA: Kruskal-Wallis test (for more than two groups).
  • For designs with more than one factor:
    • Aligned Rank Transform: Allows interaction effects to be assessed by aligning the data for each effect, ranking it, and then running a factorial ANOVA on the ranks.
  • For one group in a longitudinal within-participant design: Friedman test.
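
To illustrate why rank-based tests suit non-normal data, the sketch below computes the Mann-Whitney U statistic by counting, for every pair of scores, which group's value is higher. The scores and the function name mann_whitney_u are hypothetical; a library routine such as scipy.stats.mannwhitneyu would also return the p-value.

```python
def mann_whitney_u(a, b):
    """U statistics for two independent groups.

    u_a counts the (a, b) pairs in which the value from group a is higher;
    ties count as half a pair. u_b is the complementary count.
    """
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical scores; the first group contains one extreme value (60).
group_a = [42, 45, 47, 44, 60]
group_b = [40, 41, 43, 39, 42]

print(mann_whitney_u(group_a, group_b))  # (23.5, 1.5)

# Replacing the extreme value with 48 leaves U unchanged, because only the
# ordering of the scores matters, not their size.
print(mann_whitney_u([42, 45, 47, 44, 48], group_b))  # (23.5, 1.5)
```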

Limitations of Null Hypothesis Tests

  • Null hypothesis tests indicate only whether a difference exists between groups or time points.
  • They are essentially binary: the statistical outcome either indicates a difference or it does not.
  • Statistical outcomes don't always align with practically relevant outcomes.
  • Small changes in performance can have large effects on competition outcomes, which may not be detected statistically without very large data sets.
  • Hypothesis tests assess performance change or differences between groups on a group level, not at an individual athlete level.
  • They don't provide information about the magnitude of the difference.
  • Tests are heavily impacted by the distribution of the data. T-tests are sensitive to non-normally distributed data.
  • ANOVA models are somewhat robust to violations of distribution assumptions but are still impacted by large deviations.
  • The size of the sample impacts the accuracy of statistical outcomes.
  • Small samples may not be representative of the population, which limits the inferences that can be drawn from the tests and, in particular, increases the risk of Type II errors (failing to detect a difference that exists).
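
The dependence on sample size can be made concrete with a small sketch. It assumes two independent groups of equal size with the same standard deviation, and uses a hypothetical mean difference of 0.5 cm with a standard deviation of 5 cm; the function name t_for_equal_groups is also an assumption. The difference between the groups never changes, yet the t statistic grows with the square root of the sample size.

```python
from math import sqrt


def t_for_equal_groups(mean_diff, sd, n_per_group):
    """Independent t statistic when both groups share the same SD and group size."""
    standard_error = sd * sqrt(2 / n_per_group)
    return mean_diff / standard_error


for n in (10, 100, 1000):
    t = t_for_equal_groups(mean_diff=0.5, sd=5.0, n_per_group=n)
    print(f"n per group = {n}: t = {t:.2f}")
# n per group = 10: t = 0.22
# n per group = 100: t = 0.71
# n per group = 1000: t = 2.24
```

Only with the largest sample does the statistic pass the value of roughly 2 that is typically needed for significance at the 0.05 level, even though the underlying difference is identical in all three cases.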