Statistical Significance Testing

Introduction to Statistical Significance Testing

  • Many people, including researchers, misunderstand the function of statistical significance testing.

  • The probability value, or p-value, does not prove causation or practical importance.

  • It does not indicate the size or meaningfulness of the effect of independent variables on dependent variables.

  • The p-value gives the probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true (illustrated in the sketch below).
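
To make that definition concrete, here is a minimal simulation with invented numbers (a null mean of 100 and an observed sample mean of 104): it estimates the p-value as the fraction of samples drawn under the null hypothesis that are at least as extreme as the observed result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: H0 says the population mean is 100; we observed
# a sample of n = 25 with mean 104, and assume a population sd of 10.
mu_null, sigma, n, observed_mean = 100, 10, 25, 104

# Simulate 100,000 sample means as they would occur if H0 were true.
null_means = rng.normal(mu_null, sigma / np.sqrt(n), size=100_000)

# Two-sided p-value: how often null samples deviate from the null mean
# at least as far as the observed mean did.
p_value = np.mean(np.abs(null_means - mu_null) >= abs(observed_mean - mu_null))
print(f"simulated p ≈ {p_value:.4f}")  # a small p means the data are unlikely under H0
```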

Objectives of the Lecture

  • The aim is to understand what statistical significance testing truly does and to evaluate statistics critically.

  • This lecture is the first of two on interpreting data; it focuses on statistical significance testing.

  • The next lecture will cover effect size and confidence intervals.

  • Together, significance testing, effect size, and confidence intervals are referred to as "the big three."

Understanding Statistical Significance

  • Statistical significance testing was historically regarded as a definitive indicator of meaningful effects.

  • In reality, effect size and confidence intervals often carry more weight when interpreting data.

Concept of the Null Hypothesis

  • Null Hypothesis (H0): A statement that assumes no effect or no difference exists in the population.

  • Hypothetical example: "people who go swimming do not get wet", a deliberately absurd null that illustrates the "no effect" form. In a drug trial, the null might read H0: µ_drug = µ_placebo, against the alternative H1: µ_drug ≠ µ_placebo.

  • Confidence in rejecting the null hypothesis builds as more and more samples show evidence against it.

Analogy: Jury System

  • The null hypothesis can be compared to a jury's presumption of innocence until proven guilty.

    • Evidence Required: Strong evidence (in court, DNA or video footage) is needed to overturn the presumption of innocence, just as strong statistical evidence is needed to reject the null hypothesis.

  • Statistical Significance Testing: Looks for evidence to reject the null hypothesis.

Obtaining Evidence via Sampling

  • Statistical significance testing starts with the sampling process.

  • Random sampling error occurs because sample means inevitably deviate from the population mean.

    • Most samples will deviate from the population mean to varying degrees.

  • Types of Error:

    • Random Sampling Error: Inherent and measurable.

    • Bias: Caused by poor sampling methods; not measurable.
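
A small simulation makes random sampling error visible. All numbers below are illustrative, not from the lecture: five random samples from the same known population produce five different means, each missing the true mean by a different amount.

```python
import numpy as np

rng = np.random.default_rng(1)

# A known population: mean 50, sd 12 (arbitrary illustrative values).
population = rng.normal(50, 12, size=1_000_000)

# Draw five random samples of n = 40 and compare each mean to the truth.
for i in range(5):
    sample = rng.choice(population, size=40, replace=False)
    print(f"sample {i + 1}: mean = {sample.mean():6.2f}  "
          f"(error = {sample.mean() - population.mean():+5.2f})")
```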

Measuring Random Sampling Error

  • Random samples lead to a Sampling Distribution:

    • Definition: Distribution of sample statistics (e.g., mean) from all possible random samples of a specified size from the population.

  • The Central Limit Theorem states:

    1. The mean of the sampling distribution of the sample mean (µx̄) equals the population mean (µ).

    2. The sampling distribution becomes approximately normal (bell-shaped) once samples are large enough (n ≥ 30), even when the population itself is not normal.

    3. The standard deviation of the sampling distribution (the standard error, σ/√n) decreases as sample size increases. All three claims are checked empirically in the sketch below.
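
The sketch below uses an assumed, deliberately skewed population, so any bell shape in the sample means must come from the Central Limit Theorem rather than from the population itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# A deliberately skewed population (exponential with an assumed scale),
# so the population itself is far from bell-shaped.
population = rng.exponential(scale=5.0, size=1_000_000)
mu, sigma = population.mean(), population.std()

# Build the sampling distribution: 10,000 sample means at n = 36.
n = 36
sample_means = rng.choice(population, size=(10_000, n)).mean(axis=1)

print(f"population mean {mu:.3f} vs mean of sample means {sample_means.mean():.3f}")
print(f"predicted SE {sigma / np.sqrt(n):.3f} vs observed spread {sample_means.std():.3f}")
```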

Central Limit Theorem and Its Role

  • Purpose: Provides a foundation for statistical significance testing by establishing expected sample distributions under the null hypothesis:

    • Null Sampling Distribution: Represents all possible outcomes if there is no treatment effect.

    • Alternative Sampling Distribution: Represents outcomes if there is a treatment effect.
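
As an illustration of the two distributions (an assumed null mean of 100, an assumed treatment effect of +6, with σ = 10 and n = 30), the sketch below simulates sample means under each hypothesis; significance testing then asks which distribution an observed mean more plausibly came from.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 30, 10
mu_null, mu_alt = 100, 106  # hypothetical: no effect vs. a +6 treatment effect

# Sampling distributions of the mean under each hypothesis.
null_dist = rng.normal(mu_null, sigma / np.sqrt(n), 50_000)
alt_dist = rng.normal(mu_alt, sigma / np.sqrt(n), 50_000)

# An observed mean of 104 is rare under H0 but routine under H1.
print(f"P(mean >= 104 | no effect) ≈ {np.mean(null_dist >= 104):.3f}")
print(f"P(mean >= 104 | real effect) ≈ {np.mean(alt_dist >= 104):.3f}")
```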

Power of Statistical Tests

  • Statistical significance testing helps answer: “Did something happen?”

    • It does not convey effect size or confidence in the result's reliability.

  • Type I Error (alpha error): Rejecting the null hypothesis when it is actually true; the tolerated rate α is typically set at 0.05 (5%).

  • Type II Error (beta error): Failing to reject the null hypothesis when the alternative hypothesis is true; β is often set at 0.20 (20%).

  • Statistical Power: The probability of correctly rejecting a false null hypothesis; power = 1 − β, so β = 0.20 corresponds to 80% power. Both error rates are estimated by simulation below.
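
Both rates can be estimated by brute force. The sketch below uses an assumed effect size and sample size and repeatedly runs a two-sample t-test: with a true null, the rejection rate approximates alpha; with a real effect, it approximates power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha, n, trials = 0.05, 30, 5_000

def rejection_rate(effect):
    """Fraction of simulated two-sample t-tests with p <= alpha."""
    hits = 0
    for _ in range(trials):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)
        if stats.ttest_ind(control, treated).pvalue <= alpha:
            hits += 1
    return hits / trials

print(f"Type I error rate (true null): {rejection_rate(0.0):.3f}")  # ≈ alpha
print(f"power (real effect of 0.8 sd): {rejection_rate(0.8):.3f}")  # well above 0.8 here
```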

Visualizing Errors and Power

  • Type I Error: Concluding an effect exists when there is none; it corresponds to the rejection region in the far right tail of the null distribution.

  • Type II Error: Missing a real effect because the test statistic falls short of the critical value, so the null hypothesis is not rejected.

  • Power: Correct decision made by rejecting the null hypothesis when the alternative hypothesis is true.
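
The same picture can be computed rather than drawn. Assuming normal sampling distributions with invented means and standard error, this sketch locates the critical value in the null distribution's tail and then reads beta and power off the alternative distribution.

```python
from scipy.stats import norm

se = 2.0                     # hypothetical standard error of the mean
mu_null, mu_alt = 100, 105   # hypothetical null and alternative means
alpha = 0.05

# Critical value: the cutoff in the right tail of the null distribution.
crit = norm.ppf(1 - alpha, loc=mu_null, scale=se)

# Beta: the alternative distribution's mass that falls below the cutoff.
beta = norm.cdf(crit, loc=mu_alt, scale=se)
power = 1 - beta

print(f"critical value = {crit:.2f}")
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, power = {power:.2f}")
```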

Critical Value and P-Value

  • Statistical tests (e.g., the t-test, ANOVA) produce a p-value.

  • To determine significance, compare the p-value to the chosen alpha level (typically 0.05):

    • If p ≤ alpha, reject the null hypothesis.

    • If p > alpha, fail to reject the null hypothesis.
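
A minimal decision-rule sketch using made-up measurements: run the test, compare p to alpha, and state the conclusion.

```python
from scipy import stats

alpha = 0.05
group_a = [5.1, 4.9, 5.6, 5.2, 5.8, 5.0, 5.4, 5.3]  # hypothetical data
group_b = [4.4, 4.7, 4.1, 4.9, 4.3, 4.6, 4.2, 4.5]

result = stats.ttest_ind(group_a, group_b)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

if result.pvalue <= alpha:
    print("p <= alpha: reject the null hypothesis")
else:
    print("p > alpha: fail to reject the null hypothesis")
```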

Practical Examples with P-Values

  • Example Comparing Cholesterol Drug Efficacy:

    • A p-value of 0.60 indicates insufficient evidence against the null hypothesis that the drug has no effect, so we fail to reject it.

    • A p-value of 0.01 provides strong evidence against that null hypothesis, consistent with the drug being effective. Both cases are sketched below.
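
The contrast can be mimicked in a few lines. The data below are simulated, not the lecture's (hypothetical LDL values, 40 patients per arm): a drug with no real effect typically yields a large p-value, while one that genuinely lowers cholesterol yields a small one; the exact values depend on the random draw.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 40  # hypothetical patients per arm

placebo = rng.normal(200, 30, n)      # LDL cholesterol, made-up units
ineffective = rng.normal(200, 30, n)  # drug with no real effect
effective = rng.normal(180, 30, n)    # drug that lowers LDL by ~20

for name, drug in [("ineffective", ineffective), ("effective", effective)]:
    p = stats.ttest_ind(placebo, drug).pvalue
    print(f"{name} drug: p = {p:.3f}")
```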

Limitations of Statistical Significance Alone

  • Statistical significance is sensitive to sample size, which leaves it open to manipulation.

  • Larger sample sizes usually result in smaller standard error, making even minor differences statistically significant yet clinically irrelevant.

  • Power Analysis: Used to determine the sample size needed to detect an effect of a given size at the chosen alpha and power levels (see the sketch below).
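
One common sketch of power analysis uses the normal-approximation formula n ≈ 2((z_(1−α/2) + z_(1−β))/d)² per group; the lecture names power analysis but no formula, so this particular choice is an assumption. The same helper also shows why enormous samples make trivial effects "significant".

```python
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison
    (normal approximation; d is the standardized effect size)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / d) ** 2

# Tiny effects are detectable; they just need enormous samples,
# which is exactly how trivial differences end up "significant".
for d in (0.8, 0.5, 0.2, 0.05):
    print(f"effect size {d:>4}: n ≈ {n_per_group(d):,.0f} per group")
```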

Historical Context of Alpha Levels

  • The common alpha level of 0.05 is a historical convention, not a rigorously derived standard.

  • Decision-making can vary; for higher stakes, a lower alpha (like 0.01) might be used.

Contextual Understanding of Statistics

  • Interpretations must consider context, because statistics can convey an unwarranted sense of certainty.

  • Just because a result is statistically significant does not imply it is clinically relevant or useful.

  • Informed statistical analysis remains a skill requiring real expertise; it is expected by high-quality journals and practiced by biostatisticians and epidemiologists.