Hypothesis Testing
Hypothesis Testing: Core Concepts
Purpose: A formal process for deciding, against criteria set in advance, whether an observed difference between groups or relationship between variables could plausibly be due to chance alone.
Overall process (6 steps):
Develop null and research hypotheses
Choose a level of significance
Determine which statistical test is appropriate
Run analysis to obtain a test statistic and p value
Make a decision about rejecting or failing to reject the null hypothesis
Make a conclusion
Hypotheses are testable statements that predict the relationship between variables.
Hypotheses translate the research question into a prediction of the outcome (an educated guess about the outcome).
Hypotheses: Null and Research
Null Hypothesis (H0): There is no difference between groups or no relationship between variables.
Research Hypothesis (H1 or Ha): There will be a difference between groups or there will be a relationship between variables.
Directional (one-tailed) vs Non-directional (two-tailed) hypotheses:
Directional: predicts the direction of the effect (e.g., group A > group B).
Non-directional: predicts a difference or relationship without specifying direction.
Key point: Hypotheses should be testable and derived from a research question.
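The directional vs non-directional choice changes how a p-value is read off a test statistic. The sketch below is a minimal illustration, assuming a z statistic that follows a standard normal distribution under H0; the function name `p_values_from_z` and the value 1.8 are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (two_tailed, upper_tailed, lower_tailed) p-values for a z statistic."""
    std_normal = NormalDist()            # standard normal: mean 0, sd 1
    upper = 1.0 - std_normal.cdf(z)      # directional: H1 predicts group A > group B
    lower = std_normal.cdf(z)            # directional: H1 predicts group A < group B
    two = 2.0 * min(upper, lower)        # non-directional: either direction counts
    return two, upper, lower

# 1.8 is an arbitrary illustrative z value, not a result from any study.
two, upper, lower = p_values_from_z(1.8)
print(f"two-tailed p = {two:.4f}, upper-tailed p = {upper:.4f}, lower-tailed p = {lower:.4f}")
```

For a positive z, the two-tailed p-value is twice the upper-tailed one, so the same data can fall below α = 0.05 under a directional hypothesis and above it under a non-directional one. This is one reason the direction has to be chosen before looking at the data.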
Level of Significance (α)
Criteria used to determine statistical significance; set before data collection.
Common choice: α = 0.05; use this value unless told otherwise.
Type I error (false positive): concluding there is a difference when there really isn’t; α is the probability of making this error.
Formal definition: \alpha = P(\text{reject } H_0 \mid H_0 \text{ true})
Type II error (false negative): failing to detect a difference when there really is one; β is the probability of making this error.
Relationship to significance: α defines the risk of a false positive; a lower α reduces this risk but may reduce power.
Clinical vs statistical significance: statistical significance does not always imply clinical (practical) significance.
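The definition of α as P(reject H0 | H0 true) can be checked by simulation. The sketch below is a rough illustration under stated assumptions: samples come from a normal population whose mean really is 0 (so H0 is true by construction), the standard deviation is known, and a two-tailed one-sample z-test is used. The function name, sample size, and trial count are all illustrative choices.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Fraction of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: population mean 0, known sd 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / n ** 0.5)           # one-sample z statistic
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))      # two-tailed p-value
        if p < alpha:
            rejections += 1                           # a false positive
    return rejections / trials

rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
print(f"false-positive rate at alpha = 0.05: {rate_05:.3f}")
print(f"false-positive rate at alpha = 0.01: {rate_01:.3f}")
```

Each printed rate should land close to its α, and lowering α lowers the false-positive rate, which matches the role of α as the accepted risk of a Type I error.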
Type I and Type II Errors; Significance Concepts
Type I error: rejecting a true null hypothesis; known as a false positive.
Type II error: failing to reject a false null hypothesis; known as a false negative.
Power: probability of correctly rejecting a false null hypothesis; \text{Power} = 1 - \beta where \beta = P(\text{fail to reject } H_0 \mid H_0 \text{ false}).
Significance relates to the probability of Type I error; power relates to the ability to detect true effects.
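To make Power = 1 - β concrete, the sketch below computes power for one simple case: a two-tailed one-sample z-test with known standard deviation. This test, the function name `z_test_power`, and the effect size of 0.5 are assumptions chosen for illustration; other tests need their own power formulas.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test (known sd) when the true mean differs by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)    # two-tailed critical value
    shift = effect * n ** 0.5 / sd                    # mean of the z statistic when H0 is false
    # Probability that the z statistic lands beyond either critical value.
    return (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical effect of 0.5 sd at several sample sizes.
for n in (10, 30, 100):
    power = z_test_power(effect=0.5, sd=1.0, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, beta = {1.0 - power:.3f}")
```

With an effect of zero the function returns α itself, since every rejection is then a Type I error. Holding the effect fixed, power rises with sample size and falls when α is lowered.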
Statistical Tests, Test Statistics, and Assumptions
Key factors in choosing a statistical test:
Number of variables under study
Levels of measurement (nominal, ordinal, interval, ratio)
Assumptions of the test (normality, independence, equal variances, etc.)
Test statistic: a calculated value used to decide whether to reject H0 (e.g., t, z, F, chi-square depending on test type).
p-value: the probability of observing the data, or something more extreme, if H0 is true.
Formal definition: p\text{-value} = P(\text{difference at least as extreme as observed} \mid H_0)
Example cue: The difference between two groups was statistically significant (p = 0.02).
Decision rule depends on the comparison between p-value and α.
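The p-value definition above (the probability of a difference at least as extreme as observed, given H0) can be computed directly for small samples with an exact permutation test. This is a swapped-in technique for illustration, not necessarily the test used in the course; the function name and the toy scores are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    n_a = len(group_a)
    as_extreme = 0
    total = 0
    # Under H0 the group labels are interchangeable, so try every relabeling.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total

# Toy data, invented for illustration only.
scores_a = [1, 2, 3]
scores_b = [4, 5, 6]
p = permutation_p_value(scores_a, scores_b)
print(f"p = {p}")
```

For these toy scores only 2 of the 20 possible relabelings give a difference as large as the observed one, so p = 0.1. The number of relabelings grows quickly with sample size, so this exact approach is only practical for small groups.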
Decision Rules and Conclusions
Decisions:
If the calculated p\text{-value} < \alpha, REJECT the null hypothesis.
If the calculated p\text{-value} \ge \alpha, FAIL TO REJECT the null hypothesis.
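The decision rule is simple enough to write as a small function. The sketch below is illustrative; the name `decide` and the returned strings are assumptions, and the two p-values are the ones used in the examples in these notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    return "reject H0" if p_value < alpha else "fail to reject H0"

for p in (0.01, 0.08):
    print(f"alpha = 0.05, p = {p}: {decide(p)}")
```

Note that a p-value exactly equal to α falls on the "fail to reject" side under this rule, and that the function never returns "accept H0".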
Example 1 (illustrative):
Null: there is no difference in statistics test scores between male and female students.
Research: there is a difference.
α = 0.05; p = 0.01 ⇒ 0.01 < 0.05 ⇒ REJECT the null hypothesis.
Conclusion: There is a statistically significant difference between male and female test scores.
Example 2 (illustrative):
Null: there is no difference in statistics test scores between male and female students.
Research: there is a difference.
α = 0.05; p = 0.08 ⇒ 0.08 > 0.05 ⇒ FAIL TO REJECT the null hypothesis.
Conclusion: There is NOT a statistically significant difference between male and female test scores. Note: failing to reject the null hypothesis is not the same as accepting it.
Important nuance: Hypotheses are not proven; data can support a hypothesis. Conclusions are about significance, not absolute truth.
Conclusions in Hypothesis Testing
Conclusion when H0 is rejected: There IS a statistically significant difference between groups or relationship between variables.
Conclusion when H0 is not rejected: There is NOT a statistically significant difference between groups or relationship between variables (based on the sample).
Restate the conclusion in terms of the same population and variables named in the hypotheses; this helps avoid overgeneralization.
Distinction: Statistical significance does not imply practical or clinical significance.
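One way to see why statistical significance does not imply practical significance is to hold a small difference fixed and vary the sample size. The sketch below assumes a two-sample z-test with equal group sizes and a known, equal standard deviation; the 0.5-point difference, the sd of 10, and the function name are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-tailed p-value for a difference in means (equal known sd, equal group sizes)."""
    standard_error = sd * (2.0 / n_per_group) ** 0.5
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: a 0.5-point difference on a test with sd = 10.
for n in (50, 10_000):
    p = two_sample_z_p_value(mean_diff=0.5, sd=10.0, n_per_group=n)
    print(f"n per group = {n:6d}: p = {p:.4f}")
```

The same 0.5-point difference is far from significant with 50 per group but falls well below α = 0.05 with 10,000 per group. Whether half a point matters in practice is a separate judgment that the p-value cannot make.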
Worked Scenarios (From Slides)
Scenario A:
Null: No difference in test scores between male and female students.
Research: There is a difference.
α = 0.05; result p = 0.01 → REJECT null; conclude a statistically significant difference.
Scenario B:
Null: No difference between male and female test scores.
Research: There is a difference.
α = 0.05; result p = 0.08 → FAIL TO REJECT null; conclude no statistically significant difference.
Scenario C (Conclusion recap):
Hypotheses are not proven; data can support a hypothesis.
Distinguish between statistical significance and clinical significance.
Key Formulas and Concepts (LaTeX)
Type I error probability: \alpha = P(\text{reject } H_0 \mid H_0 \text{ true})
Type II error probability: \beta = P(\text{fail to reject } H_0 \mid H_0 \text{ false})
Power of a test: \text{Power} = 1 - \beta
p-value concept: p\text{-value} = P(\text{difference at least as extreme as observed} \mid H_0)
Decision rule (typical): If p < \alpha \Rightarrow \text{Reject } H_0; if p \ge \alpha \Rightarrow \text{Fail to reject } H_0
Common significance example: \alpha = 0.05
Example p-values from slides: p = 0.01,\; p = 0.08,\; p = 0.02 (illustrative reference values)
Connections to Broader Course Context
Relationship to study design: Ensure hypotheses are aligned with research questions and data collection plans.
Foundations: Hypothesis testing builds on probability theory and inferential statistics to make evidence-based conclusions.
Practical implications: Beyond statistical decisions, consider whether findings are clinically or practically meaningful and how they inform evidence-based practice.