Hypothesis Testing

Purpose: A formal process to determine if observed differences or relationships between variables are likely due to chance, guided by predefined criteria.
Overall process (6 steps):
1. Develop null and research hypotheses
2. Choose a level of significance
3. Determine which statistical test is appropriate
4. Run analysis to obtain a test statistic and p value
5. Make a decision about rejecting or failing to reject the null hypothesis
6. Make a conclusion
Hypotheses are testable statements that predict the relationship between variables.
They translate the research question into a prediction of the outcome (an educated guess about the outcome).

Null Hypothesis (H0): There is no difference between groups or no relationship between variables.
Research Hypothesis (H1 or Ha): There will be a difference between groups or there will be a relationship between variables.
Directional (one-tailed) vs Non-directional (two-tailed) hypotheses:
- Directional: predicts the direction of the effect (e.g., group A > group B).
- Non-directional: predicts a difference or relationship without specifying direction.
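To see why the directional choice matters, the sketch below converts one test statistic into a p-value under each kind of hypothesis. It assumes a z statistic with a standard normal distribution under H0; the function name `p_value_from_z` and the value z = 1.80 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two-sided"):
    """Return the p-value for a z statistic, assuming a standard normal null.

    tail: "two-sided" (non-directional), "greater" (predicts A > B),
          or "less" (predicts A < B).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "greater":
        return 1 - std_normal.cdf(z)
    if tail == "less":
        return std_normal.cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")


z = 1.80  # hypothetical test statistic
p_two = p_value_from_z(z, "two-sided")  # non-directional hypothesis
p_one = p_value_from_z(z, "greater")    # directional hypothesis (A > B)
print(f"non-directional p = {p_two:.4f}")
print(f"directional p     = {p_one:.4f}")
```

With this hypothetical statistic the directional p-value is half the non-directional one, so the same data can fall on different sides of α = 0.05. This is one reason the direction should be chosen before looking at the data.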
Key point: hypotheses should be testable and derived from a research question.

Level of significance (α): the criterion used to determine statistical significance; set before data collection.
Common choice: α = 0.05. Use this value unless told otherwise.
Type I error (false positive): concluding there is a difference when there really isn’t; α is the probability of this error.
- Formal definition: \alpha = P(\text{reject } H_0 \mid H_0 \text{ true})
Type II error (false negative): failing to detect a difference when there really is one; β is the probability of this error.
Relationship to significance: α defines the risk of a false positive; a lower α reduces this risk but also reduces power when everything else is held fixed.
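The meaning of α as a false-positive rate can be checked by simulation. The sketch below is a minimal illustration, assuming two groups drawn from the same normal population (so H0 is true by construction) and a two-tailed z-test with a known standard deviation of 1; the function name, sample size, and trial count are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_type_i_rate(alpha=0.05, n=30, trials=4000, seed=0):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population, so H0 is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z statistic for a difference in means with known sd = 1 per group
        z = (sum(a) / n - sum(b) / n) / (2 / n) ** 0.5
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1  # every rejection here is a Type I error
    return rejections / trials


rate_05 = simulate_type_i_rate(alpha=0.05)
rate_01 = simulate_type_i_rate(alpha=0.01)
print(f"alpha = 0.05 -> observed false-positive rate {rate_05:.3f}")
print(f"alpha = 0.01 -> observed false-positive rate {rate_01:.3f}")
```

Under these assumptions the observed rejection rate should land close to the chosen α, and lowering α lowers it.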
Clinical vs statistical significance: statistical significance does not always imply clinical (practical) significance.

Type I error: rejecting a true null hypothesis; known as a false positive.
Type II error: failing to reject a false null hypothesis; known as a false negative.
Power: probability of correctly rejecting a false null hypothesis; \text{Power} = 1 - \beta where \beta = P(\text{fail to reject } H_0 \mid H_0 \text{ false}).
Significance relates to the probability of Type I error; power relates to the ability to detect true effects.
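As a worked illustration of Power = 1 - β, the sketch below computes power for one simple case: a one-tailed (upper tail), one-sample z-test with known standard deviation. The test type, the function name `z_test_power`, and the numbers (effect 5, sd 10, n 25) are assumptions made for the example, not values from these notes.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a one-tailed (upper), one-sample z-test with known sd.

    effect: true mean minus the mean stated in H0 (assumed >= 0).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection cutoff for the z statistic
    shift = effect * n ** 0.5 / sd          # mean of the z statistic when H0 is false
    beta = std_normal.cdf(z_crit - shift)   # P(fail to reject H0 | H0 false)
    return 1 - beta


power_05 = z_test_power(effect=5, sd=10, n=25, alpha=0.05)
power_01 = z_test_power(effect=5, sd=10, n=25, alpha=0.01)
print(f"alpha = 0.05 -> power {power_05:.3f}")
print(f"alpha = 0.01 -> power {power_01:.3f}")
```

In this setup the stricter α gives lower power for the same effect and sample size, which is the trade-off between the two error types.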

Key factors in choosing a statistical test:
- Number of variables under study
- Levels of measurement (nominal, ordinal, interval, ratio)
- Assumptions of the test (normality, independence, equal variances, etc.)
Test statistic: a calculated value used to decide whether to reject H0 (e.g., t, z, F, chi-square depending on test type).
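As one concrete example of a test statistic, the sketch below computes a two-sample t statistic for a difference in means. It assumes the Welch (unequal variances) form, which is only one of several t statistics; the function name and the test scores are hypothetical.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic (Welch form): mean difference / standard error."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


# Hypothetical test scores for two groups
group_a = [78, 85, 92, 88, 75, 90]
group_b = [72, 80, 79, 85, 70, 76]

t_stat = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}")
```

The statistic grows when the group means are further apart or when the scores within each group vary less. Turning it into a p-value requires the t distribution, which is not part of this sketch.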
p-value: the probability of observing the data, or something more extreme, if H0 is true.
- Formal definition: p\text{-value} = P(\text{difference at least as extreme as observed} \mid H_0)
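The definition can be made concrete with an exact permutation test, a technique not covered in these notes and used here only as an illustration. If H0 is true, group labels are interchangeable, so the p-value is the share of all possible relabelings whose mean difference is at least as extreme as the observed one. The function name and the scores are hypothetical.

```python
from itertools import combinations


def permutation_p_value(a, b):
    """Two-tailed exact permutation p-value for a difference in means."""
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled scores to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical test scores for two groups
group_a = [78, 85, 92, 88, 75, 90]
group_b = [72, 80, 79, 85, 70, 76]

p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}")
```

Full enumeration is only practical for small samples; with 6 scores per group there are 924 relabelings.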
Example cue: The difference between two groups was statistically significant (p = 0.02).
Decision rule depends on the comparison between p-value and α.

Decisions:
- If the calculated p\text{-value} < \alpha, REJECT the null hypothesis.
- If the calculated p\text{-value} \ge \alpha, FAIL TO REJECT the null hypothesis.
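The decision rule is simple enough to write as a small function. This is a minimal sketch; the function name `decide` and the returned strings are hypothetical, and the p-values passed in are the ones used in this section's examples.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p is strictly below alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.01))  # reject H0
print(decide(0.08))  # fail to reject H0
print(decide(0.05))  # fail to reject H0 (p equal to alpha does not reject)
```

The boundary case matters: with the rule as stated, p exactly equal to α leads to failing to reject.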
Example 1 (illustrative):
- Null: there is no difference in statistics test scores between male and female students.
- Research: there is a difference.
- α = 0.05; p = 0.01 ⇒ 0.01 < 0.05 ⇒ REJECT the null hypothesis.
- Conclusion: There is a statistically significant difference between male and female test scores.
Example 2 (illustrative):
- Null: there is no difference in statistics test scores between male and female students.
- Research: there is a difference.
- α = 0.05; p = 0.08 ⇒ 0.08 > 0.05 ⇒ FAIL TO REJECT the null hypothesis.
- Conclusion: There is NOT a statistically significant difference between male and female test scores. Failing to reject the null hypothesis is not the same as accepting it.
Important nuance: Hypotheses are not proven; data can support a hypothesis. Conclusions are about significance, not absolute truth.

Conclusion when H0 is rejected: There IS a statistically significant difference between groups or relationship between variables.
Conclusion when H0 is not rejected: There is NOT a statistically significant difference between groups or relationship between variables (based on the sample).
State the conclusion in terms of the same population and variables named in the hypotheses to avoid overgeneralization.
Distinction: Statistical significance does not imply practical or clinical significance.
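A small calculation shows how the two kinds of significance can come apart. The sketch below assumes a two-tailed, two-sample z-test with a known common standard deviation, and reports a standardized effect size (mean difference divided by sd) alongside the p-value. All numbers are hypothetical: a 0.5-point difference on a test with sd 10.

```python
from statistics import NormalDist


def two_sample_z(mean_diff, sd, n_per_group):
    """Return (z, two-tailed p, standardized effect size) for equal-sized groups."""
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the mean difference
    z = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    effect_size = mean_diff / sd        # difference in sd units
    return z, p, effect_size


# Same tiny difference, very different sample sizes
z_large, p_large, d_large = two_sample_z(mean_diff=0.5, sd=10, n_per_group=10_000)
z_small, p_small, d_small = two_sample_z(mean_diff=0.5, sd=10, n_per_group=50)

print(f"n = 10000 per group: p = {p_large:.4f}, effect size = {d_large:.2f}")
print(f"n = 50 per group:    p = {p_small:.4f}, effect size = {d_small:.2f}")
```

With a very large sample the half-point difference is statistically significant at α = 0.05 even though the effect size is unchanged and small, so the p-value alone says nothing about whether the difference matters in practice.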

Scenario A:
- Null: No difference in test scores between male and female students.
- Research: There is a difference.
- α = 0.05; result p = 0.01 → REJECT null; conclude a statistically significant difference.
Scenario B:
- Null: No difference between male and female test scores.
- Research: There is a difference.
- α = 0.05; result p = 0.08 → FAIL TO REJECT null; conclude no statistically significant difference.
Scenario C (Conclusion recap):
- Hypotheses are not proven; data can support a hypothesis.
- Distinguish between statistical significance and clinical significance.

Type I error probability: \alpha = P(\text{reject } H_0 \mid H_0 \text{ true})
Type II error probability: \beta = P(\text{fail to reject } H_0 \mid H_0 \text{ false})
Power of a test: \text{Power} = 1 - \beta
p-value concept: p\text{-value} = P(\text{difference at least as extreme as observed} \mid H_0)
Decision rule (typical): If p < \alpha \Rightarrow \text{Reject } H_0; if p \ge \alpha \Rightarrow \text{Fail to reject } H_0
Common significance example: \alpha = 0.05
Example p-values from slides: p = 0.01,\; p = 0.08,\; p = 0.02 (illustrative reference values)

Relationship to study design: Ensure hypotheses are aligned with research questions and data collection plans.
Foundations: Hypothesis testing builds on probability theory and inferential statistics to make evidence-based conclusions.
Practical implications: Beyond statistical decisions, consider whether findings are clinically or practically meaningful and how they inform evidence-based practice.