In-Depth Notes on Null Hypothesis Significance Testing (NHST)
Introduction to Null Hypothesis Significance Testing (NHST)
Inferential Statistics: NHST is a type of inferential statistics used to draw conclusions about a broader population from a sample.
Challenges: Statistical material can be difficult to engage with, and finding it challenging is a common experience among students.
Populations and Samples
Population: The complete set of individuals or observations of interest (e.g., adults in New Zealand, adolescents with depression).
Sampling: Due to practical limitations, data is usually collected from a sample rather than the entire population.
Inference: Conclusions drawn from sample data inherently come with uncertainty, as samples may not perfectly represent the population.
Role of Statistics in Inference
Detective Work Analogy: Making inferences from statistics is likened to detective work, where conclusions are drawn based on available information.
Statistics Calculation: Includes means, differences, correlations, standard deviations, etc.
These statistics serve as estimates for population parameters (e.g., correlation between variables in a population).
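The idea that a sample statistic estimates a population parameter can be sketched with a small simulation. The population below (100,000 scores with mean 100 and SD 15) and the sample size of 50 are made-up values chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 scores with mean about 100 and SD about 15.
population = [random.gauss(100, 15) for _ in range(100_000)]
population_mean = statistics.fmean(population)  # the parameter

# One sample of 50 gives a statistic that estimates the parameter.
sample = random.sample(population, 50)
sample_mean = statistics.fmean(sample)

# Repeating the sampling shows how much the estimate varies from sample to sample.
sample_means = [statistics.fmean(random.sample(population, 50)) for _ in range(1000)]
spread = statistics.stdev(sample_means)  # roughly 15 / sqrt(50)

print(f"population mean:    {population_mean:.2f}")
print(f"one sample mean:    {sample_mean:.2f}")
print(f"SD of sample means: {spread:.2f}")
```

The sample mean rarely equals the population mean exactly, and the spread of the repeated sample means is one way to see the uncertainty that comes with inference.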
Understanding Uncertainty and Errors
Uncertainty in Statistics: Statistics involves acknowledging uncertainty and the potential for making errors.
Types of Errors: A key part of NHST is understanding the likelihood of errors when drawing conclusions.
Historical Context of NHST: NHST grew out of the work of statisticians Jerzy Neyman, Egon Pearson, and Ronald Fisher.
Steps in Null Hypothesis Significance Testing (NHST)
Hypotheses Formulation:
Establish competing hypotheses:
Null Hypothesis (H0): Assumes no relationship exists between variables or no difference between means.
Alternative Hypothesis (H1): Suggests some relationship or difference does exist.
Alpha Level (α):
Set a threshold for significance before analyzing the data (commonly α = 0.05). α is the probability of making a Type I error (false positive) when the null hypothesis is true.
Setting α = 0.05 means accepting a 5% risk of mistakenly rejecting the null hypothesis when it is true.
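What α means in the long run can be checked by simulating many studies in which H0 really is true and counting how often it is rejected. This is a minimal sketch, assuming a one-sample z-test with a known standard deviation; the sample size, number of simulated studies, and function names are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha=0.05, n=30, n_studies=10_000, seed=0):
    """Share of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0, so H0 holds
        if z_test_p_value(sample) < alpha:
            false_positives += 1
    return false_positives / n_studies

rate = type_i_error_rate()
print(f"Observed false positive rate: {rate:.3f}")
```

The observed rate should land close to 0.05, which is what choosing α = 0.05 commits to.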
Data Collection:
Collect sample data to estimate the statistic of interest (mean differences, correlations, etc.).
Test Statistic Calculation:
Compute a test statistic that incorporates the size of the observed relationship, sample size, and variability.
This was historically done by hand but is now usually calculated with statistical software.
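As one example of how a test statistic combines these three ingredients, here is a sketch of Welch's t statistic for the difference between two group means. The notes do not name a specific test, so this choice and the two small data sets are assumptions made for illustration.

```python
from statistics import fmean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    mean_diff = fmean(group_a) - fmean(group_b)       # size of the observed difference
    standard_error = (
        variance(group_a) / len(group_a)              # variability and sample size, group A
        + variance(group_b) / len(group_b)            # variability and sample size, group B
    ) ** 0.5
    return mean_diff / standard_error

# Hypothetical scores for two groups of six participants.
group_a = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
group_b = [4.2, 4.8, 5.0, 4.1, 4.6, 4.4]

t = welch_t(group_a, group_b)
print(f"t = {t:.2f}")
```

A larger difference, a larger sample, or less variability each make the statistic larger in absolute value.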
P-Value Calculation:
The p-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true.
Common Misconceptions about p-values:
It is not the probability that the null hypothesis is true.
It reflects how unlikely data at least as extreme as those observed would be if H0 were true.
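The definition of the p-value can be made concrete with a permutation test, which builds the "if H0 is true" world by shuffling one variable and counting how often the shuffled correlation is at least as extreme as the observed one. This is a sketch of one way to obtain a p-value, not the method statistical software typically uses for correlations; the data and function names are hypothetical.

```python
import random
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # shuffling breaks any x-y link, mimicking H0
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Count the observed arrangement itself so the p-value is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical data: hours studied and exam scores for ten students.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [52, 55, 51, 60, 58, 65, 63, 70, 68, 75]

r = pearson_r(hours, scores)
p = permutation_p_value(hours, scores)
print(f"r = {r:.2f}, p = {p:.4f}")
```

Note that the result is a statement about the data under H0, not about the probability that H0 is true.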
Determine Statistical Significance:
Compare the p-value to the alpha level:
Statistically Significant: If p < α, reject H0 in favor of H1.
Not Statistically Significant: If p ≥ α, fail to reject H0.
Example: If the p-value for a correlation of 0.32 is 0.00003, this indicates statistical significance since 0.00003 < 0.05.
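The decision rule itself is short enough to write down directly. The function name below is a hypothetical helper, and the default α of 0.05 follows the common convention mentioned in these notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the NHST decision rule to a p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.00003))  # reject H0 (the correlation example)
print(decide(0.05))     # fail to reject H0 (p equal to alpha is not below alpha)
```

The rule never returns "accept H0": a non-significant result only means the data did not provide enough evidence against the null hypothesis.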
Results Interpretation and Misinterpretations
Rejecting H0: A statistically significant result does not prove that the null hypothesis is false; uncertainty remains.
Practical vs. Statistical Significance: A statistically significant result may not be practically significant (large enough to have real-world importance).
Causal Relationships: Statistical significance alone does not establish a causal connection; study design and context are crucial.
Errors in Hypothesis Testing
Type I Error (False Positive): Rejecting H0 when it is actually true.
Type II Error (False Negative): Failing to reject H0 when the alternative hypothesis is true.
Error Rates:
The Type I error probability is set by the chosen α level.
Type II error probability is influenced by:
Size of the true relationship in the population.
Sample size (larger sizes typically reduce Type II errors).
Variability of data (more variability increases Type II error risk).
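The three influences on the Type II error rate can be seen in a simulation where H0 (mean = 0) is false and we count how often a study fails to reject it. This is a minimal sketch, assuming a one-sample z-test with a known standard deviation; the effect sizes, sample sizes, and function name are illustrative.

```python
import random
from statistics import NormalDist, fmean

def type_ii_error_rate(true_mean, n, sigma, alpha=0.05, n_studies=4000, seed=0):
    """Share of simulated studies that fail to reject H0: mean = 0 although it is false."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)
        if abs(z) <= critical_z:  # p >= alpha, so H0 is not rejected
            misses += 1
    return misses / n_studies

baseline = type_ii_error_rate(true_mean=0.3, n=30, sigma=1.0)
larger_effect = type_ii_error_rate(true_mean=0.6, n=30, sigma=1.0)
larger_sample = type_ii_error_rate(true_mean=0.3, n=120, sigma=1.0)
more_variability = type_ii_error_rate(true_mean=0.3, n=30, sigma=2.0)

print(f"baseline:         {baseline:.2f}")
print(f"larger effect:    {larger_effect:.2f}")
print(f"larger sample:    {larger_sample:.2f}")
print(f"more variability: {more_variability:.2f}")
```

A larger true effect or a larger sample lowers the miss rate relative to the baseline, while noisier data raises it.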
Statistical Power Analysis
Statistical Power: The probability of correctly rejecting H0 when it is false. It equals 1 - β, where β is the probability of a Type II error.
Power Analysis Importance: Essential in study design to ensure the sample is large enough to detect an effect of the expected size; prevents wasting resources on underpowered studies.
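A rough power analysis for comparing two group means can be sketched with the normal approximation n per group = 2 * ((z for α + z for power) / d)^2, where d is the expected standardized mean difference. The use of a standardized effect size, the 80% power target, and the function name are assumptions not taken from these notes, and exact t-based calculations in statistical software give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Smaller expected effects require far larger samples, which is why the expected effect size needs to be thought through before data collection.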
Limitations of NHST
P-Value Misinterpretation: P-values are easily misunderstood, which can lead to invalid conclusions.
Binary Decision Making: NHST often treats evidence as binary (significant vs. non-significant), not reflecting the spectrum of evidence.
Complexity of P-Values: The definition and implications of a p-value are not intuitive, which can obscure what a result actually means.
Practical Significance: Just because a result is statistically significant does not guarantee practical relevance.
Conclusion
Utility of NHST: While NHST has limitations, it remains a crucial method in psychology for controlling error probabilities and making inferences about populations from samples. Awareness of its strengths and weaknesses is essential for researchers and practitioners to evaluate statistical findings appropriately.