Null Hypothesis Significance Testing
Forming a research question
- confirmatory research - focus on testing specific hypotheses
- exploratory research - doesn’t necessarily start with a hypothesis
good research question:
- researchable and realistic
- informed by the prior research (gap in the literature/need for replication)
- not too broad/narrow
- allows us to generate testable predictions (hypotheses) that are relevant to our research aims
Moving from research question to hypotheses
what is a hypothesis?
- a testable prediction - a statement about what we reasonably believe our data will show
- the prediction is based on prior information (existing literature and research)
conceptual hypothesis
- describes the prediction in conceptual terms
- can be defined in terms of the direction of the effect (directional vs non-directional)
operational hypothesis
- describes the prediction in terms of the specific measures used
- operationalisation - the process of translating the concepts into measures
null vs alternative hypotheses
- alternative hypothesis - prediction that there will be a significant difference
- null hypothesis - negation of the prediction we’re making; there will not be a significant difference
* null hypothesis testing - trying to decide whether we can reject the null hypothesis
Formally testing hypotheses with statistics
decide on the α (alpha) level
- α level - the probability of a false-positive finding that we’re willing to accept if the null hypothesis is true
* how often are we willing to incorrectly reject the null hypothesis (i.e. make a Type I error)? should be based on a cost-benefit analysis - how risky is it to be wrong?
* often use an α rate of 5% (0.05)
calculate the test statistic
- test statistic - a numeric value that we use to test the hypothesis
- different test statistics for different situations (e.g. t, F, χ², MDiff)
compute the p-value
- *p*-value - the probability of observing a test statistic at least as extreme as the one we observed, assuming the null hypothesis is true
reject or retain the null hypothesis
- to reject the null hypothesis, the *p*-value needs to be less than the critical α; otherwise we fail to reject (rather than ‘accept’) the null
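The four steps above can be sketched end to end. This is a minimal stdlib-only illustration using a two-sample z-test approximation (a real analysis would typically use a t-test from a statistics package); the function name and the example data are made up for illustration.

```python
import math
from statistics import mean, stdev

def two_sample_test(group_a, group_b, alpha=0.05):
    """Two-sample test of a mean difference using a normal (z) approximation.

    Steps: fix alpha, compute the test statistic, compute the two-sided
    p-value, then decide whether to reject the null hypothesis.
    """
    na, nb = len(group_a), len(group_b)
    # standard error of the difference in means (unequal variances)
    se = math.sqrt(stdev(group_a) ** 2 / na + stdev(group_b) ** 2 / nb)
    z = (mean(group_a) - mean(group_b)) / se       # test statistic
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided p-value
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision

# hypothetical scores from two experimental groups
a = [5.1, 4.9, 6.0, 5.5, 5.8, 5.2, 5.7, 6.1]
b = [4.2, 4.8, 4.5, 4.0, 4.6, 4.3, 4.9, 4.4]
z, p, decision = two_sample_test(a, b)
```

Note the decision rule only compares *p* to the pre-set α - the function never ‘accepts’ the null, it either rejects or fails to reject it.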
Pitfalls of null hypothesis significance testing
- doesn’t tell us the probability of the null or alternative hypotheses being true
- dichotomous thinking around the α level - can encourage questionable research practices (can eventually find a statistically significant result somewhere)
- *p*-values are sensitive to sample size - with a large enough sample, even trivially small effects become statistically significant
Ways to avoid pitfalls
- effect sizes - how large/meaningful is the effect
- confidence intervals - what are the plausible limits of our effect?
- preregistration - specifying our analytic plan before we look at the data
- registered reports - scientific paper can get accepted before data collection based on proposed methodology instead of results
- power analysis - determining the necessary sample size before we begin data collection
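Effect sizes and confidence intervals can be computed directly from the group data. A minimal sketch, assuming two independent groups; Cohen’s d is computed with a pooled standard deviation, and the interval uses a normal approximation (the function names are made up for illustration).

```python
import math
from statistics import mean, stdev, NormalDist

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference scaled by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled = math.sqrt(((na - 1) * stdev(group_a) ** 2 +
                        (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled

def diff_ci(group_a, group_b, level=0.95):
    """Normal-approximation confidence interval for the mean difference."""
    na, nb = len(group_a), len(group_b)
    se = math.sqrt(stdev(group_a) ** 2 / na + stdev(group_b) ** 2 / nb)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # ≈ 1.96 for 95%
    diff = mean(group_a) - mean(group_b)
    return diff - z * se, diff + z * se

# hypothetical data: the effect size says how large the difference is,
# the interval says what range of differences is plausible
a = [5.1, 4.9, 6.0, 5.5, 5.8, 5.2, 5.7, 6.1]
b = [4.2, 4.8, 4.5, 4.0, 4.6, 4.3, 4.9, 4.4]
d = cohens_d(a, b)
lo, hi = diff_ci(a, b)
```

Reporting d and the interval alongside the *p*-value counters dichotomous thinking: they convey how large and how precise the effect is, not just whether it crossed α.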
Power analysis
- power analysis - determining the necessary sample size before we begin data collection
* makes non-significant effects easier to interpret
* reduces the probability of missing a true effect (false-negative/Type II error)
* reduces over-sampling + wasting resources
* discourages ‘data-peeking’
* statistical power - the probability of detecting an effect of a certain size as statistically significant (assuming the effect exists)
* generally aim for a statistical power of 80%
* the greater the effect of interest, the fewer participants needed
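An a priori power analysis can be sketched with the standard normal-approximation formula for a two-sided, two-sample comparison: n per group ≈ 2 × ((z₁₋α/₂ + z_power) / d)². This is an illustrative stdlib-only sketch (dedicated tools such as G*Power or statsmodels apply a small t-distribution correction, so their answers differ slightly); the function name is made up.

```python
import math
from statistics import NormalDist

def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group to detect effect size d (Cohen's d)
    in a two-sided two-sample test, via the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 for α = .05
    z_power = NormalDist().inv_cdf(power)           # ≈ 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# a medium effect (d = 0.5) needs far more participants than a large one
n_medium = sample_size_per_group(0.5)
n_large = sample_size_per_group(0.8)
```

Because d appears squared in the denominator, the last note above falls out directly: the larger the effect of interest, the fewer participants are needed.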