Significance Testing and Hypothesis Testing
Significance Testing
Introduction to statistical hypotheses and significance testing.
Dr. Olusegun Fawole (olusegun.fawole@port.ac.uk).
Statistical Hypotheses
Testing the null hypothesis.
Example: Comparing leaf sizes from the Southwest (SW) and Northwest (NW).
Null Hypothesis (H₀): Leaf size SW = Leaf size NW (no difference).
Alternative Hypothesis (Hₐ): Leaf size SW < Leaf size NW.
Data samples:
SW: 2.01, 1.35, 4.5, 4.32, 5.34, 1.2
NW: 1.32, 2.63, 3.45, 7.51, 6.35, 6.45
Location and Spread Estimators (Sample):
Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Variance: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
LsD (Leaf size Difference) = \bar{x}_{SW} - \bar{x}_{NW}
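As a minimal sketch, the estimators above can be computed directly from the two listed samples with Python's standard library. The variable names are illustrative and not part of the original slides.

```python
from statistics import mean, variance

# Leaf-size samples as listed above
sw = [2.01, 1.35, 4.5, 4.32, 5.34, 1.2]
nw = [1.32, 2.63, 3.45, 7.51, 6.35, 6.45]

# Location estimator: sample mean
mean_sw, mean_nw = mean(sw), mean(nw)

# Spread estimator: sample variance (divides by n - 1)
var_sw, var_nw = variance(sw), variance(nw)

# Leaf size Difference: difference of the two sample means
lsd = mean_sw - mean_nw

print(f"mean SW = {mean_sw:.3f}, mean NW = {mean_nw:.3f}")
print(f"var  SW = {var_sw:.3f}, var  NW = {var_nw:.3f}")
print(f"LsD     = {lsd:.3f}")
```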
Test Statistics
Definition: A numerical summary that reduces the data to one value, and whose possible values (its distribution) are known under the null hypothesis.
LsD = \bar{x}_{SW} - \bar{x}_{NW} = 3.12 - 4.62 ≈ -1.5 (from the samples above)
Question: How certain are we that LsD is indeed negative, and that we did not get this value just by chance?
We can find P(LsD ≤ -1.5) under the null hypothesis!
Remember, we just need to know the area under the LsD distribution curve up to that value.
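For a continuous statistic, the probability of any single exact value is zero, so the "area up to that value" is a cumulative probability. One way to write it, assuming f_{LsD} denotes the density of LsD under H₀ and LsD_obs the observed value (both symbols are introduced here for illustration):

```latex
p = P\left(\mathrm{LsD} \le \mathrm{LsD}_{\mathrm{obs}} \mid H_0\right)
  = \int_{-\infty}^{\mathrm{LsD}_{\mathrm{obs}}} f_{\mathrm{LsD}}(x)\, dx
```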
Test Statistics - Process
Estimate all possible values of the statistic: sampling of SW and NW is repeated many times, giving LsD₁, LsD₂, …, LsDₙ.
Use a statistic with a known pdf (probability density function, i.e., a known statistical distribution):
t: uses the Student t distribution
F: uses the F distribution
Z: uses the Standard normal distribution
\chi^2: uses the chi-square distribution
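The idea of estimating all possible values of the statistic can be sketched with an exact permutation test. This technique is not named in the slides; it is swapped in here because it needs no new field sampling. The sketch assumes the null hypothesis is "no difference between regions", so the region labels are exchangeable and every split of the 12 pooled leaves into two groups of 6 is equally likely.

```python
from itertools import combinations
from statistics import mean

sw = [2.01, 1.35, 4.5, 4.32, 5.34, 1.2]
nw = [1.32, 2.63, 3.45, 7.51, 6.35, 6.45]

observed = mean(sw) - mean(nw)  # observed LsD

pooled = sw + nw
indices = range(len(pooled))

# Enumerate every way to relabel 6 of the 12 leaves as "SW" (924 splits)
null_lsds = []
for sw_idx in combinations(indices, len(sw)):
    chosen = set(sw_idx)
    group_a = [pooled[i] for i in indices if i in chosen]
    group_b = [pooled[i] for i in indices if i not in chosen]
    null_lsds.append(mean(group_a) - mean(group_b))

# One-sided p-value: share of relabelings with an LsD at least as negative
# as the observed one (small tolerance guards against float rounding)
p_value = sum(d <= observed + 1e-12 for d in null_lsds) / len(null_lsds)

print(f"observed LsD = {observed:.3f}")
print(f"splits enumerated = {len(null_lsds)}")
print(f"one-sided permutation p-value = {p_value:.3f}")
```

With larger samples the number of splits grows too quickly to enumerate, which is one reason statistics with a known distribution (t, F, Z, chi-square) are used instead.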
Mention of Bayesian vs. Frequentist approaches.
Steps to Testing a Hypothesis
Define study question.
Choose a suitable test.
Set null and alternative hypothesis.
Calculate a test statistic.
Calculate a p-value.
Make a decision and interpret your conclusions.
Resource: www.statstutor.ac.uk
Illustration: Titanic Example
The ship Titanic sank in 1912 with significant loss of life.
809 of the 1,309 passengers and crew died (61.8%).
Research question: Did class (of travel) affect survival?
Titanic Example: Hypotheses
Null (H₀): There is NO association between class and survival.
Alternative (Hₐ): There IS an association between class and survival.
Expectation if the null hypothesis is true: Same proportion of people would have died in each class!
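A small sketch of what "same proportion in each class" means in numbers. The per-class counts below are NOT given in the text above; they are assumed from the widely circulated Titanic passenger dataset (1,309 records, 809 deaths) and should be checked against the data actually used.

```python
# Assumed per-class counts (not stated in the slides)
counts = {
    "1st": {"died": 123, "survived": 200},
    "2nd": {"died": 158, "survived": 119},
    "3rd": {"died": 528, "survived": 181},
}

total = sum(c["died"] + c["survived"] for c in counts.values())
total_died = sum(c["died"] for c in counts.values())
overall_death_rate = total_died / total  # proportion expected in every class under H0

expected_died = {}
for cls, c in counts.items():
    n = c["died"] + c["survived"]
    expected_died[cls] = n * overall_death_rate
    print(
        f"{cls}: observed died = {c['died']:3d} ({c['died'] / n:.1%}), "
        f"expected under H0 = {expected_died[cls]:.1f} ({overall_death_rate:.1%})"
    )
```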
Hypothesis Testing: Decision Rule
Use statistical tools and software to undertake a hypothesis test.
P-value (P) is a key output.
If P < 0.05, reject H₀: there is evidence for Hₐ (i.e., there IS an association).
If P ≥ 0.05, do not reject H₀: there is insufficient evidence of an association (this does not prove there is NO association).
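The decision rule can be illustrated with a chi-square test of association computed by hand. This is a sketch under assumptions: the class-by-survival counts are not given in the slides and are taken from the widely circulated Titanic passenger dataset, and the closed-form p-value used here is valid only for 2 degrees of freedom.

```python
import math

# Assumed counts (not stated in the slides): rows = 1st, 2nd, 3rd class;
# columns = died, survived
observed = [
    [123, 200],
    [158, 119],
    [528, 181],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2)
p_value = math.exp(-chi2 / 2)

alpha = 0.05
reject_h0 = p_value < alpha
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3g}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```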
T-tests
Used to compare two population means.
Paired data: same individuals studied at two different times or under two conditions (PAIRED T-TEST).
Independent: data collected from two separate groups (INDEPENDENT SAMPLES T-TEST).
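As an illustration of the independent samples case, the sketch below computes the pooled-variance t statistic by hand for the SW and NW leaf samples from the earlier example, which come from two separate groups. The critical value 2.228 is the two-sided 5% point of the Student t distribution with 10 degrees of freedom, taken from standard tables. The equal-variance (pooled) form is an assumption of this sketch.

```python
import math
from statistics import mean, variance

sw = [2.01, 1.35, 4.5, 4.32, 5.34, 1.2]
nw = [1.32, 2.63, 3.45, 7.51, 6.35, 6.45]
n1, n2 = len(sw), len(nw)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * variance(sw) + (n2 - 1) * variance(nw)) / (n1 + n2 - 2)

# Standard error of the difference in means
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (mean(sw) - mean(nw)) / se
df = n1 + n2 - 2

t_crit = 2.228  # two-sided 5% critical value for df = 10 (from t tables)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}, |t| > {t_crit}: {reject_h0}")
```

A paired t-test would instead apply the same one-sample calculation to the within-subject differences.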
Assumptions in t-test
Normality:
Plot histograms: one plot of the paired differences for any paired data; two (one for each group) for independent samples.
Should be roughly symmetric.
Equal Population variances:
Compare sample standard deviations: one should be no more than twice the other.
Do a formal test for differences – F-test, Levene’s test, Fligner-Killeen test, etc.
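The informal rule of thumb for equal variances can be checked in a few lines. The sketch below applies it to the SW and NW leaf samples from the earlier example; the threshold of 2 is the rule stated above, not a formal test.

```python
from statistics import stdev

sw = [2.01, 1.35, 4.5, 4.32, 5.34, 1.2]
nw = [1.32, 2.63, 3.45, 7.51, 6.35, 6.45]

sd_sw, sd_nw = stdev(sw), stdev(nw)  # sample standard deviations

# Ratio of the larger SD to the smaller SD
ratio = max(sd_sw, sd_nw) / min(sd_sw, sd_nw)
roughly_equal = ratio <= 2

print(f"SD SW = {sd_sw:.3f}, SD NW = {sd_nw:.3f}, ratio = {ratio:.2f}")
print("Equal-variance assumption looks reasonable" if roughly_equal
      else "Variances look unequal")
```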
The t-test is very robust to violations of the assumptions of Normality and equal variances, particularly for moderate (i.e., >30) and larger sample sizes.
Assessing Normality
Charts can be used to informally assess whether data is normally distributed or skewed.
The mean and median are very different for skewed data.
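A quick numerical check of this point, using a small made-up right-skewed sample (hypothetical values, chosen only for illustration):

```python
from statistics import mean, median

# Hypothetical right-skewed data: one large value pulls the mean upward
skewed = [1, 2, 2, 3, 3, 4, 20]

# Hypothetical roughly symmetric data for comparison
symmetric = [1, 2, 3, 4, 5, 6, 7]

skewed_mean, skewed_median = mean(skewed), median(skewed)
symmetric_mean, symmetric_median = mean(symmetric), median(symmetric)

print(f"skewed:    mean = {skewed_mean}, median = {skewed_median}")
print(f"symmetric: mean = {symmetric_mean}, median = {symmetric_median}")
```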
Illustration of Data Distribution
Histograms showing frequency distributions.
Examples include distributions of weight loss for placebo and treatment groups, and a new drug.
Question: Do these histograms look approximately normally distributed?
What if the Assumptions are Not Met?
There are alternative tests which do not have these assumptions.
Independent t-test: Use Mann-Whitney test if histograms of data by group are not normal.
Paired t-test: Use Wilcoxon signed-rank test if the histogram of paired differences is not normal.
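To show what the Mann-Whitney test works with, the sketch below computes its U statistic by hand for the SW and NW leaf samples from the earlier example. The helper name mann_whitney_u is hypothetical. Only the statistic is computed here; a p-value would come from tables or statistical software.

```python
def mann_whitney_u(a, b):
    """U for sample a: number of (x, y) pairs with x > y, ties counting 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )

sw = [2.01, 1.35, 4.5, 4.32, 5.34, 1.2]
nw = [1.32, 2.63, 3.45, 7.51, 6.35, 6.45]

u_sw = mann_whitney_u(sw, nw)  # pairs where the SW leaf is larger
u_nw = mann_whitney_u(nw, sw)  # pairs where the NW leaf is larger

# The two U values always sum to n1 * n2; the smaller is usually reported
u_stat = min(u_sw, u_nw)

print(f"U_SW = {u_sw}, U_NW = {u_nw}, reported U = {u_stat}")
```

Because U depends only on the ordering of the values, it does not rely on the Normality assumption.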
Comparing Means
Comparing means between groups: Independent t-test (2 groups); One-way ANOVA (3+ groups).
Comparing measurements within the same subject: Paired t-test (2 measurements); Repeated measures ANOVA (3+ measurements).
ANOVA = Analysis of variance.
Summary
To test against the null hypothesis, we first calculate a statistic.
A test statistic is a numerical summary that reduces our data to one value.
In frequentist statistics, we use statistics whose possible values under the null hypothesis are known.
The researcher establishes the significance level for the test against H₀ by choosing a value of alpha (typically 0.05 or 0.01).
The decision on whether to reject H₀ is usually made by comparing the p-value against the chosen threshold alpha (α).