Significance Tests
The Idea of a Significance Test
A significance test is a procedure for using observed data to decide between two competing claims, called hypotheses.
The hypotheses are usually statements about a parameter, like the population proportion p or the population mean \mu.
The claim we weigh evidence against in a significance test is called the null hypothesis (H_0). The null hypothesis often has the form: parameter = null value.
The claim about the population that we are trying to find evidence for is the alternative hypothesis (H_a).
A one-sided alternative hypothesis has the form: parameter < null value or parameter > null value.
A two-sided alternative hypothesis has the form: parameter \neq null value.
A standardized test statistic measures how far a sample statistic is from what we would expect if the null hypothesis (H_0) were true, in standardized units.
Standardized test statistic = \frac{\text{statistic} - \text{null value}}{\text{standard deviation (error) of statistic}}
The P-value of a test is the probability of getting evidence for the alternative hypothesis as strong or stronger than the observed evidence by chance alone when the null hypothesis (H_0) is true.
Small P-values are evidence against the null hypothesis and for the alternative hypothesis because they say that the observed result is unlikely to occur when H_0 is true. To determine if a P-value should be considered small, we compare it to the significance level \alpha.
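The comparison of a P-value to \alpha can be sketched in Python. The values below (a two-sided test statistic z = 2.1 and \alpha = 0.05) are made-up for illustration; the standard normal CDF is built from the standard library's erf function.

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1 + erf(x / sqrt(2)))

# Hypothetical example: standardized test statistic z = 2.1 for a
# two-sided alternative (parameter != null value).
z = 2.1
p_value = 2 * (1 - normal_cdf(abs(z)))  # area in both tails

alpha = 0.05
print(round(p_value, 4))  # about 0.0357
print(p_value < alpha)    # True: reject H_0 at the 5% level
```

Because the P-value (about 0.036) is less than \alpha = 0.05, this result would count as statistically significant.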
Significance Test Conclusions and Errors
We make a conclusion in a significance test based on the P-value.
If P-value < \alpha: Reject H_0 and conclude there is convincing evidence for H_a (in context).
If P-value \ge \alpha: Fail to reject H_0 and conclude there is not convincing evidence for H_a (in context).
When we make a conclusion in a significance test, there are two kinds of mistakes we can make.
A Type I error occurs if we find convincing evidence that H_a is true when it really isn't.
A Type II error occurs if we do not find convincing evidence that H_a is true when it really is.
The probability of making a Type I error is equal to the significance level \alpha. There is a tradeoff between P(Type I error) and P(Type II error): as one decreases, the other increases, assuming everything else remains the same. So it is important to consider the possible consequences of each type of error before choosing a significance level.
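The claim that P(Type I error) = \alpha can be checked by simulation. This sketch uses made-up settings (H_0: p = 0.5 is true, n = 100, \alpha = 0.05) and counts how often a two-sided test rejects a true null hypothesis.

```python
import random
random.seed(1)

# Simulation sketch: when H_0 is TRUE, the long-run proportion of
# (wrong) rejections should be close to alpha. Hypothetical setup:
# H_0: p = 0.5 is true, n = 100, two-sided test at alpha = 0.05.
p0, n, trials = 0.5, 100, 10_000
se = (p0 * (1 - p0) / n) ** 0.5
rejections = 0
for _ in range(trials):
    x = sum(random.random() < p0 for _ in range(n))  # simulate one sample
    z = (x / n - p0) / se
    if abs(z) > 1.96:  # reject H_0 at alpha = 0.05 (two-sided)
        rejections += 1
print(rejections / trials)  # close to 0.05
```

The simulated rejection rate hovers near 0.05, illustrating that Type I errors happen about \alpha of the time when H_0 is true.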
How Significance Tests Work
When you perform a significance test, follow the four-step process:
STATE: State hypotheses, parameter(s), significance level, and evidence for (H_a).
PLAN: Identify the appropriate inference method and check conditions.
DO: If the conditions are met, perform calculations:
Identify the sample statistic(s).
Calculate the standardized test statistic.
Find the P-value.
CONCLUDE: Make a conclusion about the hypotheses in the context of the problem.
Significance Tests for a Proportion
The conditions for performing a significance test of H_0: p = p_0 are:
Random: The data come from a random sample from the population of interest.
Large Counts: Both np_0 and n(1 - p_0) are at least 10.
The standardized test statistic for a one-sample z test for p is
z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}
When the Large Counts condition is met, the standardized test statistic has approximately a standard normal distribution. You can use Table A or technology to find the P-value.
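A one-sample z test for a proportion can be sketched end-to-end in Python. The data here are hypothetical (63 successes in n = 100, testing H_0: p = 0.5 against the one-sided alternative H_a: p > 0.5); the normal CDF again comes from the standard library's erf.

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1 + erf(x / sqrt(2)))

# Hypothetical data: 63 successes in n = 100.
# Test H_0: p = 0.5 versus H_a: p > 0.5 (one-sided).
n, successes, p0 = 100, 63, 0.5
assert n * p0 >= 10 and n * (1 - p0) >= 10  # Large Counts condition

p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - normal_cdf(z)  # upper-tail area only, since H_a: p > p0
print(round(z, 2), round(p_value, 4))  # 2.6 0.0047
```

Since the P-value (about 0.005) is far below common significance levels, this hypothetical sample would give convincing evidence for H_a: p > 0.5.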
Confidence intervals provide additional information that significance tests do not—namely, a set of plausible values for the population proportion p based on sample data. A 95% confidence interval for p gives information about the parameter that is generally consistent with a two-sided test of H_0: p = p_0 at the \alpha = 0.05 significance level.
Significance Tests for a Mean
The conditions for performing a significance test of H_0: \mu = \mu_0 are:
Random: The data come from a random sample from the population of interest.
Normal/Large Sample: The data come from an approximately normally distributed population or the sample size is large (n \ge 30). When the sample size is small and the shape of the population distribution is unknown, a graph of the sample data shows no strong skewness or outliers.
The standardized test statistic for a one-sample t test for \mu is
t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}
When the Normal/Large Sample condition is met, the standardized test statistic can be modeled by a t distribution with n - 1 degrees of freedom (df). You can use Table B or technology to find the P-value.
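A one-sample t test can be sketched similarly. The data are hypothetical (n = 16, \bar{x} = 10.4, s_x = 1.2, testing H_0: \mu = 10 against H_a: \mu \neq 10), and since the Python standard library has no t-distribution CDF, the tail area is approximated by numerically integrating the t density (in practice you would use Table B or technology).

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    # Density of the t distribution with df degrees of freedom.
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, upper=50.0, steps=20_000):
    # Approximate upper-tail area P(T >= t) by the trapezoid rule.
    # (Stdlib-only stand-in for Table B or technology.)
    h = (upper - t) / steps
    area = 0.5 * (t_pdf(t, df) + t_pdf(upper, df))
    for i in range(1, steps):
        area += t_pdf(t + i * h, df)
    return area * h

# Hypothetical data: n = 16, sample mean 10.4, sample SD 1.2.
# Test H_0: mu = 10 versus H_a: mu != 10 (two-sided).
n, xbar, s_x, mu0 = 16, 10.4, 1.2, 10.0
t = (xbar - mu0) / (s_x / sqrt(n))
p_value = 2 * t_tail(abs(t), df=n - 1)  # both tails, df = n - 1 = 15
print(round(t, 3), round(p_value, 4))
```

Here t is about 1.33 with 15 degrees of freedom, giving a two-sided P-value around 0.20, so this hypothetical sample would not give convincing evidence against H_0: \mu = 10.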
Confidence intervals provide additional information that significance tests do not—namely, a set of plausible values for the population mean \mu based on sample data. A 95% confidence interval for \mu gives information about the parameter that is consistent with a two-sided test of H_0: \mu = \mu_0 at the significance level \alpha = 0.05.
Using Tests Wisely
Very small deviations from the null hypothesis can be highly significant (small P-value) when a test is based on a large sample. A statistically significant result may not be practically important.
Many tests that are run at once will likely produce some significant results by chance alone, even if all the null hypotheses are true. Beware of P-hacking.
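This multiple-testing hazard is easy to simulate. The sketch below uses the fact that when H_0 is true, the P-value of a well-calibrated test is uniformly distributed on [0, 1]; the number of tests (100) is an arbitrary choice for illustration.

```python
import random
random.seed(7)

# Sketch: run 100 independent tests when EVERY null hypothesis is true.
# At alpha = 0.05 we expect about 5 "significant" results by chance alone.
alpha, num_tests = 0.05, 100
# Under a true H_0, the P-value is uniform on [0, 1].
p_values = [random.random() for _ in range(num_tests)]
false_positives = sum(p < alpha for p in p_values)
print(false_positives)  # typically around 5
```

Even with every null hypothesis true, a handful of tests come out "significant," which is exactly why cherry-picking the significant ones (P-hacking) is misleading.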