Hypothesis Testing: Significance, Proportions, and Means (Comprehensive Notes)

Significance level, Type I/II errors, and power

Significance level (alpha, $\alpha$ ) is the probability of a Type I error (false positive) you are willing to tolerate.
- If you tolerate a false positive up to 5%, you set $\alpha = 0.05$ .
- You might choose smaller values (e.g., $\alpha = 0.01$ ) if the consequences of a false positive are severe; or larger values if you’re willing to tolerate more false positives.
Type I error: rejecting a true null hypothesis.
Type II error: failing to reject a false null hypothesis.
Power of a test: probability of correctly rejecting a false null hypothesis, i.e., $\text{Power} = 1 - \beta$ where $\beta$ is the Type II error rate.
In some contexts you’ll hear statements like: "the power is 90%". This means the test has a Type II error rate of $\beta = 0.10$ for the effect size it was designed to detect.
Practical takeaway: the choice of $\alpha$ reflects the relative severity of making Type I vs Type II errors and the consequences in real-world decisions.
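The trade-off between $\alpha$, $\beta$, and power can be made concrete with a small calculation. The sketch below is only an illustration: it assumes a right-tailed z-test with known $\sigma$, and the function name `power_right_tailed_z` and all numeric inputs are hypothetical, not taken from the lecture.

```python
from statistics import NormalDist


def power_right_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test (H1: mu > mu0) when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # reject H0 when z > z_crit
    shift = (mu1 - mu0) / (sigma / n ** 0.5)   # mean of z when the true mean is mu1
    beta = std_normal.cdf(z_crit - shift)      # P(fail to reject | true mean is mu1)
    return 1 - beta


# Hypothetical inputs, chosen only to show how alpha moves the power.
for alpha in (0.10, 0.05, 0.01):
    power = power_right_tailed_z(mu0=0, mu1=1, sigma=4, n=40, alpha=alpha)
    print(f"alpha = {alpha:.2f}: power = {power:.3f}")
```

With everything else fixed, a smaller $\alpha$ lowers the power (raises $\beta$), and a larger $n$ raises it.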

p-value interpretation

The p-value is a conditional probability: the probability of observing data as extreme as, or more extreme than, what was observed given that the null hypothesis is true.
Intuitively: a small p-value means the observed data are unlikely under the null hypothesis, providing evidence against the null.
Common qualitative interpretations (not exact thresholds):
- Very small p-value → strong evidence against H0
- p-value near 0.05 → moderate evidence against H0
- Large p-value → weak evidence against H0
Decision rule: compare the p-value to the significance level $\alpha$ .
- If p-value $\leq\alpha$ , reject the null hypothesis.
- If p-value $> \alpha$, do not reject the null hypothesis.
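The decision rule is mechanical enough to write down as a tiny function. This is a minimal sketch; the helper name `decide` and the p-value of 0.03 are hypothetical and serve only to show that the same p-value can lead to different decisions at different $\alpha$ levels.

```python
def decide(p_value, alpha):
    """Apply the rule above: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


p_value = 0.03  # hypothetical p-value, for illustration only
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```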

Null and alternative hypotheses: essential properties

Hypotheses should be mutually exclusive (disjoint) and collectively exhaustive (cover all possibilities).
Null hypothesis (H0) often includes equality: $H_0: \theta = \theta_0$ (or $p = p_0$ or $\mu = \mu_0$).
Alternative hypothesis (H1) is what you are trying to provide evidence for, and can take three forms:
- Left-tailed: $H_1: \theta < \theta_0$ (or $p < p_0$ or $\mu < \mu_0$).
- Right-tailed: $H_1: \theta > \theta_0$ (or $p > p_0$ or $\mu > \mu_0$).
- Two-tailed: $H_1: \theta \neq \theta_0$ (or $p \neq p_0$ or $\mu \neq \mu_0$).
Why this matters: the tail of the distribution you use for critical values depends on the alternative hypothesis.
In many classroom examples, the alternative is stated first (what you want to prove), and the null is set to the corresponding boundary value (e.g., equality or an inclusive bound).

Hypothesis testing for proportions (one-sample)

When testing a population proportion, the observed sample proportion $\hat{p}$ is compared to a hypothesized value $p_0$ .
Three setups correspond to the alternative:
- Left-tail: $H_1: p < p_0$
- Right-tail: $H_1: p > p_0$
- Two-tail: $H_1: p \neq p_0$
The standard test statistic for large samples uses the normal approximation:
- If we use the null value for the standard error, the statistic is
  $\displaystyle z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}.$
Decision rule via p-value: compute the one-sided or two-sided p-value from the standard normal distribution and compare to $\alpha$ .
Practical note: the transcript discusses a dataset example with counts (e.g., a total sample size of 400 and counts for categories such as unhappy/satisfied). When data are provided as counts, you can interpret them as a sample proportion and proceed with the z-test above using the hypothesized value $p_0$ .
Example setup (as described):
- Sample size: $n = 400$ ; observed count in a category (e.g., unhappy) is $x = 23$ ; thus $\hat{p} = \dfrac{x}{n} = \dfrac{23}{400} = 0.0575$ .
- Hypothesized value: for instance $p_0 = 0.75$ and alternative $H_1: p < 0.75$.
- Compute $z = (0.0575 - 0.75) / \sqrt{0.75(1-0.75)/400}$ and obtain the left-tail p-value.
- Interpret results at different $\alpha$ levels (e.g., 0.10, 0.05, 0.01).
Notes about software workflow (as described): Probability calculations and p-values can be obtained via a template or a statistics tool (e.g., a spreadsheet/add-in) which outputs sample size, sample proportion, hypothesized proportion, standard error, z, and p-value for left/right/two-tailed tests.
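For readers without the template at hand, the same outputs can be reproduced in a few lines. The sketch below uses the example numbers from these notes ($n = 400$, $x = 23$, $p_0 = 0.75$, left-tailed); the function name `one_proportion_z_test` and its `tail` argument are illustrative assumptions, not part of the course spreadsheet.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, tail):
    """Large-sample z-test for one proportion; tail is 'left', 'right', or 'two'."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)      # standard error computed under H0
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf(z)         # P(Z <= z) under the standard normal
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return p_hat, se, z, p_value


p_hat, se, z, p_value = one_proportion_z_test(x=23, n=400, p0=0.75, tail="left")
print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, z = {z:.2f}, p-value = {p_value:.4g}")
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

Because $\hat{p} = 0.0575$ is so far below $p_0 = 0.75$, the z statistic is roughly $-32$ and the left-tail p-value is effectively zero, so H0 is rejected at all three $\alpha$ levels.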

Hypothesis testing for means

When testing a population mean, hypotheses concern a population parameter $\mu$ .
Two broad cases depending on whether the population standard deviation $\sigma$ is known:
- If $\sigma$ is known, you can use a z-test for the mean.
- If $\sigma$ is unknown (the usual case), you use a t-test with sample standard deviation $s$ and degrees of freedom $df = n-1$ .
Formulas for the test statistics:
- Known sigma (z-test):
  $\displaystyle z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}.$
- Unknown sigma (t-test):
  $\displaystyle t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n-1.$
The distribution the statistic follows under the null is:
- $z \sim \mathcal{N}(0,1)$ for known $\sigma$ .
- $t \sim t_{df}$ with $df = n-1$ for unknown $\sigma$ .
The p-value is obtained from the appropriate tail(s) of the distribution, depending on the alternative:
- Left-tail: p-value = $P(T \le t_{\text{obs}})$ or $P(Z \le z_{\text{obs}})$
- Right-tail: p-value = $P(T \ge t_{\text{obs}})$ or $P(Z \ge z_{\text{obs}})$
- Two-tail: p-value = $P(|T| \ge |t_{\text{obs}}|)$ or $P(|Z| \ge |z_{\text{obs}}|)$
Example: pizza rating problem (illustrative one-sample mean test)
- Context: rating scale from -10 to 10; sample size $n = 40$; sample mean observed; population mean under H0 is a specified value (often 0 or another benchmark). Alternative is $H_1: \mu > \mu_0$ (greater than the benchmark).
- Since the population standard deviation is not known, use a t-test with $df = n - 1 = 39$ .
- Steps:
  - State hypotheses with alternative first: $H_1: \mu > \mu_0$; thus $H_0: \mu \le \mu_0$.
  - Compute the sample mean $\bar{x}$ and the sample standard deviation $s$ from the data.
  - Compute the test statistic $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
  - Determine the appropriate tail (right-tailed for this alternative) and obtain the p-value from the $t$ distribution with $df = 39$.
  - Compare the p-value to $\alpha$ and decide whether to reject $H_0$. If p-value $\le \alpha$, reject $H_0$ and conclude the mean rating is significantly greater than the benchmark.
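The steps above can be sketched in code. The class pizza ratings are not reproduced in these notes, so the four `ratings` values below are hypothetical placeholders. To keep the example free of third-party dependencies, the tail probability is obtained by numerically integrating the t density with Simpson's rule; this is an illustrative shortcut, and in practice a library routine such as SciPy's `scipy.stats.ttest_1samp` would normally be used. All function names here are assumptions.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t): Simpson's rule on [0, |t|] plus symmetry of the t density."""
    a = abs(t)
    h = a / steps                     # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = max(0.0, 0.5 - total * h / 3)   # P(T >= |t|)
    return upper if t >= 0 else 1 - upper


def one_sample_t_test(data, mu0, tail="right"):
    """One-sample t-test with unknown sigma; tail is 'left', 'right', or 'two'."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)      # stdev uses the n - 1 denominator
    t = (x_bar - mu0) / (s / sqrt(n))
    df = n - 1
    if tail == "right":
        p_value = t_upper_tail(t, df)
    elif tail == "left":
        p_value = 1 - t_upper_tail(t, df)
    else:
        p_value = 2 * t_upper_tail(abs(t), df)
    return x_bar, s, t, df, p_value


ratings = [2, 4, 6, 8]   # hypothetical ratings, NOT the class data
x_bar, s, t, df, p_value = one_sample_t_test(ratings, mu0=0, tail="right")
print(f"x_bar = {x_bar}, s = {s:.3f}, t = {t:.3f}, df = {df}, p-value = {p_value:.4f}")
```

With the real data, `ratings` would hold the 40 observed values and `df` would come out as 39, matching the setup described above.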
How the decision is made in practice (as per the described workflow):
- Use the HT (hypothesis testing) template or spreadsheet to compute $\bar{x}$ , $s$ , $t$ , and the associated p-value.
- Choose one-tailed or two-tailed test according to the alternative; adjust calculations accordingly (one-tailed uses the appropriate tail; two-tailed doubles the tail probability).
- If the p-value is smaller than the chosen $\alpha$ , reject the null and support the alternative.
Important conceptual notes:
- The null hypothesis always contains the equality, i.e., $H_0: \mu = \mu_0$ (or $p = p_0$); a one-sided null such as $H_0: \mu \le \mu_0$ is tested at its boundary value $\mu_0$.
- The alternative places the inequality or not-equal relation: $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$, or $H_1: \mu \neq \mu_0$ accordingly.
- The sampling distribution of the test statistic (z or t) is what enables computing p-values and tail probabilities.

Confidence intervals and their relation to hypothesis tests

Confidence intervals depend on whether the population standard deviation is known:
- If $\sigma$ is known: use a z-based interval
  $\displaystyle \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}.$
- If $\sigma$ is unknown: use a t-based interval
  $\displaystyle \bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}.$
Relationship: a two-sided hypothesis test at level $\alpha$ corresponds to a $1-\alpha$ confidence interval for the parameter; if the hypothesized value (e.g., $\mu_0$) lies outside the interval, you would reject $H_0$ at level $\alpha$.
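The correspondence between the interval and the two-sided test can be checked numerically. The sketch below assumes the known-$\sigma$ (z) case; the summary values ($\bar{x} = 10$, $\sigma = 2$, $n = 100$) and the two candidate $\mu_0$ values are hypothetical, as are the function names.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the mean, sigma known."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def two_sided_z_p_value(x_bar, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0, sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


x_bar, sigma, n, alpha = 10.0, 2.0, 100, 0.05   # hypothetical summary data
low, high = z_interval(x_bar, sigma, n, alpha)
print(f"{1 - alpha:.0%} CI: ({low:.3f}, {high:.3f})")
for mu0 in (9.5, 9.8):
    outside = not (low <= mu0 <= high)
    rejected = two_sided_z_p_value(x_bar, mu0, sigma, n) <= alpha
    print(f"mu0 = {mu0}: outside CI = {outside}, reject H0 = {rejected}")
```

For each candidate $\mu_0$, the "outside CI" and "reject H0" flags agree, which is the duality stated above.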

Practical study and course workflow (as described in the transcript)

The instructor emphasizes practicing with real data problems (e.g., class pizza ratings, customer satisfaction counts) rather than only templates.
For proportions and means, practice with both summary data (counts, totals) and raw data when available.
In-class workflow uses templates or the HT Me spreadsheet to perform the calculations, including choosing the tail type, entering hypothesized values, and interpreting the p-value in relation to $\alpha$ .
The instructor points to additional resources (e.g., Canvas modules and class presentations) for further practice problems (Chapter 9) and encourages solving practice questions to reinforce understanding.

Quick reference: key formulas and concepts (LaTeX)

Significance level and errors
- $\alpha = P(\text{Type I error})$
- $\beta = P(\text{Type II error})$
- $\text{Power} = 1 - \beta$
p-value interpretation
- For a given test statistic, the p-value is the tail probability under H0:
- One-sided: $\text{p-value} = P(T \text{ in the tail beyond } t_{\text{obs}})$
- Two-sided: $\text{p-value} = P(|T| \ge |t_{\text{obs}}|)$
Proportion test (large-sample z-test)
- $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$
- Decision: reject H0 if p-value $\le \alpha$, where the p-value is computed for the tail(s) matching H1. A two-sided p-value already doubles the tail probability, so it is compared to $\alpha$ itself, not to $\alpha/2$.
Means testing (unknown sigma, t-test)
- $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ , with $df = n-1$
- One-sided or two-sided p-values derived from the $t_{df}$ distribution.
Confidence intervals
- Known sigma (z): $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
- Unknown sigma (t): $\bar{x} \pm t_{\alpha/2,\, n-1} \dfrac{s}{\sqrt{n}}$
Null and alternative structure
- H0 typically includes equality: $H_0: \theta = \theta_0$
- H1 includes the inequality or not-equal: $H_1: \theta \neq \theta_0$ or $H_1: \theta > \theta_0$ / $H_1: \theta < \theta_0$

Connections to real-world decisions and ethics

Choice of $\alpha$ reflects risk tolerance for false positives; higher-stakes decisions deserve smaller $\alpha$.
Misinterpretation of p-values can lead to overclaiming effects that are not practically meaningful; context and study design matter.
Reporting should distinguish statistical significance from practical significance and consider study power and sample size implications.

Study plan recommendations (from lecture guidance)

Practice with the Chapter 9 problems and the pizza rating example to reinforce testing for means with unknown $\sigma$ .
Use provided templates or HT Me spreadsheet to check that you select the correct tail, compute the test statistic, and read the p-value correctly.
Review the distinctions between one-sample tests for proportions and means, and ensure you can translate data (counts or raw data) into the appropriate test statistic and p-value.
Revisit the concept that hypotheses are about population parameters (proportions and means), not sample statistics.
Explore class presentations and voice-over slides in Canvas for additional worked examples and practice problems.

Summary takeaways

Always state H0 and H1, identify the tail type, compute the test statistic, obtain the p-value, and compare to $\alpha$ to decide whether to reject H0.
For proportions, use the normal approximation with the standard error based on the null value $p_0$ ; for means, decide between z or t based on whether $\sigma$ is known.
Understand the meaning of Type I/II errors and power to set appropriate significance levels and interpret results in context.