Lecture 21 - Hypothesis Testing for Population Proportions and Inferential Statistics

The course is moving from the theoretical concepts discussed in the past two weeks (Normal distributions and the Central Limit Theorem) toward practical statistical inference.
Last Week Recap: Theoretical distributions answer the question of what happens if one samples to infinity. The Central Limit Theorem (CLT) establishes that sample means and sample proportions are approximately normally distributed, with a specific mean and standard deviation.
The central goal of the rest of the course is using a statistic (derived from a sample) to make statements about a parameter (derived from a whole population). This process is known as Inferential Statistics.

Population Parameters: These are fixed but unknown characteristics of a population.
- Examples: The average wage of New Zealanders, the spread of incomes in New Zealand, or the unimodal/symmetric shape of these incomes.
- Observation: A population parameter can only be truly known by performing a census, which is often impossible due to resource constraints.
Sample Statistics: Calculated characteristics from a subset (sample) of the population. This process is called Descriptive Statistics.
- Examples: Sample means ( $\bar{x}$ ), sample proportions ( $\hat{p}$ ), sample standard deviation ( $s$ ), sample median, and sample interquartile range (IQR).
Parameters of Interest: For the next three weeks, the focus will be on two specific parameters:
- Population Mean ( $\mu$ ): Denoted by the Greek letter mu (described as a "squiggly m").
- Population Proportion ( $\pi$ ): Denoted by the Greek letter pi (not the mathematical constant $3.14$ , but a representation of population proportion).
Notation Note: It is critical to distinguish between characters: $\bar{x}$ and $\hat{p}$ are statistics from samples, while $\mu$ and $\pi$ are parameters from populations.
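To make the statistic/parameter distinction concrete, the sample statistics listed above can be computed with Python's standard library. This is a minimal sketch: the wage values and the cut-off of 30 are hypothetical illustration data, not figures from the lecture, and quartile conventions differ between software packages, so the IQR may not match other tools exactly.

```python
import statistics

# Hypothetical sample of hourly wages; illustrative values only, not real data.
wages = [23.5, 27.0, 31.2, 25.8, 40.1, 29.4, 33.7, 26.3]

x_bar = statistics.mean(wages)      # sample mean, estimates mu
s = statistics.stdev(wages)         # sample standard deviation (n - 1 denominator)
median = statistics.median(wages)   # sample median

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, _, q3 = statistics.quantiles(wages, n=4)
iqr = q3 - q1                       # sample interquartile range

# Sample proportion: share of the sample above a hypothetical cut-off of 30.
p_hat = sum(w > 30 for w in wages) / len(wages)

print(f"x_bar={x_bar:.3f}, s={s:.3f}, median={median:.1f}, IQR={iqr:.3f}, p_hat={p_hat:.3f}")
```

Every value printed here describes this particular sample only; the corresponding parameters ( $\mu$ , $\pi$ ) stay unknown.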

Distribution of Means: Following the CLT, sample means are approximately normally distributed with a mean of $\mu$ and a standard deviation (standard error) of $\frac{\sigma}{\sqrt{n}}$ . This holds if the sample size satisfies $n > 30$ or if the population itself is normally distributed.
Distribution of Proportions: A proportion is a special case of a mean. The sample proportion is approximately normally distributed with a mean of $\pi$ and a standard deviation (standard error) of $\sqrt{\frac{\pi \times (1-\pi)}{n}}$ .
Conditions for Proportions: This is true if the sample size is large enough to fulfill the condition $\pi \times (1 - \pi) \times n > 10$ .
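The standard error formula and the sample-size condition for proportions can be checked numerically. The sketch below is illustrative: the function name `proportion_sampling_distribution` is made up for this example, and the values $\pi = \frac{1}{6}$ and $n = 100$ are borrowed from the die example in these notes.

```python
import math

def proportion_sampling_distribution(pi, n):
    """Return (mean, standard_error, condition_ok) for a sample proportion.

    condition_ok applies the rule from the notes: pi * (1 - pi) * n > 10.
    """
    if not 0 <= pi <= 1:
        raise ValueError("pi must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    standard_error = math.sqrt(pi * (1 - pi) / n)
    condition_ok = pi * (1 - pi) * n > 10
    return pi, standard_error, condition_ok

mean, se, ok = proportion_sampling_distribution(1 / 6, 100)
print(f"mean={mean:.4f}, standard error={se:.4f}, condition met: {ok}")

# With only 50 rolls the condition fails, so the normal model is doubtful.
_, _, ok_small = proportion_sampling_distribution(1 / 6, 50)
print(f"condition met with n=50: {ok_small}")
```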
Application: Because these distributions are normal, we can calculate the probability of observing a particular sample statistic, such as the probability that the bottles in a six-pack of beer contain an average of $330\,cm^3$ (330 mL), given the distribution of individual bottle volumes.
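A calculation of this kind might look like the sketch below. The notes give no numbers for the bottle distribution, so the mean of 332 and standard deviation of 4 are hypothetical values chosen purely for illustration. Two further assumptions are worth stating: individual bottle volumes are taken to be normally distributed (needed because $n = 6$ is well below 30), and because volume is continuous, the sketch computes the probability that the six-pack average falls below 330 rather than the probability of hitting 330 exactly.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical bottle-level parameters (not from the lecture), in the same
# volume units as the target of 330.
mu = 332.0     # assumed population mean volume per bottle
sigma = 4.0    # assumed population standard deviation per bottle
n = 6          # bottles in a six-pack
target = 330.0

# Sampling distribution of the six-pack mean: Normal(mu, sigma / sqrt(n)).
standard_error = sigma / sqrt(n)
sample_mean_dist = NormalDist(mu=mu, sigma=standard_error)

prob_below_target = sample_mean_dist.cdf(target)
print(f"standard error = {standard_error:.3f}")
print(f"P(six-pack average < {target}) = {prob_below_target:.3f}")
```

The standard error is smaller than the bottle-level standard deviation, which is why an average below 330 is less likely than a single bottle below 330.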

There are two primary procedures used to estimate or make statements about parameters:

Hypothesis Tests: Used ubiquitously in data analytics, climate change research, biology, and social sciences.
Confidence Intervals: Widely used and historically important; the lecturer argues they should be used more often than hypothesis tests.

Define and Hypothesize: Define the parameter of interest (mean or proportion) and hypothesize a specific value for it.
Collect and Summarize Data: Use probability-based sampling methods (Simple Random, Stratified, or Cluster) to obtain a representative sample. Summarize the data using a sample statistic.
Perform Inference (Test Statistic): Compute a test statistic to determine if the difference between the sample statistic and hypothesized value is due to sampling error (chance) or an incorrect hypothesis.
Make a Decision (P-Value): Use the test statistic to find a p-value and determine the plausibility of the null hypothesis.
Contextualize: Translate the statistical result back into the real-world problem for decision-making.

Hypothesize: We believe a die is fair, meaning the "true" or "long-run" proportion of rolling a specific face is the same for all faces.
- Parameter: Population proportion ( $\pi$ ).
- Hypothesized Value: $\frac{1}{6}$ (or $0.1667$ ).
Collect Data: Roll the die $n = 100$ times. Each roll is independent.
- Observed Result: A three is rolled $13$ times.
- Sample Statistic ( $\hat{p}$ ): $\frac{13}{100} = 0.13$ .
Calculate Test Statistic:
- Formula: $z = \frac{\text{Sample Statistic} - \text{Hypothesized Value}}{\text{Standard Error}}$ .
- Standard Error: $\sqrt{\frac{\pi_{0} \times (1 - \pi_{0})}{n}}$ .
- Calculation: $z = \frac{0.13 - 0.1667}{\sqrt{\frac{0.1667 \times (1 - 0.1667)}{100}}} = -0.98469$ .
- Interpretation: The sample proportion is approximately $0.985$ standard errors below the hypothesized proportion.
Find the P-Value:
- Because the test is for fairness (could be more or less than $\frac{1}{6}$ ), it is a two-sided test.
- Using normal tables, $P(z < -0.98) = 0.1635$ .
- For the two-sided p-value, multiply by two: $0.1635 \times 2 = 0.327$ (approx. $33\%$ ).
- Computer-calculated P-value: $0.324$ (more precise because software does not round $z$ to two decimal places).
Conclusion: If the die is fair, a result at least as far from $\frac{1}{6}$ as $13/100$ occurs approximately $32\%$ of the time. This is not unusual, so the data give no reason to doubt that the die is fair.
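The die calculation can be reproduced in a few lines of Python. This is a sketch using the standard library's `NormalDist` in place of normal tables; the variable names are chosen for this example. It uses the exact value $\frac{1}{6}$ rather than the rounded $0.1667$, so $z$ differs from the hand calculation in the third decimal place, and the two-sided p-value comes out at roughly $0.325$.

```python
from math import sqrt
from statistics import NormalDist

pi_0 = 1 / 6    # hypothesized proportion under the null (fair die)
n = 100         # number of rolls
count = 13      # number of threes observed

p_hat = count / n                               # sample proportion
standard_error = sqrt(pi_0 * (1 - pi_0) / n)    # uses pi_0, not p_hat
z = (p_hat - pi_0) / standard_error             # test statistic

# Two-sided test: double the area in the tail beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"p_hat = {p_hat:.2f}")
print(f"z = {z:.4f}")
print(f"two-sided p-value = {p_value:.4f}")
```

Note that the standard error is built from the hypothesized value $\pi_{0}$ , because the whole calculation assumes the null hypothesis is true.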

Null Hypothesis ( $H_{0}$ ): The statement of current belief. It claims the parameter is equal to the hypothesized value ( $\pi = 0.1667$ ). We assume this is true throughout the test; it is "on the hot seat."
Alternative Hypothesis ( $H_{1}$ ): The statement contrary to the null. It claims the parameter differs from the hypothesized value ( $\pi \neq 0.1667$ ).
Note on Fairness: In the die example, the alternative is $\neq$ because either a higher or lower proportion would indicate the die is unfair. In other contexts, one might use greater than ( $>$ ) or less than ( $<$ ).
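The direction of the alternative hypothesis changes which tail area counts as the p-value. The sketch below illustrates this for a given $z$ statistic; the function name and the labels `"two-sided"`, `"less"`, and `"greater"` are naming choices made for this example, not notation from the lecture.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """Return the p-value for a z statistic under the chosen alternative.

    alternative: "two-sided" (parameter != value), "less" (parameter < value),
    or "greater" (parameter > value).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "two-sided":
        return 2 * std_normal.cdf(-abs(z))
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

z = -0.98  # the rounded z value used with normal tables in the die example
for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: p = {p_value_from_z(z, alt):.4f}")
```

For a negative $z$ , the "less" p-value is half the two-sided one, while the "greater" p-value is large because the data point away from that alternative.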

P-Value Definition: The probability of observing our sample statistic (or something more extreme) assuming the null hypothesis is true.
Common Misconception: People often mistakenly interpret the p-value as the probability that the null hypothesis is true.
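The definition can be applied literally to the die example: assume the null is true, then add up the probability of every outcome at least as extreme as the one observed. The sketch below does this with exact binomial probabilities, which is a different technique from the normal approximation used in the lecture, so its result need not match the normal-based p-value exactly (counts are discrete, the normal curve is not). All names are chosen for this example.

```python
from fractions import Fraction
from math import comb

n = 100                        # number of rolls
pi_0 = Fraction(1, 6)          # proportion of threes if the null (fair die) is true
observed = Fraction(13, 100)   # observed sample proportion

# "At least as extreme" = at least as far from pi_0 as the observed proportion.
observed_distance = abs(observed - pi_0)

def binomial_probability(k):
    """P(exactly k threes in n rolls) when the null hypothesis is true."""
    return comb(n, k) * float(pi_0) ** k * float(1 - pi_0) ** (n - k)

# Exact fractions avoid floating-point trouble when comparing distances.
extreme_counts = [
    k for k in range(n + 1)
    if abs(Fraction(k, n) - pi_0) >= observed_distance
]
exact_p_value = sum(binomial_probability(k) for k in extreme_counts)

print(f"extreme counts: 0..{max(k for k in extreme_counts if k <= 13)} "
      f"and {min(k for k in extreme_counts if k > 13)}..{n}")
print(f"exact two-sided p-value = {exact_p_value:.3f}")
```

Nothing in this calculation is the probability that the die is fair; fairness is assumed from the first line to the last, which is exactly why the common misconception is wrong.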
Significance Level ( $\alpha$ ): The scientific threshold or baseline used to reject the null hypothesis. It is typically set at $5\%$ .
Decision Rules:
- If P-value $< 5\%$ $\rightarrow$ Reject the Null Hypothesis (statistically significant result). The data are inconsistent with the null.
- If P-value $> 5\%$ $\rightarrow$ Fail to Reject the Null Hypothesis (not a statistically significant result). Evidence is insufficient to suggest the null is false.
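The decision rules amount to a single comparison, sketched below. The function name `decide` is hypothetical, and the treatment of a p-value exactly equal to $\alpha$ (here: fail to reject) is an assumption, since the rules above do not cover that case.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null only if p_value < alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Assumption: a p-value exactly equal to alpha is treated as not significant.
    return "fail to reject H0"

# Die example from the notes: p-value of about 0.324 at the 5% level.
print(decide(0.324))   # fail to reject H0
print(decide(0.01))    # reject H0
```

"Fail to reject" is deliberate wording: a large p-value means the evidence is insufficient, not that the null hypothesis has been shown to be true.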

Question: Is the notation for population proportion "P-hat" ( $\hat{p}$ )?
Response: No, that is the sample proportion. The population proportion is denoted as the Greek letter pi ( $\pi$ ). Avoid mixing these up; the parameters ( $\mu, \pi$ ) are what we want to say something about, while statistics ( $\bar{x}, \hat{p}$ ) are functions of the specific sample data.