Hypothesis Testing: One-Sided vs. Two-Sided Tests and One-Proportion Tests

Understanding Hypothesis Testing and P-values (Recap)

P-value Calculation for Two-Tailed Tests: When the alternative hypothesis ( $H_A$ ) states that two proportions are different, or that two variables are not independent ( $P_1 \neq P_2$ or 'not independent'), the p-value is the fraction of simulations that fall at least as far from zero as the observed result. This includes both tails of the distribution: simulated differences at or above the absolute observed difference, and simulated differences at or below its negative.
- Example: A previous p-value calculation resulted in $28/5000 = 0.0056$ .
Level of Evidence: This is determined by comparing the calculated p-value to a specific table (which must be committed to memory for the exam, as it will not be provided).
- Different ranges of p-values correspond to different levels of evidence against the null hypothesis ( $H_0$ ).
- In a prior example, a very small p-value indicated "very strong evidence against the null model," suggesting the observed sample distribution could not match an independent distribution.

One-Sided Hypothesis Testing

Shift from General Differences: We move beyond simply testing if two proportions are "the same or different" or if variables are "independent or dependent." Instead, we focus on specific relationships, such as whether the first proportion is larger than or smaller than the second proportion.

Example: Survey Incentives

Research Question: Does offering a gift card increase the response rate for a survey? This focuses on a positive effect.
Problem: Surveys often suffer from non-response bias, reducing the amount of data available for analysis. Incentives like gift cards are used to mitigate this.
Experiment Design: Amber conducted a study with 40 students.
- Gift Card Group: 15 students received a survey link with the promise of a $\$5$ gift card upon completion.
- No Gift Card Group: The remaining 25 students received the survey link without any gift card promise.
- Goal: Compare response rates to see if the gift card group had a higher response rate.
Observed Results: Out of 40 students, 19 responded.
- Gift Card Group: 9 out of 15 responded.
- No Gift Card Group: 10 out of 25 responded.

Setting up Null and Alternative Hypotheses (One-Sided Example)

Null Hypothesis ( $H_0$ ): There is no difference in response rates between the gift card group and the no gift card group. Any observed difference is due to random chance (i.e., the variables are independent, or $P_{\text{gift card}} = P_{\text{no gift card}}$ ).
Alternative Hypothesis ( $H_A$ ): The response rate in the gift card group is greater than the response rate in the no gift card group (i.e., $P_{\text{gift card}} > P_{\text{no gift card}}$ ).
- Key Point: This is a one-sided test because we are specifically interested in an increase, not just any difference.
- Alignment: It is crucial that the alternative hypothesis directly reflects the research question (e.g., if the question is about a positive effect, $H_A$ should be 'greater than').

Calculating the Observed Difference

Response Rate (Gift Card): $\hat{p}_{\text{gift card}} = 9/15 = 0.6$
Response Rate (No Gift Card): $\hat{p}_{\text{no gift card}} = 10/25 = 0.4$
Observed Difference: $0.6 - 0.4 = 0.2$
- The next step is to determine if this $0.2$ difference is typical under the null hypothesis or a rare occurrence, providing evidence against the null.
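
The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
# Observed counts from the survey incentive example.
gift_responded, gift_total = 9, 15
no_gift_responded, no_gift_total = 10, 25

# Sample response rates (p-hat) for each group.
p_hat_gift = gift_responded / gift_total
p_hat_no_gift = no_gift_responded / no_gift_total

# Observed difference: gift card group minus no gift card group.
observed_diff = p_hat_gift - p_hat_no_gift

print(f"gift card:    {p_hat_gift:.3f}")
print(f"no gift card: {p_hat_no_gift:.3f}")
print(f"difference:   {observed_diff:.3f}")
```

The order of subtraction (gift card minus no gift card) matters later, because the direction of the one-sided p-value is defined relative to it.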

Simulating the Null Distribution (Independence Model)

Assumption: If the null hypothesis (independence) is true, then the gift card has no effect, and both groups should have similar response rates, reflecting the overall response rate.
Overall Response Rate: $19/40 = 0.475$
Expected Responses (if independent):
- Gift Card Group (15 students): $15 \times 0.475 = 7.125$ expected responders.
- No Gift Card Group (25 students): $25 \times 0.475 = 11.875$ expected responders.
Simulation Process (Creating a Null Distribution):
1. Prepare Cards: Create 40 cards total.
  - 15 cards labeled "Gift" (representing students in the gift card group).
  - 25 cards labeled "No Gift" (representing students in the no gift card group).
2. Shuffle and Draw: Shuffle the 40 cards thoroughly. Randomly draw 19 cards (representing the total number of responders in the original sample).
3. Assign Roles: The 19 drawn cards represent individuals who responded. The remaining 21 cards represent non-responders.
4. Calculate Simulated Difference: From the 19 cards drawn (responders), count how many were "Gift" and how many were "No Gift". Calculate the simulated difference in proportions: $(\text{count of 'Gift' responders}/15) - (\text{count of 'No Gift' responders}/25)$ .
5. Repeat: Repeat this process thousands of times (e.g., 5,000 simulations). Each repetition generates one simulated difference in response rates, contributing to a "dot plot" that forms the null distribution.
  - The center of this distribution will be approximately 0, consistent with the null hypothesis that there is no difference in response rates.
  - The distribution shows the range of differences expected if the null hypothesis were true.
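
The card-shuffling steps above can be sketched in Python using only the standard library. The function name `simulate_null_differences`, its parameters, and the seed are assumptions made for illustration; the lecture describes the procedure with physical cards, not code.

```python
import random

def simulate_null_differences(n_gift=15, n_no_gift=25, n_responders=19,
                              n_sims=5000, seed=0):
    """Shuffle-and-draw simulation of the difference in response rates under H0."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    # Step 1: 15 "Gift" cards and 25 "No Gift" cards.
    cards = ["Gift"] * n_gift + ["No Gift"] * n_no_gift
    diffs = []
    for _ in range(n_sims):
        # Steps 2-3: drawing 19 cards without replacement picks the responders;
        # the 21 cards left behind are the non-responders.
        responders = rng.sample(cards, n_responders)
        gift_count = responders.count("Gift")
        no_gift_count = n_responders - gift_count
        # Step 4: simulated difference in proportions.
        diffs.append(gift_count / n_gift - no_gift_count / n_no_gift)
    # Step 5: the collected differences form the null distribution.
    return diffs

null_diffs = simulate_null_differences()
center = sum(null_diffs) / len(null_diffs)
print(f"{len(null_diffs)} simulated differences, mean = {center:.3f}")
```

Drawing with `sample` (without replacement) mirrors the physical process: each student card can be a responder at most once, and the total number of responders stays fixed at 19 in every repetition.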

Calculating the P-value for a One-Sided Test

Alignment of $H_A$ and P-value: The method of calculating the p-value must align with the direction specified in the alternative hypothesis.
- If $H_A: P_1 > P_2$ (first proportion is higher), the p-value is the fraction of simulations where the simulated difference was greater than or equal to the observed difference (upper tail).
- If $H_A: P_1 < P_2$ (first proportion is lower), the p-value is the fraction of simulations where the simulated difference was less than or equal to the observed difference (lower tail).
- If $H_A: P_1 \neq P_2$ (proportions are different), the p-value is the fraction of simulations where the absolute simulated difference was as extreme as or more extreme than the observed absolute difference (both tails, as seen in previous lectures).
Survey Incentive Example P-value: Since $H_A$ was $P_{\text{gift card}} > P_{\text{no gift card}}$ , we use the upper tail.
- Observed difference: $0.2$ . We look for simulations with differences $\ge 0.2$ .
- Suppose, for example, that 910 out of 5,000 simulations had a difference $\ge 0.2$ .
- P-value: $910/5000 = 0.182$ .
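
As a rough illustration, the upper-tail count for the survey example could be computed as follows. The seed and variable names are arbitrary assumptions, and the count of 910 above is the lecture's illustrative figure, so a run of this sketch will generally give a nearby but not identical value.

```python
import random

rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
cards = ["Gift"] * 15 + ["No Gift"] * 25
observed_diff = 9 / 15 - 10 / 25  # 0.2, gift card minus no gift card

n_sims = 5000
n_extreme = 0
for _ in range(n_sims):
    # Draw the 19 responders and count how many hold a "Gift" card.
    gift_count = rng.sample(cards, 19).count("Gift")
    sim_diff = gift_count / 15 - (19 - gift_count) / 25
    # Upper tail only, because H_A says "greater than". The tiny tolerance
    # guards against floating-point round-off when the two values are equal.
    if sim_diff >= observed_diff - 1e-12:
        n_extreme += 1

p_value = n_extreme / n_sims
print(f"{n_extreme} of {n_sims} simulations >= observed, p-value = {p_value:.3f}")
```

Counting simulations that are equal to the observed difference is deliberate: the definition is "greater than or equal to," and with only a handful of possible difference values, ties are common.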

Interpreting the P-value and Evidence

P-value of $0.182$ : This is a relatively large p-value (greater than $0.1$ ).
Level of Evidence (from memorized table): A p-value greater than $0.1$ indicates little evidence against the null model.
Conclusion for Survey Incentives: Based on our observed results and the simulations, there is little evidence to claim that the gift card incentive has a positive effect on response rates.
- The observed difference of $0.2$ is not rare enough to reject the null hypothesis; it is plausible that such a difference could occur by chance even if the gift card had no effect (i.e., the response rates are truly the same).
- This means the null model ( $P_{\text{gift card}} = P_{\text{no gift card}}$ ) is compatible with the observed data.
- We therefore have no grounds to conclude that the gift card increases the response rate. The data are consistent with the variables being independent, although a large p-value does not prove that they are.

Summary of P-value Calculation Types

Two-Tailed Test (Test of Independence): Used when $H_A: P_1 \neq P_2$ or variables are 'not independent'. P-value is the fraction of simulations as extreme as or more extreme than the observed difference (covers both tails: values at or above the absolute observed difference and values at or below its negative).
One-Sided Upper-Tail Test (Test for Increase): Used when $H_A: P_1 > P_2$ . P-value is the fraction of simulations with a simulated difference greater than or equal to the observed difference.
One-Sided Lower-Tail Test (Test for Decrease): Used when $H_A: P_1 < P_2$ . P-value is the fraction of simulations with a simulated difference less than or equal to the observed difference.
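
The three cases can be collected into one small helper. This is a sketch under stated assumptions: the function name `simulated_p_value`, the `alternative` labels, and the toy list of differences are all hypothetical and chosen only to make the tail logic easy to check by hand.

```python
def simulated_p_value(sim_diffs, observed_diff, alternative, tol=1e-12):
    """Fraction of simulated differences at least as extreme as observed_diff.

    alternative: "greater"   for H_A: P1 > P2 (upper tail),
                 "less"      for H_A: P1 < P2 (lower tail),
                 "two-sided" for H_A: P1 != P2 (both tails).
    tol absorbs floating-point round-off when values are equal.
    """
    if alternative == "greater":
        hits = [d for d in sim_diffs if d >= observed_diff - tol]
    elif alternative == "less":
        hits = [d for d in sim_diffs if d <= observed_diff + tol]
    elif alternative == "two-sided":
        hits = [d for d in sim_diffs if abs(d) >= abs(observed_diff) - tol]
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return len(hits) / len(sim_diffs)

# Toy null distribution of ten simulated differences (made up for illustration).
toy_diffs = [-0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 0.2, 0.3]

print(simulated_p_value(toy_diffs, 0.2, "greater"))    # 3 of 10 -> 0.3
print(simulated_p_value(toy_diffs, 0.2, "less"))       # 9 of 10 -> 0.9
print(simulated_p_value(toy_diffs, 0.2, "two-sided"))  # 5 of 10 -> 0.5
```

The same observed difference gives three different p-values depending on the alternative, which is why the direction must be fixed by the research question before looking at the data.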

Homework Assignment

Exercise: Complete the "Weather and Weekend" example (pages 77-78) in Top Hat.
Steps: Practice writing null/alternative hypotheses, finding the observed difference, and calculating the p-value.
Availability: Opens today at 12:20 PM and closes Wednesday before 11:30 AM.

One-Proportion Hypothesis Test

New Type of Test: This test focuses on a single population proportion, testing if it equals a specific hypothesized value.
Notation:
- $\hat{p}$ : Sample proportion (obtained from data).
- $p$ : True population proportion.
- $p_0$ : Null value (the specific hypothesized value for the population proportion).
- P-value: The p-value as previously defined.
Hypotheses Structure:
- Null Hypothesis ( $H_0$ ): The population proportion is equal to the hypothesized null value ( $p = p_0$ ).
- Alternative Hypothesis ( $H_A$ ): The population proportion is different from, greater than, or smaller than the null value ( $p \neq p_0$ , $p > p_0$ , or $p < p_0$ ).

Example: Facial Prototyping (Bob vs. Tim)

Concept: Facial prototyping is the tendency to associate certain facial characteristics with specific names.
Experiment: Participants view two photos and guess which person is named "Tim" (Image 1 or Image 2).
Research Question: Do facial characteristics influence how people associate names with faces?
Observed Results (In-Class): 72% of the 292 student respondents believed the person in Image 1 was "Tim."

Setting up Hypotheses (One-Proportion Example)

Parameter: The population proportion ( $p$ ) of people who think Image 1 is Tim.
Null Value ( $p_0$ ): If facial characteristics have no influence, then selecting Image 1 or Image 2 would be purely by chance, meaning 50% would choose Image 1. So, $p_0 = 0.5$ .
Null Hypothesis ( $H_0$ ): The actual percentage of people who think Image 1 is Tim is 50% ( $p = 0.5$ ). (No facial prototyping effect).
Alternative Hypothesis ( $H_A$ ): The actual percentage of people who think Image 1 is Tim is different from 50% ( $p \neq 0.5$ ). (Facial characteristics influence).
- P-value Calculation: Since $H_A$ states "different from," this will be a two-tailed test for p-value calculation.

Simulating the Null Distribution (One-Proportion Test)

Goal: Simulate possible sample proportions if the null hypothesis ( $p = 0.5$ ) were true.
Simulation Process:
1. Prepare Cards: Use 100 cards. Label 50 cards as "Image 1 (Tim)" and 50 cards as "Image 2 (Tim)". (This represents 50% chance for either choice).
2. Draw with Replacement: Since the true population proportion is assumed to be fixed at $0.5$ under the null, each draw must have the same $0.5$ chance. Therefore, draw cards with replacement. After each draw, record the result and return the card to the deck.
3. Simulate Sample: Draw as many times as the actual sample size ( $N = 292$ in the class example).
4. Calculate Simulated Proportion: Count how many times "Image 1 (Tim)" was drawn out of 292. This gives one simulated sample proportion ( $\hat{p}$ ).
5. Repeat: Repeat this process thousands of times to build a null distribution of sample proportions ( $\hat{p}$ ).
  - The center of this null distribution will be approximately $0.5$ , reflecting the null hypothesis.
  - The observed sample proportion ( $\hat{p} = 0.72$ ) will then be placed on this distribution.
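
The draw-with-replacement procedure above can be sketched in Python as follows. The function name `simulate_null_proportions`, the default of 5,000 repetitions, and the seed are assumptions for illustration; the lecture only says to repeat "thousands of times."

```python
import random

def simulate_null_proportions(n=292, n_sims=5000, seed=0):
    """Simulate sample proportions choosing Image 1 when p = 0.5 (H0 true)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    # Step 1: a 100-card deck, half "Image 1" and half "Image 2".
    deck = ["Image 1"] * 50 + ["Image 2"] * 50
    proportions = []
    for _ in range(n_sims):
        # Steps 2-3: choices() draws WITH replacement, so every one of the
        # n draws has the same 0.5 chance of "Image 1".
        draws = rng.choices(deck, k=n)
        # Step 4: one simulated sample proportion (p-hat).
        proportions.append(draws.count("Image 1") / n)
    # Step 5: the collected proportions form the null distribution.
    return proportions

null_props = simulate_null_proportions()
center = sum(null_props) / len(null_props)
print(f"{len(null_props)} simulated proportions, mean = {center:.3f}")
```

Note the contrast with the two-group shuffle earlier: there the cards were drawn without replacement because the pool of 40 students was fixed, whereas here each draw represents a fresh person from a population with a fixed 50% chance.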

Calculating P-value for Facial Prototyping Example

Two-Tailed Test: Given $H_A: p \neq 0.5$ , we need to consider both tails.
Observed Proportion: $0.72$ .
Symmetric Value: The distance from the null value ( $0.5$ ) to the observed proportion ( $0.72$ ) is $0.72 - 0.5 = 0.22$ . The symmetric value on the other side of $0.5$ is $0.5 - 0.22 = 0.28$ .
P-value: The fraction of simulations that resulted in proportions greater than or equal to $0.72$ OR less than or equal to $0.28$ .
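
A minimal sketch of this two-tailed calculation is shown below. The seed, the number of simulations, and the variable names are assumptions; the printed fraction depends on them. Each simulated person chooses Image 1 with probability 0.5, which is equivalent to drawing with replacement from the half-and-half deck.

```python
import random

p0, p_hat_obs, n = 0.5, 0.72, 292

# Distance from the null value and the two cut-offs it implies.
distance = abs(p_hat_obs - p0)   # 0.22
upper_cut = p0 + distance        # 0.72
lower_cut = p0 - distance        # 0.28

rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
n_sims = 5000
n_extreme = 0
for _ in range(n_sims):
    # One simulated sample of 292 people under H0.
    count_image1 = sum(rng.random() < p0 for _ in range(n))
    p_hat_sim = count_image1 / n
    # Two tails: at or above 0.72, OR at or below 0.28.
    # The tiny tolerance guards against floating-point round-off at equality.
    if p_hat_sim >= upper_cut - 1e-12 or p_hat_sim <= lower_cut + 1e-12:
        n_extreme += 1

p_value = n_extreme / n_sims
print(f"cut-offs: {lower_cut:.2f} and {upper_cut:.2f}")
print(f"{n_extreme} of {n_sims} simulations in the tails, p-value = {p_value:.4f}")
```

With samples of 292, simulated proportions cluster tightly around 0.5, so a run of this sketch should find very few, if any, simulations as far out as 0.72 or 0.28.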