Chapter 9: Hypothesis Testing

This chapter extends the concepts used in Chapters 7 and 8, focusing on making statistical inferences through hypothesis testing.
Applying the theoretical concepts to concrete examples is important; the examples in this chapter require the dataset 'anes20.rda'.

Hypothesis Testing Overview:
- Asks: "What is the probability that the statistic found in the sample came from a population with a specified value?"
- Social scientists rely on sample data to infer about population values.
- Recognizes the existence of sampling error.
Key Concepts:
- We use confidence intervals to account for sampling error (as discussed in Chapter 8).
- Goal: Determine whether the sample statistic diverges enough from the hypothesized population parameter to rule out sampling error as the explanation.

Null Hypothesis (H₀):
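- The hypothesis that is tested directly.
- States that the sample was drawn from a population whose parameter (μ) equals some hypothesized value, so any gap between the sample finding (x̄) and μ reflects sampling error.
- Researchers typically aim to reject H₀.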
Alternative Hypothesis (H₁):
- Substantive hypothesis believed to be true.
- Usually states that the sample statistic does not equal specified population parameter.
- Not directly tested; evidence is gathered against H₀ to support H₁.

Calculate the 95% confidence interval for the sample mean (54.8) to check whether it includes the population mean (59.2):
- c.i.₉₅ = 54.8 ± 1.96(S_{x̄})
- Using the standard deviation of the sample (15.38) for a sample size of 100:
- S_{x̄} = 15.38 / √100 = 1.538
- c.i.₉₅ = 54.8 ± 1.96(1.538)
Case Example:
- Analyst studies impact of a new sick leave documentation method:
- Previous average sick leave: 59.2 hours (approx. 7.4 days).
- After new policy, sample mean: 54.8 hours (approx. 6.8 days).
- Question: Is this change significant or just sampling error?
Outcome:
- 95% confidence interval estimates that μ is between 51.78 and 57.81.
- Because 59.2 falls outside this interval, the probability of getting this sample result if mean sick leave hours had remained the same is < 0.05, indicating a probable reduction in sick leave usage.
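
The interval arithmetic above can be reproduced with a short R sketch. It uses only the figures given in these notes; the variable names are illustrative, and `qnorm(0.975)` stands in for the rounded 1.96, so results may differ from the figures above in the last decimal place.

```r
# Sick leave example: all values come from the notes above
x_bar <- 54.8   # sample mean (hours)
s     <- 15.38  # sample standard deviation
n     <- 100    # sample size
mu_0  <- 59.2   # previous population mean (hours)

# Standard error of the mean
se <- s / sqrt(n)

# 95% confidence interval around the sample mean
z_crit <- qnorm(0.975)  # approximately 1.96
ci <- c(lower = x_bar - z_crit * se,
        upper = x_bar + z_crit * se)
print(round(ci, 2))

# Does the interval contain the previous mean?
mu_inside <- mu_0 >= ci[["lower"]] && mu_0 <= ci[["upper"]]
print(mu_inside)
```

If `mu_inside` is `FALSE`, the previous mean lies outside the interval, which is the basis for concluding that sick leave usage probably declined.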

Setting up the Hypothesis:
- Null Hypothesis: H₀: μ = 59.2, meaning no difference from previous year's mean.
- Alternative Hypothesis: H₁: μ < 59.2; this suggests sample statistic signifies a genuine change rather than random error.
Testing Probability:
- Find the likelihood of obtaining a sample mean of 54.8 if H₀ is true.
- Use the logic of the sampling distribution to estimate that probability.
- A low probability of obtaining 54.8 when H₀ is true suggests rejecting H₀.

One-tailed Test:
- Testing if sick days decreased due to policy change.
Critical Values:
- The rejection region is typically the outer 0.05 of the distribution (α = 0.05); for a one-tailed test in the lower tail, the critical value for z is -1.645.
Example Calculation:
- Obtained z-score: -2.86. Because |z| > |c.v.|, reject H₀; this indicates an actual decline in sick leave hours.
- Calculated p-value: 0.002118, well below the 0.05 threshold for significance.
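
As a rough sketch, the one-tailed test can be worked through in R using the sick leave figures from these notes. The variable names are illustrative. Because the code keeps the unrounded z-score, the p-value may differ slightly from one computed from the rounded value of -2.86.

```r
# Sick leave example: values from the notes
x_bar <- 54.8   # sample mean
mu_0  <- 59.2   # mean under H0
s     <- 15.38  # sample standard deviation
n     <- 100    # sample size

se <- s / sqrt(n)          # standard error of the mean
z  <- (x_bar - mu_0) / se  # obtained z-score, about -2.86

z_crit  <- qnorm(0.05)     # lower-tail critical value, about -1.645
p_value <- pnorm(z)        # area to the left of the obtained z

# Reject H0 when z falls below the critical value
reject <- z < z_crit
print(c(z = z, z_crit = z_crit, p_value = p_value))
print(reject)
```

Comparing `z` with `z_crit` and comparing `p_value` with 0.05 are two views of the same decision, so they always agree.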

The t-distribution is used for hypothesis testing when the population standard deviation is unknown and must be estimated from the sample.
Calculating T-Scores:
- The formula is the same as for z-scores; however, the critical values differ because they depend on degrees of freedom (and therefore on sample size).
- Degrees of freedom (df) in hypothesis testing about a single mean: df = n - 1.
- As sample size increases, t-distribution increasingly resembles z-distribution.
Finding Critical Values:
- Can look them up in tables, or compute them with R's qt function for the exact df and tail area (alpha).
- Example for a one-tailed test: df = 99, p = 0.05 yields t ≈ -1.660 (a printed table that stops at df = 90 gives -1.662).
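
A minimal sketch of the lookup with `qt`, including a small comparison showing the t critical value approaching the z critical value as df grows. The df values in the comparison are arbitrary illustrative choices.

```r
# One-tailed (lower tail) critical value for df = 99, alpha = 0.05
df <- 99
t_crit_one <- qt(0.05, df)
print(t_crit_one)  # roughly -1.66

# Two-tailed critical values for the same df and alpha
t_crit_two <- qt(c(0.025, 0.975), df)
print(t_crit_two)

# As df grows, the t critical value moves toward the z critical value
dfs  <- c(10, 30, 99, 1000)
crit <- data.frame(df = dfs,
                   t  = qt(0.05, dfs),
                   z  = qnorm(0.05))
print(crit)
```

For small samples the t critical value is noticeably further from zero than -1.645, so a larger test statistic is needed to reject H₀.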

Logic of hypothesis testing applies to proportions just as it does to means.
Example testing employee sick leave behavior:
- Previous proportion of employees taking at least 7 sick days was 50% (H₀: P = 0.50).
- Current sample shows 41% (H₁: P < 0.50).
The calculation involves finding the critical value, computing the z-score for the sample proportion, and comparing the two to decide whether to reject H₀.
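
The steps can be sketched in R as follows. These notes do not state the sample size for the proportion example, so `n <- 100` is an assumption carried over from the earlier sick leave sample. The standard error uses the proportion under H₀, which is the usual choice for a one-sample z-test of a proportion.

```r
P_0   <- 0.50  # proportion under H0 (from the notes)
p_hat <- 0.41  # sample proportion (from the notes)
n     <- 100   # ASSUMPTION: sample size is not given for this example

# Standard error of the proportion, computed under H0
se_p <- sqrt(P_0 * (1 - P_0) / n)

# z-score for the sample proportion
z <- (p_hat - P_0) / se_p

# One-tailed (lower tail) critical value and p-value
z_crit  <- qnorm(0.05)
p_value <- pnorm(z)

reject <- z < z_crit
print(c(se_p = se_p, z = z, z_crit = z_crit, p_value = p_value))
print(reject)
```

With a different sample size the standard error, and possibly the decision, would change, so the result here holds only under the assumed `n`.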

Example case of Biden’s feeling thermometer ratings from ANES data.
- Establish null and alternative hypotheses around the mean expected rating (H₀: μ = 50, H₁: μ ≠ 50).
T-Test Command in R:
- The mean rating returned was 53.41, with t = 8.2 and p-value ≈ 0, so the difference from 50 is statistically significant. Inference: reject H₀.
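
These notes do not reproduce the command itself, so the following is only a sketch of how a one-sample t-test against μ = 50 looks with R's `t.test`. To stay self-contained it runs on simulated 0 to 100 ratings, not the ANES data, so its output will not match the figures above. With the real data, the simulated vector would be replaced by the Biden thermometer column of anes20, whose name is not given here.

```r
# Simulated ratings on a 0-100 scale; these are NOT the ANES data
set.seed(42)
ratings <- pmin(pmax(rnorm(500, mean = 53, sd = 30), 0), 100)

# H0: mu = 50, H1: mu != 50 (two-sided is the default alternative)
result <- t.test(ratings, mu = 50)
print(result)

# The same t statistic computed by hand
n <- length(ratings)
t_manual <- (mean(ratings) - 50) / (sd(ratings) / sqrt(n))
print(t_manual)

# Components of the result that correspond to what the notes report
result$estimate   # sample mean
result$statistic  # t
result$parameter  # degrees of freedom, n - 1
result$p.value    # two-tailed p-value
```

For a one-tailed test such as the sick leave example, `t.test` accepts `alternative = "less"` or `alternative = "greater"`.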

This chapter lays the foundation for applying statistical inference tools to relationships between dependent and independent variables.
- Upcoming chapters focus on multivariate relationships, significance testing, and effect sizes.

Case: Population average cost of supplies = $340. State null and alternative hypotheses, analyze findings.

Analyze feeling thermometers for Donald Trump, liberals, conservatives using t-test and descriptive statistics.
Summarize findings and check for contradictions in public opinion.