Stat 211: Elementary Inferential Statistics - Unit 6 Study Notes

Example: 2024 Births
- In 2024, 3,622,673 babies were born in the US.
- The proportion of premature births in 2024 was 10.4%.

Definition: The sampling distribution for proportions describes how sample proportions vary from sample to sample.
Each individual proportion here comes from one sample of n = 100 babies.
- Distribution Characteristics:
- Normally distributed with mean: $p = 0.104$
- Standard deviation determined as:
  $\text{Standard deviation} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.104(1-0.104)}{100}} \approx 0.031$
- Specific normal distribution:
  $N(0.104, 0.031)$
Sample Numbers:
- # of Samples: 150, 100, 50, 50.
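
The sampling distribution described above can be checked with a small simulation. This is a minimal sketch, not part of the course materials: the number of simulated samples (2000) and the random seed are arbitrary assumptions, while p = 0.104 and n = 100 come from the notes.

```python
import math
import random
import statistics

P_TRUE = 0.104      # population proportion of premature births (from the notes)
N = 100             # babies per sample (from the notes)
NUM_SAMPLES = 2000  # number of simulated samples (an arbitrary assumption)


def theoretical_sd(p, n):
    """Standard deviation of the sampling distribution of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)


def simulate_sample_proportions(p, n, num_samples, seed=211):
    """Draw num_samples samples of size n; return the proportion in each sample."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]


proportions = simulate_sample_proportions(P_TRUE, N, NUM_SAMPLES)
sd_formula = theoretical_sd(P_TRUE, N)

print(f"formula SD:     {sd_formula:.3f}")
print(f"simulated mean: {statistics.mean(proportions):.3f}")
print(f"simulated SD:   {statistics.pstdev(proportions):.3f}")
```

The simulated mean and SD should land close to 0.104 and 0.031, which is what the model $N(0.104, 0.031)$ predicts.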

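The empirical rule allows for probability estimates from a normally distributed dataset. Approximately:
- 68% of values fall within 1 standard deviation of the mean,
- 95% of values fall within 2 standard deviations, and
- 99.7% of values fall within 3 standard deviations.
- Band by band on each side, from the outer tail toward the mean: 0.15% | 2.35% | 13.5% | 34%.
Application: The empirical rule extends to estimating confidence intervals.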
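
The 68 / 95 / 99.7 percentages can be reproduced from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The exact values are slightly different from the rounded figures in the rule (for example, about 95.4% rather than 95% within 2 standard deviations), which is why 2 is only a rough multiplier for a 95% interval.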

Objective: Estimate population parameters from a sample
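A single value (a point estimate) lacks precision; for example, if a sample of 100 babies has 14 preemies, the sample proportion is $\hat{p} = 0.14$.
The true standard deviation cannot be computed from just this sample, because it depends on the unknown population proportion $p$; it is estimated instead.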
- Standard Error (SE):
- Definition: Estimated standard deviation of a sampling distribution
- Significance: Estimates variability of sample statistics ($\hat{p}$ or $\bar{x}$).
- Formula for SE of Sample Proportions:
  $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- Applying the SE Calculation:
  - For $\hat{p} = 0.14$:
    $SE(\hat{p}) = \sqrt{\frac{0.14(1-0.14)}{100}} \rightarrow SE(\hat{p}) \approx 0.035$

We utilize sample statistics and account for uncertainty using the standard error.
- Estimated sampling distribution is:
- From sample: $N(0.14, 0.035)$
- True distribution: $N(0.104, 0.031)$
Confidence Interval Interpretation:
- If 95% of sample proportions from similar samples lie between 7% and 21%, then:
- $0.07 < p < 0.21$, or (7%, 21%), indicates we are 95% confident the true population proportion is within this interval.

A rough approximation for a 95% CI is:
- $\text{CI} \rightarrow \hat{p} \pm 2(\text{Standard Deviation})$
Critical z-Values: Used to define confidence levels, calculated from standard normal distribution:
- Computation Examples:
- Confidence Level 90% → Critical z-Value ≈ 1.645
- Confidence Level 95% → Critical z-Value ≈ 1.960
- Confidence Level 99% → Critical z-Value ≈ 2.576
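
The critical values in this list can be recovered from the standard normal distribution by leaving half of the excluded area in each tail. A minimal sketch using the standard-library `statistics.NormalDist`; the function name `critical_z` is a made-up helper.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """z* such that the central `confidence_level` share of N(0, 1) lies within +/- z*."""
    tail_area = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z* = {critical_z(level):.3f}")
```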

Apply a more precise definition:
- $\text{CI} \rightarrow \hat{p} \pm 1.96(\text{SE})$
- Example for $\hat{p} = 0.14$:
- $0.14 \pm 1.96(0.035)$
This is known as the one-proportion z-interval.
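Carrying the arithmetic through (using the rounded SE of 0.035) gives endpoints close to the 7% and 21% quoted for this sample:
- $0.14 \pm 1.96(0.035) = 0.14 \pm 0.0686$
- $\text{CI} \approx (0.071, 0.209)$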

Commonly misphrased interpretations include:
- "14% of all babies born in 1998 were born prematurely."
- "It is probably true that 14% of all babies born in 1998 were born prematurely."
- Claiming with certainty that the true proportion is within the computed interval; no single interval is guaranteed to capture it.
What to Say Instead:
- Statements like "We are 95% confident the true population proportion lies within this interval" are more accurate.

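When the conditions required to find a confidence interval are met:
- The previous formulas apply, with the interval built from $SE(\hat{p})$ as:
- $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- Where $z^*$ specifies the number of SEs on either side of the estimate needed to capture C% of random samples.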

Quantifying uncertainty is crucial; confidence intervals provide clarity beyond simple point estimates.
- “Confidence” refers to the long-run success rate of the method over many samples.
- The confidence level describes how often intervals built this way capture the true parameter, not whether any single interval does.

In a 2025 survey of 1,000 adults:
- 591 people preferred pineapple on pizza.
Calculate the 95% confidence interval for the population proportion:
- Using the confidence interval applet, enter the following parameters:
- Number of Successes: 591
- Total Sample Size: 1000
- Confidence Level: 95%
- Output from Calculator:
- Confidence Interval: (0.5605, 0.6215)
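
The applet output can be reproduced by hand. The sketch below assumes the applet computes the one-proportion z-interval $\hat{p} \pm z^* \cdot SE(\hat{p})$ (the notes do not say which method it uses); the function name is a made-up helper.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence_level):
    """Return (lower, upper) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Survey values from the notes: 591 successes out of 1000, 95% confidence.
lower, upper = one_proportion_z_interval(591, 1000, 0.95)
print(f"({lower:.4f}, {upper:.4f})")
```

Under that assumption the result agrees with the calculator's interval to four decimal places.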

Aim: Compute a 90% confidence interval for the proportion of college students who tried boba tea.
- The interval is determined as:
- $\text{Point Estimate} \pm z^* \cdot \text{Standard Error}$

Margin of Error Explanation:
- Tradeoff between confidence level and interval width:
- More confidence → Wider interval.
- Less confidence → Narrower interval.
Example intervals illustrating confidence vs. precision:
- 50% CI: (0.09, 0.11) yields a smaller margin of error, but with low confidence.
- 100% CI: [0, 1] gives maximal confidence but no useful precision.
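
The tradeoff can be seen by holding the sample fixed and varying only the confidence level. This sketch reuses the preemie sample from earlier in the notes (14 of 100); the chosen confidence levels and the helper name `margin_of_error` are assumptions for illustration.

```python
import math
from statistics import NormalDist

P_HAT, N = 0.14, 100  # preemie sample from the notes: 14 of 100


def margin_of_error(p_hat, n, confidence_level):
    """Margin of error z* * SE(p_hat) for a one-proportion z-interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


levels = (0.50, 0.90, 0.95, 0.99)  # illustrative confidence levels
margins = [margin_of_error(P_HAT, N, level) for level in levels]

for level, me in zip(levels, margins):
    print(f"{level:.0%} CI: {P_HAT} +/- {me:.3f}")
```

Each step up in confidence widens the interval. A 100% level is left out because it would need an unbounded z*, which corresponds to the trivial interval [0, 1].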

Polling samples report proportions ± margin of error:
- E.g., Candidate A at 52% with ±3% ME gives CI: [49%, 55%].
Sample size calculations determine how large a sample is needed to achieve a desired confidence level and margin of error.

Estimation methods differ based on whether a prior estimate is available:
- When a prior estimate of p is available: $n = \frac{z^2 \times p(1-p)}{ME^2}$
- When p is not known (use the conservative $p = 0.5$): $n = \frac{z^2}{4\,ME^2}$
Always round up in sample size calculations.
Example Calculations:
- For 3% margin of error with 95% confidence, determine appropriate sample size.
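
One way to work this example is sketched below. The function name and the `p_guess` parameter are made up for illustration; leaving `p_guess` at 0.5 gives the conservative formula. The second call, which plugs in 0.104 as a prior estimate, is a hypothetical illustration rather than part of the original exercise.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence_level, p_guess=0.5):
    """Smallest n with z*^2 * p(1 - p) / n <= ME^2, rounded up to a whole number."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n)  # always round up


# Conservative case (p = 0.5): 3% margin of error at 95% confidence.
n_conservative = required_sample_size(0.03, 0.95)

# Hypothetical case: using 0.104 as a prior estimate of p.
n_with_prior = required_sample_size(0.03, 0.95, p_guess=0.104)

print(n_conservative, n_with_prior)
```

The conservative answer is larger because $p(1-p)$ is maximized at $p = 0.5$.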

Example: 2024 Birthweights
- In 2024, average birthweight was 3,318.9 grams.
Sampling Distribution:
- The Central Limit Theorem applies; the distribution of sample means can be modeled as normal:
- $\bar{x} \text{ drawn from } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, with population mean $\mu$, population SD $\sigma$, and sample size $n$
- Mean calculated:
- $N(3318.9, 60.95)$

Formula for computing the confidence interval:
- $\text{CI} = \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$.
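
A minimal sketch of this interval follows. The notes do not give a sample mean, a population SD, or a sample size, so every input below is an assumption: σ = 609.5 g and n = 100 are chosen only so that σ/√n matches the 60.95 in the notes, and the sample mean of 3300 g is hypothetical.

```python
import math
from statistics import NormalDist


def z_interval_for_mean(x_bar, sigma, n, confidence_level):
    """Return (lower, upper) for x_bar +/- z* * sigma / sqrt(n), with sigma known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# ASSUMED inputs (not from the notes): chosen so sigma / sqrt(n) = 60.95.
X_BAR = 3300.0  # hypothetical sample mean birthweight, in grams
SIGMA = 609.5   # assumed population SD, in grams
N = 100         # assumed sample size

lower, upper = z_interval_for_mean(X_BAR, SIGMA, N, 0.95)
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f}) grams")
```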

Most commonly, the population standard deviation is unknown, so the sample standard deviation is used in its place.
t-Distribution Introduction:
- Derived by William S. Gosset under pseudonym