Stat 211: Elementary Inferential Statistics - Unit 6 Study Notes

Unit 6: One-Sample Inference

Confidence Intervals for Proportions
  • Example: 2024 Births

    • In 2024, 3,622,673 babies were born in the US.

    • The proportion of premature births in 2024 was 10.4%.

Sampling Distribution
  • Definition: The sampling distribution for proportions specifies how sample proportions vary.

  • Each individual sample proportion is computed from a sample of 100 babies.

    • Distribution Characteristics:

    • Normally distributed with mean: $p = 0.104$

    • Standard deviation determined as:
      $\text{Standard deviation} = \sqrt{\frac{p(1-p)}{n}}$

    • Specific normal distribution:
      $N(0.104, 0.031)$

  • [Figure: plot of the sampling distribution; axis label: # of Samples]
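One way to see the sampling distribution is to simulate it. The sketch below is only an illustration: it assumes each baby is premature independently with probability 0.104, and the function name, the seed, and the choice of 2,000 samples are all made up for this example.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Count "successes" (premature births) in one sample of n babies.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# p = 0.104 and n = 100 come from the notes; 2000 samples is an arbitrary choice.
props = simulate_sample_proportions(p=0.104, n=100, num_samples=2000)
print("mean of sample proportions:", round(statistics.mean(props), 3))
print("SD of sample proportions:  ", round(statistics.pstdev(props), 3))
```

The printed mean and SD should land close to the theoretical values of 0.104 and about 0.031, though a simulation will not match them exactly.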

Recall: Empirical Rule
  • This rule allows for probability estimations from a normally distributed dataset:

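    • About 68% of values fall within 1 standard deviation of the mean.

    • About 95% of values fall within 2 standard deviations of the mean.

    • About 99.7% of values fall within 3 standard deviations of the mean.

    • Region by region on each side of the mean: 34% (0 to 1 SD), 13.5% (1 to 2 SDs), 2.35% (2 to 3 SDs), and 0.15% (beyond 3 SDs).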

  • Application: The empirical rule extends to estimating confidence intervals.
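The 68/95/99.7 figures are rounded areas under the normal curve. As a quick check, the sketch below computes them with `statistics.NormalDist` from the Python standard library; the helper name `share_within` is just for this example.

```python
from statistics import NormalDist

def share_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The exact areas are slightly different from the rounded rule (for example, 2 standard deviations covers a bit more than 95%), which is why a more precise multiplier is used for confidence intervals.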

Our Goal
  • Objective: Estimate population parameters from a sample

  • Single values lack precision; for example, if a sample of 100 babies has 14 preemies, $\hat{p} = 0.14$.

  • The true standard deviation of the sampling distribution depends on the unknown $p$, so from this one sample it can only be estimated.

    • Standard Error (SE):

    • Definition: Estimated standard deviation of a sampling distribution

    • Significance: Estimates variability of sample statistics ($\hat{p}$ or $\bar{x}$).

    • Formula for SE of Sample Proportions:
      $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

    • Applying the SE Calculation:

      • For $\hat{p} = 0.14$:
        $SE(\hat{p}) = \sqrt{\frac{0.14(1-0.14)}{100}} \rightarrow SE(\hat{p}) \approx 0.035$
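The standard error arithmetic can be reproduced in a few lines. This is a minimal sketch; `standard_error` is a name chosen for this example, and it uses the square-root form of the formula, which is what yields 0.035 for the sample and 0.031 for the true proportion.

```python
import math

def standard_error(p_hat, n):
    """Estimated SD of the sampling distribution of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Sample of 100 babies with 14 preemies (from the notes).
print("SE from the sample:", round(standard_error(0.14, 100), 3))

# Same formula with the true 2024 proportion, for comparison.
print("SD using true p:   ", round(standard_error(0.104, 100), 3))
```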

Rationale for Confidence Intervals
  • We utilize sample statistics and account for uncertainty using the standard error.

    • Estimated sampling distribution is:

    • From sample: $N(0.14, 0.035)$

    • True distribution: $N(0.104, 0.031)$

  • Confidence Interval Interpretation:

    • If 95% of similar samples lie between 7% and 21%, then:

    • $0.07 < p < 0.21$, or (7%, 21%), indicates we are 95% confident the true population proportion is within this interval.

Rough Confidence Interval
  • A rough approximation for a 95% CI is:

    • $\text{CI} \rightarrow \hat{p} \pm 2(\text{Standard Deviation})$

  • Critical z-Values: Used to define confidence levels, calculated from the standard normal distribution:

    • Computation Examples:

    • Confidence Level 90% → Critical z-Value ≈ 1.645

    • Confidence Level 95% → Critical z-Value ≈ 1.960

    • Confidence Level 99% → Critical z-Value ≈ 2.576
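The critical values in this table can be recovered from the standard normal distribution. The sketch below assumes a two-sided interval, so the leftover probability is split evenly between the two tails; `critical_z` is a name made up for this example.

```python
from statistics import NormalDist

def critical_z(confidence):
    """z* such that the central `confidence` share of N(0, 1) lies in (-z*, z*)."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z* = {critical_z(level):.3f}")
```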

More Precise Confidence Interval
  • Apply a more precise definition:

    • $\text{CI} \rightarrow \hat{p} \pm 1.96(\text{SE})$

    • Example for $\hat{p} = 0.14$:

    • $0.14 \pm 1.96(0.035)$

  • This is known as the one-proportion z-interval.

Misinterpretations in Confidence Intervals
  • Commonly misphrased interpretations include:

    • "14% of all babies born in 1998 were born prematurely."

    • "It is probably true that 14% of all babies born in 1998 were born prematurely."

    • Stating with certainty that the true proportion is within a computed interval is also misleading, because any single interval may miss it.

  • What to Say Instead:

    • Statements like "We are 95% confident the true population proportion lies within this interval" are more accurate.

One-Proportion z-interval
  • Conditions required to find a confidence interval include:

    • Previous formulas apply with $SE(\hat{p})$ defined as:

    • $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

    • Where $z^*$ specifies the number of SEs needed to capture C% of random samples.

Why Confidence?
  • Quantifying uncertainty is crucial; confidence intervals provide clarity beyond simple point estimates.

    • “Confidence” describes the long-run success rate of the method over many samples.

    • The confidence level says how often the method captures the true value across repeated samples, not whether any single interval is correct.
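The long-run meaning of "confidence" can be illustrated by simulation: draw many samples from a population whose true proportion is known, build an interval from each, and count how often the interval contains the truth. The sketch below is an illustration under assumptions: it reuses p = 0.104 and n = 100 from the births example, fixes z* at 1.96, and the function name, seed, and number of repetitions are arbitrary.

```python
import math
import random

def coverage_rate(p_true, n, num_samples, z_star=1.96, seed=1):
    """Share of simulated intervals p_hat ± z* · SE that contain p_true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p_true)
        p_hat = successes / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / num_samples

rate = coverage_rate(p_true=0.104, n=100, num_samples=2000)
print(f"intervals containing the true proportion: {rate:.1%}")
```

The printed rate should be in the neighborhood of 95%, though not exactly 95%, since the normal model is only an approximation at this sample size. Each individual interval either contains 0.104 or it does not; the percentage describes the method.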

Example: Pineapple on Pizza Survey
  • In a 2025 survey of 1,000 adults:

    • 591 people preferred pineapple on pizza.

  • Calculate the 95% confidence interval for the population proportion:

    • Using the confidence interval applet, enter the following parameters:

    • Number of Successes: 591

    • Total Sample Size: 1000

    • Confidence Level: 95%

    • Output from Calculator:

    • Confidence Interval: (0.5605, 0.6215)
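The applet's result can be checked by hand or with a short script. The sketch below is not the applet's actual code; it simply applies the one-proportion z-interval formula, with `one_proportion_z_interval` as a name chosen for this example.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for p_hat ± z* · sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_proportion_z_interval(successes=591, n=1000, confidence=0.95)
print(f"({low:.4f}, {high:.4f})")
```

The endpoints should agree with the applet's output to four decimal places. Changing `confidence` to 0.90 gives the kind of interval asked for in a 90% problem.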

Example: Boba Tea
  • Aim: Compute a 90% confidence interval for the proportion of college students who tried boba tea.

    • The interval is determined as:

    • $\text{Point Estimate} \pm z^* \cdot \text{Standard Error}$

Confidence vs. Precision
  • Margin of Error Explanation:

    • Tradeoff between confidence level and interval width:

    • More confidence → Wider interval.

    • Less confidence → Narrower interval.

  • Example intervals represent confidence vs. precision:

    • 50% CI: (0.09, 0.11) yields a smaller margin of error but less confidence.

    • 100% CI: [0, 1] gives maximal confidence but no useful precision.
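To make the tradeoff concrete, the sketch below computes the margin of error at several confidence levels for the same data. It assumes the earlier sample of 100 babies with 14 preemies; the function name and the particular levels are chosen only for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Margin of error z* · SE for a sample proportion at a given confidence level."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Same sample each time; only the confidence level changes.
for level in (0.50, 0.90, 0.95, 0.99):
    me = margin_of_error(0.14, 100, level)
    print(f"{level:.0%} CI: 0.14 ± {me:.3f}")
```

The margin grows with every step up in confidence, so with a fixed sample the only way to gain confidence is to accept a wider interval.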

Election Polling and Confidence Intervals
  • Polling samples report proportions ± margin of error:

    • E.g., Candidate A at 52% with ±3% ME gives CI: [49%, 55%].

  • Sample size calculations determine how many respondents are needed to achieve a desired confidence level and margin of error.

Choosing a Sample Size
  • The formula depends on whether a prior estimate of p is available:

    • When p is known: $n = \frac{(z^*)^2 \times p(1-p)}{ME^2}$

    • When p is not known (use the conservative value $p = 0.5$): $n = \frac{(z^*)^2}{4\,ME^2}$

  • Always round up in sample size calculations.

  • Example Calculations:

    • For 3% margin of error with 95% confidence, determine appropriate sample size.
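The example can be worked with a small helper. This is a sketch under stated assumptions: z* = 1.96 for 95% confidence (from the critical-value table), the conservative p = 0.5 when no prior estimate exists, and, purely as an illustration of the "p known" case, the 10.4% premature-birth proportion reused as a prior. The function name is made up for this example.

```python
import math

def sample_size(margin_of_error, z_star, p=0.5):
    """Smallest n with z* · sqrt(p(1 - p) / n) <= margin_of_error, rounded up."""
    n = (z_star ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)  # always round up to a whole respondent

print("conservative (p = 0.5):", sample_size(0.03, 1.96))
print("with prior p = 0.104:  ", sample_size(0.03, 1.96, p=0.104))
```

With p = 0.5 the term p(1 - p) is at its maximum of 0.25, so the conservative formula never underestimates the required sample size.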

Confidence Intervals for Means
  • Example: 2024 Birthweights

    • In 2024, the average birthweight was 3,318.9 grams.

  • Sampling Distribution:

    • Central Limit Theorem applies; the distribution of sample means can be modeled as normal:

    • $\bar{x} \text{ drawn from } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, where $\mu$ is the population mean and $\sigma$ is the population SD

    • Mean calculated:

    • $N(3318.9, 60.95)$

Confidence Interval for Mean (σ Known)
  • Formula for computing confidence interval:

    • $\text{CI} = \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$.
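A short sketch of this formula follows. The notes do not give a sample for this example, so the inputs below are hypothetical: a sample mean of 3,300 g, a population SD of 609.5 g, and n = 100 (values picked only so that σ/√n works out to 60.95). The function name is also made up for this example.

```python
import math

def z_interval_for_mean(x_bar, sigma, n, z_star=1.96):
    """Return (low, high) for x_bar ± z* · sigma / sqrt(n), with sigma known."""
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs for illustration; they are not data from the notes.
low, high = z_interval_for_mean(x_bar=3300.0, sigma=609.5, n=100)
print(f"95% CI for the mean birthweight: ({low:.1f}, {high:.1f}) grams")
```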

Example: Birthweights for Confidence Interval
  • Most commonly, the population standard deviation is unknown, so the sample standard deviation is used instead.

  • t-Distribution Introduction:

    • Derived by William S. Gosset under pseudonym