Study Notes on Statistical Inference Foundations

Chapter 5: Foundations for Inference

5.1 Point Estimates and Sampling Variability

5.1.1 Point Estimates and Error
  • Context: Statistical inference quantifies uncertainty about population parameters using sample data.

  • Point estimate: A sample statistic used to estimate an unknown parameter.

    • Example: A poll indicates the US President’s approval rating is 45%, which serves as a point estimate for the true parameter of interest denoted as p.

  • Sample proportion (p̂, pronounced "p-hat"): The proportion observed in the sample, used as the point estimate of p.

  • Error in estimate: The difference between the point estimate and the true population parameter p, consisting of:

    • Sampling error (sampling uncertainty): Variation from sample to sample, impacting estimates.

    • Bias: Systematic over- or underestimation due to flawed sampling methods.

  • Minimizing bias is crucial for accurate estimates, as discussed in Chapter 1.

5.1.2 Understanding the Variability of a Point Estimate
  • Consider p = 0.88 as the true proportion supporting solar energy in the US.

  • If 1,000 American adults are polled, the goal is to understand how close p̂ in different samples (e.g., p̂ = 0.894) will be to p.

  • Simulation approach for analysis:

    1. Simulate a population of 250 million individuals where 88% support solar energy.

    2. Randomly sample 1000 individuals and compute p̂.

  • Example simulated output:

    • p̂1 = 0.894 (error = p̂1 − p = 0.894 − 0.88 = +0.014)

    • Note: Proportions are stated as decimals, so p = 0.88 is equivalent to 88%.

  • The sampling distribution describes how p̂ varies from sample to sample. Repeating the simulation many times and plotting the resulting p̂ values as a histogram shows:

    • Center: The mean of the sampling distribution equals p = 0.88.

    • Spread: The standard error, SEp̂ = 0.010, is the standard deviation of the p̂ values.

    • Shape: Approximately normal (bell-shaped) when the conditions in 5.1.3 are met.
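
The simulation approach above can be sketched in a few lines of Python. This is a minimal sketch, not the notes' own code: rather than building a population of 250 million, it assumes each respondent independently supports solar energy with probability p, which is nearly equivalent because the sample is a tiny fraction of the population. The function name, the number of repetitions, and the seed are illustrative choices.

```python
import random
import statistics

def simulate_p_hats(p=0.88, n=1000, num_samples=2000, seed=1):
    """Simulate num_samples polls of size n; return each poll's sample proportion."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Assumption: each respondent is a supporter with probability p,
        # independently of the others (stand-in for sampling from a huge population).
        supporters = sum(rng.random() < p for _ in range(n))
        p_hats.append(supporters / n)
    return p_hats

p_hats = simulate_p_hats()
print("center (mean of p-hats):", round(statistics.mean(p_hats), 3))
print("spread (sd of p-hats):  ", round(statistics.stdev(p_hats), 3))
```

The printed center should land close to 0.88 and the spread close to 0.010, matching the summary above up to simulation noise.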

5.1.3 Central Limit Theorem (CLT)
  • Central Limit Theorem: If the sample size is sufficiently large and the observations are independent, the sampling distribution of p̂ is approximately normal with:

    • Mean (µp̂) = p

    • Standard Error (SEp̂) = √[p(1 - p)/n]

  • Sample-size conditions for normality (the success-failure condition):

    • np ≥ 10

    • n(1 − p) ≥ 10

  • Example verification (n = 1000, p = 0.88):

    • np = 1000 × 0.88 = 880 ≥ 10

    • n(1 − p) = 1000 × 0.12 = 120 ≥ 10
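
As a quick check of the formulas in this section, the following sketch evaluates the success-failure condition and the standard error for n = 1000 and p = 0.88. The function names are illustrative, not taken from the notes.

```python
import math

def success_failure_ok(n, p, threshold=10):
    """True when both expected successes and expected failures reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

n, p = 1000, 0.88
print(success_failure_ok(n, p))        # True: 880 and 120 are both at least 10
print(round(standard_error(p, n), 4))  # 0.0103, i.e. about 0.010
```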

5.1.4 Applying the CLT to Real Scenarios
  • In practice p is unknown, so the sample proportion (e.g., p̂ = 0.887) is substituted for p, both when verifying the success-failure condition and when computing the standard error.

  • Standard error with n = 1000: SEp̂ ≈ √[p̂(1 − p̂)/n] = √[0.887 × 0.113/1000] ≈ 0.010.

5.1.5 Extending Framework for Other Statistics
  • This sampling principle applies broadly to other statistics, not just proportions.

    • Example: Estimating population mean (μ) via sample mean (x̄).

5.2 Confidence Intervals for a Proportion

5.2.1 Capturing the Population Parameter
  • A confidence interval (CI) is a range of plausible values for the population parameter. A point estimate is like fishing with a spear, while a CI is like fishing with a net: a range is far more likely to capture the true value than a single number.

  • 95% CI for a proportion: p̂ ± 1.96 × SEp̂.

5.2.2 Constructing a 95% CI
  • Procedure for CI construction:

    1. Confirm the conditions for the normal approximation of p̂.

    2. Use the point estimate (p̂) as the center of the interval.

    3. Calculate the margin of error: the z-score for the chosen confidence level times the standard error.

    4. Form the interval: p̂ ± margin of error.

  • Interpretation of CI: If many samples were taken and a 95% CI were built from each, approximately 95% of those intervals would contain the true parameter.
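
The construction procedure can be sketched as a small function. This is an illustrative sketch that reuses p̂ = 0.887 and n = 1000 from 5.1.4; the function name and the choice to raise an error when the success-failure condition fails are assumptions, not part of the notes.

```python
import math

def proportion_ci(p_hat, n, z_star=1.96):
    """Confidence interval for a proportion using the normal approximation."""
    # Check the success-failure condition, with p_hat standing in for p.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("success-failure condition not met")
    # The point estimate is the center; the margin of error is z* times SE.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.887, 1000)
print(round(low, 3), round(high, 3))  # 0.867 0.907
```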

5.2.3 Changing the Confidence Level
  • Varying the confidence level changes the width of the CI: higher confidence requires a wider interval (e.g., a 99% CI uses z = 2.58 in place of 1.96).
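
The z value for any confidence level can be looked up from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_star` is illustrative.

```python
from statistics import NormalDist

def z_star(confidence):
    """z value that leaves (1 - confidence) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 2))
# 0.9 1.64
# 0.95 1.96
# 0.99 2.58
```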

5.2.4 More Case Studies
  • Worked examples that apply CIs to real-world survey data to estimate population parameters.

5.2.5 Interpreting Confidence Intervals
  • A confidence statement is always about the population parameter. A CI says nothing about individual observations or about the point estimates that other samples would produce.

5.3 Hypothesis Testing for a Proportion

5.3.1 Hypothesis Testing Framework
  • Hypotheses are formulated as:

    • Null Hypothesis (H0): usually skeptical; e.g., p = 0.333.

    • Alternative Hypothesis (HA): suggests a deviation; e.g., p ≠ 0.333.

5.3.2 Testing Hypotheses Using CIs
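  • Compare the null value with a CI built from the sample: if the null value falls outside the interval, the data provide evidence against H0; if it falls inside, H0 remains plausible and is not rejected.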

5.3.3 Decision Errors
  • Types of errors:

    • Type 1 Error: Rejecting H0 when it’s true.

    • Type 2 Error: Failing to reject H0 when HA is true.
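
To make the Type 1 Error concrete, the sketch below simulates many samples from a population in which H0 (p = 0.333) is actually true and counts how often a two-sided test that rejects when |z| > 1.96 wrongly rejects it. The sample size, number of trials, seed, and function name are illustrative assumptions.

```python
import math
import random

def type1_error_rate(p0=0.333, n=300, num_trials=2000, z_crit=1.96, seed=7):
    """Fraction of simulated samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    rejections = 0
    for _ in range(num_trials):
        # Data are generated with H0 true: each observation is a success with probability p0.
        p_hat = sum(rng.random() < p0 for _ in range(n)) / n
        if abs(p_hat - p0) / se > z_crit:
            rejections += 1  # every rejection here is a Type 1 Error
    return rejections / num_trials

rate = type1_error_rate()
print("simulated Type 1 Error rate:", rate)
```

The simulated rate should come out near 0.05, the tail area beyond ±1.96 under the normal model.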

5.3.4 Formal Testing Using P-Values
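  • The p-value quantifies evidence against H0: it is the probability of observing data at least as favorable to HA as the current sample, assuming H0 is true.

    • Smaller p-values indicate stronger evidence against H0.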
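
A two-sided p-value for a single proportion can be computed from the normal model as sketched below. The sample values (p̂ = 0.40 from n = 300) are hypothetical and chosen only for illustration; the null value 0.333 comes from 5.3.1, and the function name is an assumption.

```python
import math
from statistics import NormalDist

def two_sided_p_value(p_hat, p0, n):
    """Two-sided p-value for H0: p = p0 using the normal approximation."""
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    # Area in both tails beyond |z| under the standard normal curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sample: 300 observations with sample proportion 0.40.
p_value = two_sided_p_value(0.40, 0.333, 300)
print(round(p_value, 3))
```

A sample proportion farther from the null value gives a larger |z| and therefore a smaller p-value.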

5.3.5 Choosing a Significance Level
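  • The significance level α (traditionally 0.05) is the threshold for rejecting H0: reject when the p-value is below α. Choose a smaller α when a Type 1 Error is especially costly, and a larger α when a Type 2 Error is the greater concern.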

5.3.6 Statistical vs. Practical Significance
  • A statistically significant result is not necessarily practically important: with a very large sample, even a tiny, inconsequential difference from the null value can be detected.

5.3.7 One-Sided Hypothesis Tests
  • One-sided tests look for a deviation in a single direction (HA: p greater than the null value, or HA: p less than the null value) rather than in both directions (HA: p ≠ null value).

6.1 Inference for a Single Proportion

6.1.1 Identifying when the Sample Proportion is Nearly Normal
  • Conditions: The observations are independent, and the sample is large enough to satisfy the success-failure condition (np ≥ 10 and n(1 − p) ≥ 10).

6.1.2 Confidence Intervals for a Proportion
  • Apply the CI method from 5.2 in four steps: prepare, check, calculate, and conclude.

6.1.3 Hypothesis Testing for a Proportion
  • Hypothesis tests for a proportion follow the framework from 5.3, organized into the same steps of preparing, checking conditions, calculating, and concluding.

6.1.4 When Conditions Aren’t Met
  • When the success-failure condition is not met, the normal approximation is unreliable; alternative methods or simulations can be used to obtain valid results.

6.1.5 Sample Size Considerations
  • Sample size directly determines the margin of error, so the required n can be calculated in advance to reach a target margin of error for a CI or a desired sensitivity for a hypothesis test.
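
One common way to size a sample is to solve the margin-of-error formula, z × √[p(1 − p)/n] ≤ target, for n. The sketch below does this; the target margin of 0.04 is a hypothetical value, the default p = 0.5 is the worst case used when no prior estimate of p is available, and the function name is illustrative.

```python
import math

def required_sample_size(margin, p=0.5, z_star=1.96):
    """Smallest n with z_star * sqrt(p(1 - p) / n) <= margin."""
    # Solving the inequality for n gives n >= z_star^2 * p(1 - p) / margin^2.
    return math.ceil(z_star ** 2 * p * (1 - p) / margin ** 2)

print(required_sample_size(0.04))          # 601 with the worst-case p = 0.5
print(required_sample_size(0.04, p=0.88))  # 254 when p is believed to be near 0.88
```

Using p = 0.5 maximizes p(1 − p), so it yields a sample size that is large enough whatever the true proportion turns out to be.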

6.2 Difference of Two Proportions

6.2.1 Sampling Distribution of the Difference of Two Proportions
  • The difference p̂1 − p̂2 is approximately normal when the observations are independent within and between the two groups and the success-failure condition holds for each group separately.

6.2.2 Confidence Intervals for Difference of Two Proportions
  • The CI keeps the same form, point estimate ± z × SE, with the difference as the point estimate: (p̂1 − p̂2) ± z × SE.
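
A CI for the difference of two proportions can be sketched as follows, using the standard error √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]. The group proportions and sizes in the usage line are hypothetical, and the function name is illustrative.

```python
import math

def two_proportion_ci(p1_hat, n1, p2_hat, n2, z_star=1.96):
    """CI for p1 - p2 using the normal approximation (assumes conditions hold)."""
    # The variances of the two independent sample proportions add.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Hypothetical groups: 60% of 200 versus 50% of 250.
low, high = two_proportion_ci(0.60, 200, 0.50, 250)
print(round(low, 3), round(high, 3))  # 0.008 0.192
```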

6.2.3 Hypothesis Tests for the Difference of Two Proportions
  • Tests of H0: p1 = p2 follow the same framework used for a single proportion, comparing the observed difference p̂1 − p̂2 with the null value of 0.
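
A common approach when H0 states p1 = p2 is to estimate the shared proportion by pooling both groups and to use that pooled value in the standard error. The sketch below assumes this pooled approach, which the notes do not spell out; the counts in the usage line are hypothetical and the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2 using a pooled proportion (an assumed approach)."""
    p_pool = (x1 + x2) / (n1 + n2)  # best single estimate of p if H0 is true
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 120 successes out of 200 versus 125 out of 250.
z, p_value = two_proportion_z_test(120, 200, 125, 250)
print(round(z, 2), round(p_value, 3))
```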

6.2.4 More on 2-Proportion Hypothesis Tests
  • Covers additional cases of hypothesis tests that compare two proportions.

6.2.5 Evaluating the Standard Error Formula
  • Derives the standard error formula for the difference of two proportions from the standard errors of the individual sample proportions.
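
One standard way to obtain the formula, sketched here under the assumption that the two samples are independent of each other, is to add the variances of the two sample proportions and take the square root:

```latex
\begin{aligned}
\mathrm{Var}(\hat{p}_1 - \hat{p}_2)
  &= \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2)
   = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \\
SE_{\hat{p}_1 - \hat{p}_2}
  &= \sqrt{SE_{\hat{p}_1}^2 + SE_{\hat{p}_2}^2}
   = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}
\end{aligned}
```

The variances add even though the proportions are subtracted, because subtracting an independent quantity still contributes its variability to the result.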