Study Notes on Statistical Inference Foundations

Chapter 5: Foundations for Inference

5.1 Point Estimates and Sampling Variability

5.1.1 Point Estimates and Error

Context: Statistical inference quantifies uncertainty about population parameters using sample data.
Point estimate: A sample statistic used to estimate an unknown parameter.
- Example: A poll reports the US President’s approval rating as 45%. This 45% is a point estimate of the true approval rating, the parameter of interest, denoted p.
Sample proportion (p̂): The proportion observed in the sample, denoted p̂ (pronounced "p-hat") and used as an estimate of p.
Error in an estimate: The difference between the point estimate and the true parameter (for a proportion, p̂ − p). It has two sources:
- Sampling error (sampling uncertainty): How much an estimate varies from one sample to the next.
- Bias: Systematic overestimation or underestimation of the parameter, for example from a flawed sampling method.
Minimizing bias through careful data collection is crucial for accurate estimates, as discussed in Chapter 1.

5.1.2 Understanding the Variability of a Point Estimate

Suppose the true proportion of US adults who support solar energy is p = 0.88.
If 1,000 American adults are polled, the goal is to understand how close the sample proportion p̂ from a given sample (e.g., p̂ = 0.894) is likely to be to p.
Simulation approach for analysis:
1. Simulate a population of 250 million individuals where 88% support solar energy.
2. Randomly sample 1000 individuals and compute p̂.
Example simulated output:
- p̂1 = 0.894, an error of p̂1 − p = 0.894 − 0.88 = +0.014.
Note: proportions are written as decimals, so p = 0.88 is equivalent to 88%.
Repeating the simulation many times and plotting the resulting p̂ values in a histogram approximates the sampling distribution of p̂, which describes how p̂ behaves from sample to sample:
- Center: The mean of the sampling distribution equals p = 0.88.
- Spread: The standard deviation of the sampling distribution, called the standard error SEp̂, is about 0.010.
- Shape: Approximately normal (symmetric and bell-shaped) when certain conditions are met.
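The simulation described above can be sketched with the Python standard library. This is a minimal sketch under one simplifying assumption: instead of building a list of 250 million people, each respondent is treated as an independent draw who supports solar energy with probability p = 0.88, which closely approximates sampling 1,000 people from such a large population. The function name `simulate_p_hats`, the 1,000 repetitions, and the seed are illustrative choices.

```python
import random
import statistics

def simulate_p_hats(p=0.88, n=1000, reps=1000, seed=1):
    """Return `reps` simulated sample proportions, each from a sample of size n.

    Assumption: respondents are independent draws with support probability p,
    approximating a random sample from a very large population.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    p_hats = []
    for _ in range(reps):
        supporters = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(supporters / n)
    return p_hats

p_hats = simulate_p_hats()
print(f"center (mean of p-hats): {statistics.mean(p_hats):.3f}")   # should be near 0.88
print(f"spread (SD of p-hats):   {statistics.stdev(p_hats):.3f}")  # should be near 0.010
```

A histogram of `p_hats` would show the roughly bell-shaped distribution summarized by the center, spread, and shape properties.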

5.1.3 Central Limit Theorem (CLT)

Central Limit Theorem: When observations are independent and the sample size is sufficiently large, the sampling distribution of p̂ is approximately normal with:
- Mean (µp̂) = p
- Standard Error (SEp̂) = √[p(1 - p)/n]
- Conditions: The observations are independent, and the success-failure condition holds: np ≥ 10 and n(1 − p) ≥ 10.
Example check: with n = 1000 and p = 0.88, np = 880 ≥ 10 and n(1 − p) = 120 ≥ 10, so both parts of the success-failure condition hold.
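The CLT standard error and the success-failure check translate directly into two small helpers. A sketch only; the names `success_failure_ok` and `standard_error` are hypothetical and do not come from any library.

```python
import math

def success_failure_ok(n, p, threshold=10):
    """Success-failure condition: at least `threshold` expected successes and failures."""
    return n * p >= threshold and n * (1 - p) >= threshold

def standard_error(p, n):
    """Standard error of p-hat from the CLT: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

n, p = 1000, 0.88
print(success_failure_ok(n, p))        # True: np = 880, n(1 - p) = 120
print(round(standard_error(p, n), 4))  # 0.0103
```

Independence is not something code can check; it comes from the study design (e.g., a simple random sample).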

5.1.4 Applying the CLT to Real Scenarios

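In practice p is unknown, so the sample proportion stands in for it (the plug-in approach). With an observed p̂ = 0.887 from a poll of 1,000 adults, check the success-failure condition using p̂ in place of p: np̂ = 887 ≥ 10 and n(1 − p̂) = 113 ≥ 10.
Standard error: SEp̂ ≈ √[p̂(1 − p̂)/n] = √[0.887 × 0.113 / 1000] ≈ 0.010.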
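The plug-in arithmetic can be checked in a few lines. This sketch assumes n = 1,000, the poll size used in the solar-energy example; the variable names are illustrative.

```python
import math

p_hat, n = 0.887, 1000  # observed proportion; n assumed from the polling example
# Success-failure check with p-hat substituted for the unknown p
print(n * p_hat >= 10 and n * (1 - p_hat) >= 10)  # True (887 and 113)
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"{se:.3f}")  # 0.010
```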

5.1.5 Extending the Framework to Other Statistics

The same framework (a point estimate, its standard error, and the shape of its sampling distribution) applies to other statistics, not just proportions.
- Example: Estimating a population mean (μ) with the sample mean (x̄).

5.2 Confidence Intervals for a Proportion

5.2.1 Capturing the Population Parameter

A point estimate is like fishing with a spear: it will rarely hit the exact value of the parameter. A confidence interval (CI) gives a range of plausible values for the parameter, like fishing with a net, and so is much more likely to capture it.
95% CI example: p̂ ± 1.96 × SEp̂.
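A sketch of the 95% interval as a function, using the solar-energy values p̂ = 0.887 and n = 1,000 (n taken from the polling example). `proportion_ci` is a hypothetical helper name; z* comes from the standard library's `statistics.NormalDist` (Python 3.8 or later) rather than a table.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation CI for p: p_hat +/- z* x SE, with SE computed from p_hat."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.887, 1000)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.867, 0.907)
```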

5.2.2 Constructing a 95% CI

Procedure for CI construction:
1. Confirm conditions for normal approximation of p̂.
2. Use the point estimate p̂ as the center of the interval.
3. Calculate the margin of error: z* × SEp̂, where z* depends on the confidence level (1.96 for 95%). The interval is p̂ ± z* × SEp̂.
Interpretation of CI: If repeated sampling is performed, approximately 95% of these intervals would contain the true parameter.
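The repeated-sampling interpretation can be checked by simulation: draw many samples from a population where p is known, build a 95% CI from each, and count how many capture p. A minimal sketch assuming p = 0.88, n = 1,000, independent draws, and 1,000 repetitions; `coverage_rate` is a hypothetical name.

```python
import math
import random

def coverage_rate(p=0.88, n=1000, reps=1000, z_star=1.96, seed=2):
    """Fraction of simulated CIs (p_hat +/- z* x SE) that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)  # plug-in SE, as in practice
        if p_hat - z_star * se <= p <= p_hat + z_star * se:
            hits += 1
    return hits / reps

rate = coverage_rate()
print(rate)  # roughly 0.95
```

Any single interval either contains p or does not; the 95% describes the long-run behavior of the procedure.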

5.2.3 Changing the Confidence Level

Changing the confidence level changes the width of the CI: higher confidence requires a wider interval. For example, a 99% CI uses z* = 2.58 instead of 1.96.
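A short sketch of how z* and the interval width grow with the confidence level, reusing the assumed solar-energy values p̂ = 0.887 and n = 1,000. The exact 99% value of z* is about 2.576, which rounds to 2.58.

```python
import math
from statistics import NormalDist

p_hat, n = 0.887, 1000  # assumed example values from the solar-energy poll
se = math.sqrt(p_hat * (1 - p_hat) / n)
z_stars, widths = {}, {}
for confidence in (0.90, 0.95, 0.99):
    # z* leaves (1 - confidence)/2 in each tail of the standard normal
    z_stars[confidence] = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    widths[confidence] = 2 * z_stars[confidence] * se  # full width = 2 x margin
    print(f"{confidence:.0%}: z* = {z_stars[confidence]:.3f}, width = {widths[confidence]:.4f}")
```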

5.2.4 More Case Studies

Worked examples apply CIs to real-world survey data to estimate population parameters.

5.2.5 Interpreting Confidence Intervals

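A CI is a statement about a population parameter. It says nothing about individual observations, and it is not a claim about the point estimate, which always sits at the center of its own interval. "95% confident" describes the method: about 95% of intervals built this way would capture the parameter. A CI also reflects only sampling error; it does not account for bias.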

5.3 Hypothesis Testing for a Proportion

5.3.1 Hypothesis Testing Framework

Hypotheses are formulated as:
- Null Hypothesis (H0): Often represents a skeptical position or a claim of "no difference"; e.g., H0: p = 0.333.
- Alternative Hypothesis (HA): Represents the claim under consideration, a departure from H0; e.g., HA: p ≠ 0.333.

5.3.2 Testing Hypotheses Using CIs

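A confidence interval gives an informal way to evaluate a hypothesis: if the null value (e.g., 0.333) lies outside a 95% CI for p, the data provide evidence against H0; if it lies inside, H0 is not rejected.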
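A minimal sketch of using a confidence interval to evaluate a hypothesis: build the normal-approximation CI from the sample and check whether the null value falls inside it. The counts (50 successes out of 228) and the helper name `null_value_in_ci` are hypothetical, chosen only to illustrate the check against H0: p = 0.333.

```python
import math
from statistics import NormalDist

def null_value_in_ci(p_hat, n, p0, confidence=0.95):
    """Return True if the null value p0 lies inside the normal-approximation CI for p."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se <= p0 <= p_hat + z_star * se

# Hypothetical data: 50 successes out of 228 (p_hat about 0.219) vs. H0: p = 0.333
print(null_value_in_ci(50 / 228, 228, 0.333))  # False: evidence against H0
```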

5.3.3 Decision Errors

Types of errors:
- Type 1 Error: Rejecting H0 when it’s true.
- Type 2 Error: Failing to reject H0 when HA is true.

5.3.4 Formal Testing Using P-Values

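The p-value is the probability of observing data at least as favorable to HA as the current data, assuming H0 is true. It quantifies the evidence against H0:
- Smaller p-values indicate stronger evidence against H0; H0 is rejected when the p-value is less than the significance level α.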
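A sketch of a two-sided one-proportion z test that returns the test statistic and p-value. Following the usual approach for tests, the standard error uses the null value p0 rather than p̂. The counts (50 out of 228) and the name `one_prop_z_test` are hypothetical; the success-failure check under H0 passes for these counts (228 × 0.333 ≈ 76 and 228 × 0.667 ≈ 152).

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Two-sided z test for H0: p = p0; the SE is computed from the null value p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|
    return z, p_value

z, p_value = one_prop_z_test(50, 228, 0.333)  # hypothetical counts
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```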

5.3.5 Choosing a Significance Level

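The significance level α (traditionally 0.05) is the probability of making a Type 1 error when H0 is true. Choose a smaller α when a Type 1 error is costly, and a larger α when a Type 2 error is more costly.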
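The link between α and the Type 1 error rate can be checked by simulation: generate data with H0 true and count how often a two-sided z test rejects. A sketch under assumed values (p0 = 0.5, n = 400, α = 0.05, 2,000 repetitions); `type1_error_rate` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist

def type1_error_rate(p0=0.5, n=400, alpha=0.05, reps=2000, seed=3):
    """Simulate samples with H0 true (p = p0) and return the fraction rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = math.sqrt(p0 * (1 - p0) / n)
    rejections = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p0 for _ in range(n)) / n
        if abs(p_hat - p0) / se > z_crit:
            rejections += 1  # a Type 1 error, since H0 is true by construction
    return rejections / reps

rate = type1_error_rate()
print(rate)  # close to alpha = 0.05
```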

5.3.6 Statistical vs. Practical Significance

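A statistically significant result is not necessarily practically important. With a very large sample, even a tiny deviation from the null value can produce a small p-value, although the deviation may have no real-world consequence.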
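A sketch showing how sample size alone can turn a tiny difference into a "significant" one. The values are hypothetical: an observed p̂ = 0.505 tested against p0 = 0.50 at two sample sizes, using the normal approximation; `two_sided_p_value` is an illustrative name.

```python
import math
from statistics import NormalDist

def two_sided_p_value(p_hat, p0, n):
    """Two-sided p-value for H0: p = p0 using the normal approximation."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: the same 0.5-percentage-point difference at two sample sizes
for n in (1_000, 1_000_000):
    print(n, round(two_sided_p_value(0.505, 0.50, n), 4))
```

The difference is identical in both cases; only the sample size changes the p-value.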

5.3.7 One-Sided Hypothesis Tests

A two-sided test (e.g., HA: p ≠ 0.333) looks for deviations from H0 in either direction, while a one-sided test (e.g., HA: p > 0.333) looks in only one direction. The direction must be chosen before looking at the data.
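For the same test statistic, a one-sided p-value counts only one tail, so it is half the two-sided p-value when the observed deviation is in the hypothesized direction. A sketch with a hypothetical z = 1.8:

```python
from statistics import NormalDist

z = 1.8  # hypothetical test statistic
upper_one_sided = 1 - NormalDist().cdf(z)       # HA: p > p0 (upper tail only)
two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # HA: p != p0 (both tails)
print(round(upper_one_sided, 4), round(two_sided, 4))
```

At α = 0.05 this z would lead to rejection in the one-sided test but not in the two-sided test, which is why the direction must be fixed in advance.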

6.1 Inference for a Single Proportion

6.1.1 Identifying when the Sample Proportion is Nearly Normal

The sampling distribution of p̂ is nearly normal when (1) the observations are independent, for example from a simple random sample, and (2) the success-failure condition holds: np ≥ 10 and n(1 − p) ≥ 10.

6.1.2 Confidence Intervals for a Proportion

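The CI method from Section 5.2 is applied in four steps: prepare (identify p̂ and n), check (verify the conditions using p̂), calculate (SE = √[p̂(1 − p̂)/n], then p̂ ± z* × SE), and conclude (interpret the interval in context).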

6.1.3 Hypothesis Testing for a Proportion

Hypothesis tests for p follow the same four steps, but the success-failure check and the standard error use the null value p0 instead of p̂: SE = √[p0(1 − p0)/n], and the test statistic is Z = (p̂ − p0)/SE.

6.1.4 When Conditions Aren’t Met

When the success-failure condition is not met, the normal approximation may be unreliable. Alternative methods, such as simulation-based or exact methods, can be used instead to obtain valid results.
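One possible simulation-based alternative, sketched under stated assumptions: simulate many samples of size n with H0 true, then count how often the simulated p̂ is at least as far from p0 as the observed p̂. The data (3 successes in 20 trials, H0: p = 0.10) and the name `simulated_p_value` are hypothetical; here n × p0 = 2 < 10, so the normal approximation would not be appropriate.

```python
import random

def simulated_p_value(successes, n, p0, reps=10_000, seed=4):
    """Two-sided simulation p-value for H0: p = p0 (a sketch, not an exact method)."""
    rng = random.Random(seed)
    observed_gap = abs(successes / n - p0)
    extreme = 0
    for _ in range(reps):
        sim_p_hat = sum(rng.random() < p0 for _ in range(n)) / n
        # Small tolerance guards against floating-point ties (e.g. 0.15 - 0.1)
        if abs(sim_p_hat - p0) >= observed_gap - 1e-12:
            extreme += 1
    return extreme / reps

p_value = simulated_p_value(3, 20, 0.10)  # hypothetical small sample
print(p_value)  # roughly 0.7: no evidence against H0
```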

6.1.5 Sample Size Considerations

The sample size can be chosen before collecting data so the margin of error stays below a target m: solve z* × √[p(1 − p)/n] ≤ m for n. Without a reasonable estimate of p, using p = 0.5 gives the most conservative (largest) n.
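Solving the margin-of-error inequality for n gives n ≥ (z*)² p(1 − p) / m², rounded up to a whole number. A sketch; `required_sample_size` is a hypothetical helper, and the 4-percentage-point target margin is an assumed example.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* x sqrt(p(1 - p)/n) <= margin; p_guess = 0.5 is conservative."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z_star**2 * p_guess * (1 - p_guess) / margin**2)

print(required_sample_size(0.04))  # 601 for a 4-point margin at 95% confidence
```

Because p(1 − p) is largest at p = 0.5, any other guess for p yields a smaller required n.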

6.2 Difference of Two Proportions

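6.2.1 Sampling Distribution of the Difference of Two Proportions

The difference p̂1 − p̂2 is approximately normal when the data are independent within and between the two groups and the success-failure condition holds separately in each group. Its standard error is SE = √[p1(1 − p1)/n1 + p2(1 − p2)/n2].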

6.2.2 Confidence Intervals for Difference of Two Proportions

A CI for p1 − p2 has the form (p̂1 − p̂2) ± z* × SE, with p̂1 and p̂2 substituted into the standard error formula.
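A sketch of the two-proportion CI with the unpooled standard error. The counts (90 of 150 in group 1, 70 of 150 in group 2) and the name `diff_prop_ci` are hypothetical; both groups pass the success-failure check.

```python
import math
from statistics import NormalDist

def diff_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation CI for p1 - p2 using the unpooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

low, high = diff_prop_ci(90, 150, 70, 150)  # hypothetical counts
print(f"({low:.3f}, {high:.3f})")
```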

6.2.3 Hypothesis Tests for the Difference of Two Proportions

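To test H0: p1 = p2 (equivalently, p1 − p2 = 0), use the pooled proportion p̂pooled = (total successes)/(n1 + n2) in the success-failure check and in the standard error: SE = √[p̂pooled(1 − p̂pooled)(1/n1 + 1/n2)]. The test statistic is Z = (p̂1 − p̂2)/SE.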
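A sketch of the two-proportion z test with the pooled standard error. The counts and the function name `two_prop_z_test` are hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion in the SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # best estimate of the common p under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(90, 150, 70, 150)  # hypothetical counts
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Pooling is used only because H0 says the two proportions are equal; the CI for p1 − p2 uses the unpooled SE.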

6.2.4 More on 2-Proportion Hypothesis Tests

Covers further cases of hypothesis tests that compare two proportions.

6.2.5 Evaluating the Standard Error Formula

Explains where the standard error formulas come from: the variances of independent estimates add, so SE(p̂1 − p̂2) = √(SE1² + SE2²), where SE1 and SE2 are the standard errors of p̂1 and p̂2.
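The "variances add" argument can be checked numerically: compute the formula-based SE of p̂1 − p̂2, then compare it with the standard deviation of many simulated differences. A sketch under assumed values (p1 = 0.6 with n1 = 150, p2 = 0.45 with n2 = 200, independent groups, 3,000 repetitions); all numbers are hypothetical.

```python
import math
import random
import statistics

p1, n1, p2, n2 = 0.6, 150, 0.45, 200  # hypothetical true proportions and sizes
se1 = math.sqrt(p1 * (1 - p1) / n1)
se2 = math.sqrt(p2 * (1 - p2) / n2)
se_formula = math.sqrt(se1**2 + se2**2)  # variances of independent estimates add

# Simulation check: SD of many simulated p1_hat - p2_hat values
rng = random.Random(5)
diffs = []
for _ in range(3000):
    p1_hat = sum(rng.random() < p1 for _ in range(n1)) / n1
    p2_hat = sum(rng.random() < p2 for _ in range(n2)) / n2
    diffs.append(p1_hat - p2_hat)

print(round(se_formula, 4), round(statistics.stdev(diffs), 4))
```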