Sampling Distributions

Course Overview

  • Course Title: Biostatistics 521: Applied Biostatistics

  • Instructor: Mousumi Banerjee

Introduction to Statistical Inference

  • Objective: Introduce the concept of statistical inference and sampling distributions.

  • Statistics Computed:

    • Example: Sample mean X̄ = estimate of population mean μ

    • Importance of understanding estimates and their variability.

  • Key Questions: How far off is the sample mean from the true population mean? What factors influence this?

    • Factors influencing estimate variability include:

    • Sample size (n)

    • Population distribution.

  • Focus of Today: Quantify variability in an estimator using the standard error.

Understanding Population Parameters

  • Key Measures: Population Mean (μ) and Population Standard Deviation (σ).

    • Goals: To determine specific population parameters such as:

    • Height of all females in the US.

    • Systolic Blood Pressure (SBP) of African American males over 60 years old in the US.

    • Income of UM MPH graduates over the last 5 years.

Sample Estimates

  • Sample Estimates:

    • Sample Mean (X̄)

    • Sample Standard Deviation (S)

  • Estimation Processes:

    • Taking a sample from a population allows for estimation of population parameters.

    • Point estimates (single valued estimates of parameters) can vary between different samples.

Variability in Sampling

  • Sample Estimates Variability:

    • Because of sampling variability, different samples drawn from the same population yield different point estimates.

    • Illustrative Example:

      • If multiple samples are taken, the point estimates of the mean and standard deviation will vary from sample to sample.

      • Some point estimates may differ substantially from the true values purely by chance.
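
The variation described above can be seen in a small simulation. The sketch below assumes a hypothetical normal population (mean 120, SD 15, values chosen purely for illustration and not taken from the lecture) and draws five samples of size 25 using only the Python standard library. The helper name `draw_sample` is likewise an illustrative choice.

```python
import random
import statistics

# Hypothetical population parameters, assumed only for illustration.
POP_MEAN = 120.0
POP_SD = 15.0

def draw_sample(rng, n):
    """Draw n independent observations from the assumed normal population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]

rng = random.Random(521)  # fixed seed so the run is reproducible

# Five samples of the same size from the same population.
estimates = []
for _ in range(5):
    sample = draw_sample(rng, 25)
    estimates.append((statistics.mean(sample), statistics.stdev(sample)))

for i, (xbar, s) in enumerate(estimates, start=1):
    print(f"sample {i}: mean = {xbar:.1f}, SD = {s:.1f}")
```

Each printed line is one pair of point estimates. None of them equals the assumed population values exactly, and they differ from one another even though the population never changed.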

Sampling Distribution Characteristics

  • Concept of Sampling Distribution:

    • The distribution of sample means is approximately bell-shaped and centered at the true population mean (μ = 7777, σ = 2222).

  • Implication of Sampling:

    • When you draw a single sample, you obtain just one estimate, which is a single draw from this sampling distribution.

  • Statistical Inference Essence:

    • A point estimate is one realization of a random variable whose distribution is the sampling distribution.

    • Sampling distribution influenced by:

    • Original population distribution.

    • Sample size.

    • Estimator used (e.g., mean X̄).
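
To illustrate how the sampling distribution depends on the estimator, the following sketch approximates the sampling distributions of two estimators, the sample mean and the sample median, by repeated sampling. The normal population (mean 120, SD 15), the sample size of 25, and the 2,000 repetitions are all assumptions made for illustration, not values from the lecture.

```python
import random
import statistics

# Hypothetical normal population, assumed only for illustration.
POP_MEAN, POP_SD = 120.0, 15.0
N = 25          # size of each sample
REPS = 2000     # number of repeated samples

rng = random.Random(7)  # fixed seed for reproducibility

sample_means = []
sample_medians = []
for _ in range(REPS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.mean(sample))
    sample_medians.append(statistics.median(sample))

# Each list approximates the sampling distribution of one estimator.
center_mean = statistics.mean(sample_means)
center_median = statistics.mean(sample_medians)
spread_mean = statistics.stdev(sample_means)
spread_median = statistics.stdev(sample_medians)

print(f"sample mean:   center = {center_mean:.2f}, spread = {spread_mean:.2f}")
print(f"sample median: center = {center_median:.2f}, spread = {spread_median:.2f}")
```

Under this assumed normal population, both estimators are centered near the population mean, but the median varies more from sample to sample. In practice only one sample is drawn, so the observed estimate is a single value from one of these distributions.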

Effects of Sample Size on Variability

  • Sampling Distribution for Means: Examine how sample means behave for different sample sizes n (n = 25, n = 50, etc.).

  • Effect of Larger Samples:

    • As the sample size (n) increases, the variance of the sampling distribution of X̄ decreases, leading to more precise estimates of the population mean.
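
The effect of sample size can be checked empirically. This sketch again assumes a hypothetical normal population (mean 120, SD 15) and compares the spread of sample means for n = 25 and n = 50, plus n = 100, which is added here for illustration. The function name `sd_of_sample_means` is an illustrative choice.

```python
import random
import statistics

POP_MEAN, POP_SD = 120.0, 15.0   # hypothetical population, assumed for illustration
REPS = 2000                      # repeated samples per sample size

def sd_of_sample_means(rng, n, reps=REPS):
    """Empirical SD of the sample mean across `reps` samples of size n."""
    means = []
    for _ in range(reps):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

rng = random.Random(2024)  # fixed seed for reproducibility
spread_by_n = {n: sd_of_sample_means(rng, n) for n in (25, 50, 100)}

for n, spread in spread_by_n.items():
    print(f"n = {n:3d}: SD of sample means = {spread:.2f}")
```

The printed spread shrinks as n grows, which is the narrowing of the sampling distribution described above.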

Variability Relative to Population

  • Influence of Population Distribution:

    • A population with smaller variability (lower standard deviation) will yield a smaller standard deviation in the sampling distribution of X̄.

Point Estimates and Standard Errors

  • Definitions:

    • Population Parameter: A value we wish to estimate (e.g., mean μ).

    • Point Estimate: Statistic calculated from the data providing the best guess of the population parameter (e.g., X̄).

    • Sampling Distribution: Distribution of point estimates based on repeated sampling.

    • Standard Error (SE): The standard deviation of the sampling distribution, quantifying how much the point estimate varies from sample to sample, and hence how far it typically falls from the population parameter when the estimator is unbiased.

Importance of Standard Error

  • Critical Role of SE:

    • Understanding SE is vital for the interpretation of the precision of a point estimate.

    • Case Study: Evaluating the sex ratio of newborns with a small sample size.

  • Lower SE indicates less variability in point estimates, providing higher confidence in the estimate's closeness to the true population mean.

Computing Standard Error of Sample Mean

  • Random Sampling Models:

    • Consider independent observations X₁, X₂, …, Xₙ drawn from a population with known mean μ and standard deviation σ.

    • Sample Mean Formula: X̄ = (1/n) ∑ Xᵢ

    • Variance of Sample Mean:
      V(X̄) = V((1/n) ∑ Xᵢ) = (1/n²)(nσ²) = σ²/n

    • Standard Error Formula:
      SE(X̄) = σ / √n
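
The variance calculation relies on two standard facts: V(aY) = a²V(Y) for any constant a, and the variance of a sum of independent random variables equals the sum of their variances. With each Xᵢ having variance σ², the steps can be written out in LaTeX notation as:

```latex
\begin{aligned}
V(\bar{X}) &= V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\, V\!\left(\sum_{i=1}^{n} X_i\right) \\
           &= \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            = \frac{1}{n^2}\,(n\sigma^2)
            = \frac{\sigma^2}{n}, \\
SE(\bar{X}) &= \sqrt{V(\bar{X})} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

Independence is what allows the variance of the sum to be split into n separate terms in the second line.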

Properties of Standard Error

  • Standard Error Must Be Estimated:

    • Typically, we estimate σ with the sample standard deviation (s). Thus, the standard error becomes:
      SE (est.) = s / √n

  • Effects on SE:

    • As population standard deviation (σ) decreases, SE decreases.

    • As sample size (n) increases, SE decreases.
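
A minimal sketch of these properties, assuming the formulas SE = σ/√n and its estimate s/√n from this section. The function names `standard_error` and `estimated_standard_error` and all numeric inputs are illustrative and do not come from the lecture.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)

def estimated_standard_error(sample):
    """Estimate the SE from raw data, using the sample SD s in place of sigma."""
    return standard_error(statistics.stdev(sample), len(sample))

# Illustrative values only (not from the lecture).
base = standard_error(sd=10.0, n=100)        # reference case
smaller_sd = standard_error(sd=5.0, n=100)   # halving the SD halves the SE
larger_n = standard_error(sd=10.0, n=400)    # quadrupling n halves the SE

print(base, smaller_sd, larger_n)

toy_sample = [4.0, 6.0, 8.0, 10.0]  # made-up numbers to exercise the function
toy_se = estimated_standard_error(toy_sample)
print(round(toy_se, 3))
```

Because n enters through a square root, cutting the SE in half requires four times as many observations, whereas halving the population SD halves the SE directly.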

Thought Experiments to Illustrate SE

  • Contextual Scenarios:

    • Explore how changes in sample size and variability affect standard error.

    • Example scenarios visualize computations and illustrate how SE varies.

Practical Example: BRFSS Data Analysis

  • Overview of Data:

    • Mean, SD, and SE of several health-related variables from a sample of n = 20,000 respondents:

    • Age: Mean = 45.1, SD = 17.2, SE ≈ 0.122.

    • Height: Mean = 67.2, SD = 4.1, SE ≈ 0.029.

    • Weight: Mean = 169.7, SD = 40, SE ≈ 0.283.
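
The reported standard errors can be reproduced from the reported SDs with SE = s/√n, assuming that the figure of 20,000 is the number of observations n in the sample. The dictionary layout below is just an illustrative way to hold the summary values.

```python
import math

N = 20_000  # assumed to be the number of observations in the sample

# (mean, SD) pairs as reported above.
summary = {
    "Age": (45.1, 17.2),
    "Height": (67.2, 4.1),
    "Weight": (169.7, 40.0),
}

# SE of each sample mean: SD divided by the square root of n.
standard_errors = {name: sd / math.sqrt(N) for name, (_, sd) in summary.items()}

for name, se in standard_errors.items():
    print(f"{name}: SE = {se:.3f}")
```

With a sample this large, every SE is a small fraction of the corresponding SD, so each sample mean is a precise estimate even though individual values vary widely.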

Air Pollution Case Study in Ann Arbor

  • Specific Measurement of PM2.5:

    • Sample size n = 25, Mean = 28.4, Standard Deviation = 4.72, Standard Error = 4.72/√25 = 0.944.

  • Contrast Between Standard Deviation and Standard Error:

    • Standard deviation describes the variability of individual measurements, while standard error describes the precision of the sample mean as an estimate of the population mean.
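
The contrast can be made concrete with the PM2.5 numbers. The sketch below recomputes the SE for the observed n = 25 and then, purely as a hypothetical, shows what the SE would be for larger samples if the SD stayed at 4.72. The larger sample sizes are assumptions for illustration, not part of the case study.

```python
import math

sd = 4.72        # sample standard deviation reported above
n_observed = 25  # sample size reported above

se_observed = sd / math.sqrt(n_observed)
print(f"n = {n_observed}: SD = {sd}, SE = {se_observed:.3f}")

# Hypothetical larger samples, assuming (for illustration only) the same SD.
se_by_n = {n: sd / math.sqrt(n) for n in (100, 400)}
for n, se in se_by_n.items():
    print(f"n = {n}: SD = {sd}, SE = {se:.3f}")
```

The SD is a property of the measurements and does not shrink with more data, while the SE of the mean keeps falling as n grows.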

Summary of Key Ideas

  • Role of Point Estimators:

    • Point estimates vary from sample to sample and are distributed around the population parameter they estimate.

    • Standard error quantifies uncertainty and lies at the core of statistical inference.

    • Upcoming sessions will build upon estimating μ using X̄ and its standard error.