Sampling Distributions
Course Overview
Course Title: Biostatistics 521: Applied Biostatistics
Instructor: Mousumi Banerjee
Introduction to Statistical Inference
Objective: Introduce the concept of statistical inference and sampling distributions.
Statistics Computed:
Example: Sample mean X̄ = estimate of population mean μ
Importance of understanding estimates and their variability.
Key Questions: How far off is the sample mean from the true population mean? What factors influence this?
Factors influencing estimate variability include:
Sample size (n)
Population distribution.
Focus of Today: Quantify variability in an estimator using the standard error.
Understanding Population Parameters
Key Measures: Population Mean (μ) and Population Standard Deviation (σ).
Goals: To determine specific population parameters such as:
Height of all females in the US.
Systolic Blood Pressure (SBP) of African American males over 60 years old in the US.
Income of UM MPH graduates over the last 5 years.
Sample Estimates
Definitions: Sample Estimates:
Sample Mean (X̄)
Sample Standard Deviation (S)
Estimation Processes:
Taking a sample from a population allows for estimation of population parameters.
Point estimates (single valued estimates of parameters) can vary between different samples.
Variability in Sampling
Sample Estimates Variability:
Sampling variability means that different samples drawn from the same population yield different point estimates.
Illustrative Example:
If multiple samples are taken, point estimates of means and standard deviations will vary.
Some point estimates may differ substantially by chance alone.
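The variation described above can be seen in a small simulation. This is only a sketch: the population (normal, mean 120, SD 15), the sample size of 25, and the seed are illustrative assumptions, not values from the course.

```python
import random
import statistics

# Hypothetical population: normal with mean 120 and SD 15 (illustrative values only).
rng = random.Random(521)  # fixed seed so the run is reproducible
POP_MEAN, POP_SD, N = 120.0, 15.0, 25

def draw_sample(n):
    """Draw n independent observations from the assumed population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]

# Take five separate samples and compute the point estimates for each.
estimates = []
for k in range(5):
    sample = draw_sample(N)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    estimates.append((xbar, s))
    print(f"sample {k + 1}: mean = {xbar:.2f}, SD = {s:.2f}")
```

Each of the five samples comes from the same population, yet every one reports a different mean and SD.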
Sampling Distribution Characteristics
Concept of Sampling Distribution:
The distribution of sample means is approximately bell-shaped and centered on the true population mean (μ = 7777, σ = 2222).
Implication of Sampling:
When drawing a single sample, you receive one estimate from this larger distribution.
Statistical Inference Essence:
A point estimate is a random variable: one draw from the sampling distribution of the estimator.
Sampling distribution influenced by:
Original population distribution.
Sample size.
Estimator used (e.g., mean X̄).
Effects of Sample Size on Variability
Sampling Distribution for Means: Examination of the behavior of sample means computed from samples of different sizes n (n = 25, n = 50, etc.).
Effect of Larger Samples:
As the sample size (n) increases, the variance of the sampling distribution of X̄ decreases, leading to more precise estimates of the population mean.
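A rough way to see this effect is to simulate many samples at each sample size and measure the spread of the resulting sample means. The population values (mean 50, SD 10), the number of repetitions, and the helper name `sd_of_sample_means` are assumptions made for this sketch.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed for reproducibility
POP_MEAN, POP_SD = 50.0, 10.0   # assumed population values, for illustration
REPS = 2000                     # number of repeated samples per sample size

def sd_of_sample_means(n, reps=REPS):
    """Empirical SD of the sample mean over `reps` samples of size n."""
    means = [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

spread = {n: sd_of_sample_means(n) for n in (25, 50, 100)}
for n, sd in spread.items():
    print(f"n = {n:3d}: SD of sample means = {sd:.3f}")
```

Under these assumptions the spread of the sample means should shrink as n grows, from roughly 2 at n = 25 to roughly 1 at n = 100.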
Variability Relative to Population
Influence of Population Distribution:
A population with smaller variability (lower standard deviation) will yield a smaller standard deviation in the sampling distribution of X̄.
Point Estimates and Standard Errors
Definitions:
Population Parameter: A value we wish to estimate (e.g., mean μ).
Point Estimate: Statistic calculated from the data providing the best guess of the population parameter (e.g., X̄).
Sampling Distribution: Distribution of point estimates based on repeated sampling.
Standard Error (SE): The standard deviation of the sampling distribution, quantifying how much the point estimate typically varies around the population parameter from sample to sample.
Importance of Standard Error
Critical Role of SE:
Understanding SE is vital for the interpretation of the precision of a point estimate.
Case Study: Evaluating the sex ratio of newborns with a small sample size.
Lower SE indicates less variability in point estimates, providing higher confidence in the estimate's closeness to the true population mean.
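The small-sample sex ratio case can be illustrated with a simulation. The sketch below assumes a hypothetical true proportion of male births of 0.5 and compares the spread of the estimated proportion across sample sizes; the proportion, the sample sizes, and the seed are illustrative choices, not figures from the case study.

```python
import random
import statistics

rng = random.Random(7)   # fixed seed for reproducibility
P_BOY = 0.5              # hypothetical true proportion of male births (assumption)
REPS = 1000              # number of simulated studies per sample size

def simulated_proportions(n, reps=REPS):
    """Proportion of boys in each of `reps` simulated samples of n births."""
    return [
        sum(rng.random() < P_BOY for _ in range(n)) / n
        for _ in range(reps)
    ]

se_by_n = {}
for n in (10, 100, 1000):
    props = simulated_proportions(n)
    se_by_n[n] = statistics.stdev(props)  # empirical standard error
    print(f"n = {n:4d}: estimates range {min(props):.2f} to {max(props):.2f}, "
          f"empirical SE = {se_by_n[n]:.3f}")
```

With only 10 births per study the estimated proportion swings widely, which is why a single small sample says little about the true ratio.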
Computing Standard Error of Sample Mean
Random Sampling Models:
Consider independent observations X₁, X₂, …, Xₙ drawn from a population with mean μ and standard deviation σ.
Sample Mean Formula: X̄ = (1/n) ∑ Xᵢ
Variance of Sample Mean: Var(X̄) = σ²/n
Standard Error Formula: SE(X̄) = σ/√n
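For reference, the variance and standard error of the sample mean can be derived in a few steps. The derivation assumes the observations are independent and share the same variance σ², as in the random sampling model above.

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\sigma^{2}}{n^{2}}
   = \frac{\sigma^{2}}{n}, \\
\operatorname{SE}(\bar{X})
  &= \sqrt{\operatorname{Var}(\bar{X})}
   = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

The second equality in the first line is where independence is used: the variance of a sum equals the sum of the variances only when the covariance terms are zero.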
Properties of Standard Error
Standard Error Must Be Estimated:
Typically, σ is unknown and we estimate it with the sample standard deviation (s). Thus, the standard error becomes: SE(X̄) ≈ s/√n
Effects on SE:
As population standard deviation (σ) decreases, SE decreases.
As sample size (n) increases, SE decreases.
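Both effects can be checked numerically with a small helper. The function name `standard_error` and the input values are illustrative assumptions; the formula is the usual SD divided by √n.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)

# Effect of sample size with the SD held fixed (illustrative SD of 10).
for n in (25, 100, 400):
    print(f"sd = 10, n = {n:3d}: SE = {standard_error(10, n):.2f}")

# Effect of the standard deviation with n fixed at 100.
for sd in (20, 10, 5):
    print(f"sd = {sd:2d}, n = 100: SE = {standard_error(sd, 100):.2f}")
```

Because n enters through a square root, quadrupling the sample size only halves the SE, whereas halving the SD halves the SE directly.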
Thought Experiments to Illustrate SE
Contextual Scenarios:
Explore how changes in sample size and variability affect standard error.
Example scenarios visualize computations and illustrate how SE varies.
Practical Example: BRFSS Data Analysis
Overview of Data:
Mean, SD, and SE of various health-related factors from a sample of n = 20,000:
Age: Mean = 45.1, SD = 17.2, SE ≈ 0.122.
Height: Mean = 67.2, SD = 4.1, SE ≈ 0.029.
Weight: Mean = 169.7, SD = 40, SE ≈ 0.283.
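The listed standard errors can be reproduced from the SDs. The sketch below treats 20,000 as the sample size n, which is what the listed values imply; the dictionary layout is just a convenient choice for this example.

```python
import math

N = 20_000  # treated here as the sample size n

# (mean, SD) pairs as listed above
summary = {
    "Age": (45.1, 17.2),
    "Height": (67.2, 4.1),
    "Weight": (169.7, 40.0),
}

# SE of each sample mean: SD / sqrt(n)
se = {name: sd / math.sqrt(N) for name, (_, sd) in summary.items()}
for name, value in se.items():
    print(f"{name}: SE = {value:.3f}")
```

Rounded to three decimals, the results are 0.122, 0.029, and 0.283, matching the values above.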
Air Pollution Case Study in Ann Arbor
Specific Measurement of PM2.5:
Sample size = 25, Mean = 28.4, Standard Deviation = 4.72, Calculated Standard Error = 0.944.
Contrast Between Standard Deviation and Standard Error:
Standard deviation describes the variability of individual measurements, while standard error describes the precision of the sample mean as an estimate of the population mean.
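The standard error reported for the PM2.5 sample follows directly from the sample standard deviation and the sample size, using s in place of σ:

```latex
\[
\operatorname{SE}(\bar{X}) \approx \frac{s}{\sqrt{n}}
  = \frac{4.72}{\sqrt{25}}
  = \frac{4.72}{5}
  = 0.944
\]
```

The SD of 4.72 describes how much individual PM2.5 readings vary, while the SE of 0.944 describes how much the sample mean of 28.4 would be expected to vary across repeated samples of 25 readings.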
Summary of Key Ideas
Role of Point Estimators:
Point estimates vary from sample to sample and are distributed around the population parameter.
Standard error quantifies uncertainty and lies at the core of statistical inference.
Forthcoming sessions will build on estimating μ using X̄ and its standard error.