Distribution of Sample Mean & Central Limit Theorem
Distribution of the Sample Mean
- Why sample means vary
- Each sample is only a subset → different observed values → different sample means \bar{x}.
- Leads to three guiding questions:
- What is the central/typical value of \bar{x}?
- What is the variability (spread) of \bar{x}?
- Does this variability follow a recognizable distributional pattern?
- Symmetry expectation
- For large n the distribution of \bar{x} is roughly symmetric → \bar{x} falls below \mu about 50% of the time and above it about 50% of the time.
- For small n symmetry may fail, but most \bar{x} values still cluster near \mu.
- Theoretical guarantee: \mu_{\bar{x}} = \mu (mean of all sample means equals the population mean).
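The guarantee \mu_{\bar{x}} = \mu can be checked by simulation. A minimal sketch, using a hypothetical population of the integers 0–99 (so \mu = 49.5) and repeated random samples:

```python
import random

random.seed(1)

# Hypothetical population: integers 0..99, so mu = 49.5.
population = list(range(100))
mu = sum(population) / len(population)

# Draw many samples and record each sample mean x-bar.
n, reps = 10, 20_000
sample_means = []
for _ in range(reps):
    sample = random.sample(population, n)  # simple random sample, no replacement
    sample_means.append(sum(sample) / n)

# The mean of all the sample means should sit very close to mu.
mean_of_means = sum(sample_means) / reps
print(mu, round(mean_of_means, 2))
```

Individual \bar{x} values wander, but their long-run average lands on \mu.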
Unbiasedness & the Bull’s-Eye Metaphor
- Unbiased estimator: Expected value of estimator equals the parameter.
- E[\bar{x}] = \mu → \bar{x} is unbiased for \mu.
- Other unbiased estimators mentioned:
- Sample proportion \hat{p} for population proportion p.
- Sample variance s^2 for population variance \sigma^2.
- Bull’s-eye visual
- Target center = true parameter.
- Four scenarios:
- Wide scatter but centered → unbiased, high variance.
- Tight cluster, centered → unbiased, low variance (ideal).
- Wide scatter, shifted center → biased, high variance.
- Tight cluster, shifted center → biased, low variance.
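The claim that s^2 (with its n − 1 divisor) is unbiased for \sigma^2 can also be checked by simulation. The Gaussian population below is a hypothetical stand-in; the comparison estimator that divides by n is included to show the bias it introduces:

```python
import random

random.seed(2)

# Hypothetical population of 1000 draws from N(0, 2).
population = [random.gauss(0, 2) for _ in range(1000)]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # population variance

n, reps = 5, 20_000
biased_avg, unbiased_avg = 0.0, 0.0
for _ in range(reps):
    s = random.sample(population, n)
    m = sum(s) / n
    ss = sum((x - m) ** 2 for x in s)
    biased_avg += ss / n            # divides by n -> systematically underestimates
    unbiased_avg += ss / (n - 1)    # s^2: divides by n - 1 -> unbiased
biased_avg /= reps
unbiased_avg /= reps
print(round(sigma2, 3), round(biased_avg, 3), round(unbiased_avg, 3))
```

The n − 1 version averages out near \sigma^2, while dividing by n lands noticeably low.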
Variability of the Sample Mean (Standard Error)
- Infinite population (or sampling with replacement):
- \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}
- Finite population (sampling without replacement):
- \sigma_{\bar{x}} = \sqrt{\dfrac{N - n}{N - 1}} \times \dfrac{\sigma}{\sqrt{n}}
- N = population size, n = sample size.
- Key implication: increasing n decreases \sigma_{\bar{x}}, so estimates become more precise.
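The two formulas above can be wrapped in a small helper. This is an illustrative sketch (the function name is not from the notes); the finite population correction is applied only when N is supplied:

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean.

    Returns sigma / sqrt(n); if a finite population size N is given,
    multiplies by the finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(2500, 100))          # infinite population: 250.0
print(standard_error(2500, 100, N=1000))  # FPC shrinks the standard error
```

Note that the correction factor is below 1 whenever n > 1, so sampling without replacement from a finite population always yields a slightly smaller standard error.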
Numerical Visualization Example
- Population parameters: \mu = 43\,660, \sigma = 2\,500.
- Three normal curves for \bar{x}:
- n = 25 → widest, flattest curve (largest standard error).
- n = 100 → intermediate width/height.
- n = 200 → tallest, narrowest curve (smallest standard error).
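The widths of the three curves follow directly from \sigma_{\bar{x}} = \sigma / \sqrt{n} with the \sigma = 2500 from the example:

```python
import math

sigma = 2500  # population standard deviation from the example
for n in (25, 100, 200):
    se = sigma / math.sqrt(n)
    print(n, round(se, 2))  # 500.0, 250.0, 176.78 -> SE shrinks as n grows
```

Quadrupling n from 25 to 100 halves the standard error, which is why the n = 100 curve is markedly taller and narrower.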
Desirable Properties of \bar{x}
- Unbiasedness: \mu_{\bar{x}} = \mu.
- Consistency: larger n ⇒ smaller \sigma_{\bar{x}}, so \bar{x} concentrates ever more tightly around \mu.
Central Limit Theorem (CLT)
- Often called the fundamental theorem of statistics.
- Statement (for sufficiently large n; a common rule of thumb is n \ge 30):
- Distribution of \bar{x} is approximately normal, regardless of the population’s shape.
- Center: \mu_{\bar{x}} = \mu.
- Spread: \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}.
- Special case: If the population itself is normal, all three properties hold for any n (even n < 30).
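The CLT's three claims can be checked against a strongly right-skewed population. A sketch using a hypothetical exponential population with \mu = 1, \sigma = 1: the CLT predicts \bar{x} centers near 1, spreads about 1/\sqrt{30}, and falls within 1.96 standard errors of \mu roughly 95% of the time.

```python
import math
import random

random.seed(3)

# Hypothetical population: exponential with rate 1, so mu = sigma = 1.
def sample_mean(n):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

n, reps = 30, 20_000
means = [sample_mean(n) for _ in range(reps)]

# CLT predictions for x-bar: center mu = 1, spread sigma / sqrt(n).
se = 1 / math.sqrt(n)
coverage = sum(1 for m in means if abs(m - 1) <= 1.96 * se) / reps
print(round(coverage, 3))  # close to 0.95 despite the skewed population
```

Even though individual observations are heavily skewed, averaging 30 of them already produces a nearly normal sampling distribution.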
Visual Demonstrations of the CLT
- Four population shapes examined:
- Bimodal (two humps).
- Uniform (all values equally likely).
- Exponential (right-skewed, higher likelihood near zero).
- Normal.
- In every scenario, the sampling distribution of \bar{x} becomes approximately normal as n grows (by about n = 30).
- For the inherently normal population, normality of \bar{x} holds even at small n.
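The four-shape demonstration can be approximated numerically by measuring skewness: whatever the population looks like, the skewness of the sample means shrinks toward the symmetric, normal value of 0. The four generators below are hypothetical stand-ins for the shapes in the notes:

```python
import math
import random

random.seed(4)

def skewness(xs):
    """Standardized third moment; 0 for a symmetric distribution."""
    m = sum(xs) / len(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# Hypothetical draw functions for each population shape.
shapes = {
    "bimodal": lambda: random.gauss(-2, 0.5) if random.random() < 0.5 else random.gauss(2, 0.5),
    "uniform": lambda: random.uniform(0, 1),
    "exponential": lambda: random.expovariate(1.0),
    "normal": lambda: random.gauss(0, 1),
}

n, reps = 30, 5_000
results = {}
for name, draw in shapes.items():
    raw = [draw() for _ in range(reps)]
    means = [sum(draw() for _ in range(n)) / n for _ in range(reps)]
    results[name] = (skewness(raw), skewness(means))
    print(name, round(results[name][0], 2), round(results[name][1], 2))
```

The exponential population starts out with large skewness, yet the skewness of its sample means is far smaller; the symmetric shapes stay near 0 throughout.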
Error of Estimation (Sampling Error)
- Defined as the difference between a sample statistic and the corresponding population parameter.
- For means: \text{Error} = \bar{x} - \mu.
- Also called sampling error or estimation error.
- Arises purely because we observe a sample rather than the full population.
- Can be reduced by:
- Increasing n (reduces standard error).
- Using better sampling designs to avoid bias.
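The effect of increasing n on the typical estimation error can be seen directly. A sketch, using a hypothetical finite population and averaging |\bar{x} - \mu| over many samples:

```python
import random

random.seed(5)

# Hypothetical finite population of 10,000 values.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = sum(population) / len(population)

# Average absolute estimation error |x-bar - mu| for increasing n.
errors = {}
reps = 2_000
for n in (10, 100, 1000):
    total = 0.0
    for _ in range(reps):
        s = random.sample(population, n)
        total += abs(sum(s) / n - mu)
    errors[n] = total / reps
    print(n, round(errors[n], 3))  # typical error shrinks as n grows
```

The typical error falls roughly like 1/\sqrt{n}, mirroring the standard-error formula; note that a larger sample reduces sampling error but does nothing to remove bias from a poor sampling design.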
These notes cover the distribution, properties, and practical implications of the sample mean, capped by the Central Limit Theorem’s assurance that large-sample means behave normally and predictably.