Distribution of Sample Mean & Central Limit Theorem

Distribution of the Sample Mean

  • Why sample means vary
    • Each sample is only a subset → different observed values → different sample means $\bar{x}$.
    • Leads to three guiding questions:
    1. What is the central/typical value of $\bar{x}$?
    2. What is the variability (spread) of $\bar{x}$?
    3. Does this variability follow a recognizable distributional pattern?
  • Symmetry expectation
    • For large $n$ the distribution of $\bar{x}$ is roughly symmetric → $\bar{x}$ falls below and above $\mu$ about 50% of the time.
    • For small $n$ symmetry may fail, but most $\bar{x}$ values still cluster near $\mu$.
  • Theoretical guarantee: $\mu_{\bar{x}} = \mu$ (the mean of all sample means equals the population mean).
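The guarantee $\mu_{\bar{x}} = \mu$ can be checked by simulation. A minimal sketch, using a hypothetical right-skewed population (an illustrative assumption, not data from the notes):

```python
import random
import statistics

# Illustrative population: 10,000 values from a right-skewed
# exponential distribution with mean 50 (an assumed example).
random.seed(42)
population = [random.expovariate(1 / 50) for _ in range(10_000)]
mu = statistics.mean(population)

# Draw many samples of size n and record each sample mean x-bar.
n = 30
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(5_000)
]

# The average of the sample means sits very close to mu.
print(f"population mean mu   = {mu:.2f}")
print(f"mean of sample means = {statistics.mean(sample_means):.2f}")
```

Individual sample means scatter widely, but their average homes in on $\mu$, which is exactly what $\mu_{\bar{x}} = \mu$ asserts.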

Unbiasedness & the Bull’s-Eye Metaphor

  • Unbiased estimator: Expected value of estimator equals the parameter.
    • $E[\bar{x}] = \mu$, so $\bar{x}$ is unbiased for $\mu$.
    • Other unbiased estimators mentioned:
    • Sample proportion $\hat{p}$ for population proportion $p$.
    • Sample variance $s^2$ for population variance $\sigma^2$.
  • Bull’s-eye visual
    • Target center = true parameter.
    • Four scenarios:
    1. Wide scatter but centered → unbiased, high variance.
    2. Tight cluster, centered → unbiased, low variance (ideal).
    3. Wide scatter, shifted center → biased, high variance.
    4. Tight cluster, shifted center → biased, low variance.
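The bias idea can be made concrete by comparing two variance estimators over repeated samples: dividing the sum of squared deviations by $n$ is biased low, while dividing by $n-1$ (the usual $s^2$) is unbiased. A sketch, with an illustrative synthetic population:

```python
import random
import statistics

# Illustrative population (assumed): 20,000 roughly normal values.
random.seed(7)
population = [random.gauss(100, 15) for _ in range(20_000)]
sigma2 = statistics.pvariance(population)  # true population variance

n = 10
biased, unbiased = [], []
for _ in range(10_000):
    sample = random.sample(population, n)
    m = statistics.mean(sample)
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / n)          # divides by n   -> systematically low
    unbiased.append(ss / (n - 1))  # divides by n-1 -> unbiased s^2

print(f"true sigma^2          = {sigma2:.1f}")
print(f"avg of s^2 (n-1 rule) = {statistics.mean(unbiased):.1f}")
print(f"avg of n-divisor form = {statistics.mean(biased):.1f}")
```

In bull's-eye terms: both estimators scatter, but the $n$-divisor version's cluster is shifted away from the target center, while the $s^2$ cluster stays centered on it.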

Variability of the Sample Mean (Standard Error)

  • Infinite population (or sampling with replacement):
    • $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
  • Finite population (sampling without replacement):
    • $\sigma_{\bar{x}} = \sqrt{\dfrac{N - n}{N - 1}} \times \dfrac{\sigma}{\sqrt{n}}$
    • $N$ = population size, $n$ = sample size.
  • Key implication: Increasing $n$ shrinks $\sigma_{\bar{x}}$ → estimates become more precise.
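Both formulas fit in one small helper. A sketch (the function name and signature are my own choices, not from the notes):

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean.

    Uses sigma / sqrt(n); when a finite population size N is given
    (sampling without replacement), multiplies by the finite
    population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Infinite-population case vs. a small finite population:
print(standard_error(2500, 100))          # -> 250.0
print(standard_error(2500, 100, N=1000))  # smaller, since the FPC < 1
```

Note that when $n$ is tiny relative to $N$, the correction factor is nearly 1 and the two formulas agree, which is why the simple $\sigma/\sqrt{n}$ form is the common default.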

Numerical Visualization Example

  • Population parameters: $\mu = 43\,660$, $\sigma = 2\,500$.
  • Three normal curves for $\bar{x}$:
    • $n = 25$ → widest, flattest curve (largest standard error).
    • $n = 100$ → intermediate width/height.
    • $n = 200$ → tallest, narrowest curve (smallest standard error).
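The three spreads follow directly from $\sigma_{\bar{x}} = \sigma/\sqrt{n}$; $\mu = 43\,660$ only centers each curve. A quick check of the arithmetic:

```python
import math

# Standard error of x-bar for each curve in the example above.
sigma = 2500
se = {n: sigma / math.sqrt(n) for n in (25, 100, 200)}
for n, value in se.items():
    print(f"n = {n:3d}: sigma_xbar = {value:.1f}")
```

Quadrupling the sample size from 25 to 100 halves the standard error (500 → 250), matching the widest-to-intermediate step between the curves.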

Desirable Properties of $\bar{x}$

  1. Unbiasedness: $\mu_{\bar{x}} = \mu$.
  2. Efficiency with $n$: Larger $n$ ⇒ smaller $\sigma_{\bar{x}}$.

Central Limit Theorem (CLT)

  • Often called the fundamental theorem of statistics.
  • Statement (for "sufficiently large" samples, conventionally $n \ge 30$):
    1. Distribution of xˉ\bar{x} is approximately normal, regardless of the population’s shape.
    2. Center: $\mu_{\bar{x}} = \mu$.
    3. Spread: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
  • Special case: If the population itself is normal, all three properties hold for any $n$ (even $n < 30$).
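The three claims can be verified empirically. A minimal sketch using a strongly right-skewed exponential population (mean $\mu = 1$, sd $\sigma = 1$, parameters chosen for illustration):

```python
import random
import statistics

# Draw samples of size n = 30 from an exponential population and
# check that x-bar behaves like a normal variable with center mu
# and spread sigma / sqrt(n).
random.seed(1)
mu, sigma, n = 1.0, 1.0, 30
se = sigma / n ** 0.5

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(10_000)
]

# Under approximate normality, ~95% of sample means fall in mu +/- 2 se.
within_2se = sum(abs(m - mu) < 2 * se for m in sample_means) / len(sample_means)
print(f"mean of x-bar     = {statistics.mean(sample_means):.3f}")
print(f"share within 2 se = {within_2se:.3f}")
```

Even though the underlying population is far from normal, the sample means center on $\mu$, spread out like $\sigma/\sqrt{n}$, and cover the $\pm 2$ standard-error band at roughly the normal 95% rate.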

Visual Demonstrations of the CLT

  • Four population shapes examined:
    1. Bimodal (two humps).
    2. Uniform (all values equally likely).
    3. Exponential (right-skewed, higher likelihood near zero).
    4. Normal.
  • In every scenario, the sampling distribution of $\bar{x}$ becomes approximately normal once $n$ reaches 30 or more.
    • For the inherently normal population, normality of xˉ\bar{x} holds even at small nn.

Error of Estimation (Sampling Error)

  • Defined as difference between a sample statistic and the corresponding population parameter.
    • For means: $\text{Error} = \bar{x} - \mu$.
    • Also called sampling error or estimation error.
  • Arises purely because we observe a sample rather than the full population.
  • Can be reduced by:
    • Increasing nn (reduces standard error).
    • Using better sampling designs to avoid bias.
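The effect of increasing $n$ on sampling error can be demonstrated directly: the typical size of $|\bar{x} - \mu|$ shrinks like $1/\sqrt{n}$. A sketch with an illustrative synthetic population:

```python
import random
import statistics

# Illustrative population (assumed): 50,000 roughly normal values.
random.seed(3)
population = [random.gauss(50, 10) for _ in range(50_000)]
mu = statistics.mean(population)

# Average absolute sampling error |x-bar - mu| at increasing n.
mean_abs_error = {}
for n in (10, 40, 160):
    errors = [
        abs(statistics.mean(random.sample(population, n)) - mu)
        for _ in range(2_000)
    ]
    mean_abs_error[n] = statistics.mean(errors)
    print(f"n = {n:3d}: mean |error| = {mean_abs_error[n]:.3f}")
```

Each quadrupling of $n$ roughly halves the typical error, mirroring the $\sigma/\sqrt{n}$ behavior of the standard error.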

These notes cover the distribution, properties, and practical implications of the sample mean, capped by the Central Limit Theorem’s assurance that large-sample means behave normally and predictably.