M4 5 Central Limit Theorem

Central Limit Theorem Overview

If X is a normally distributed random variable, sample means will also be normally distributed.
Sample means will have the same mean as the population but a smaller standard deviation, calculated as:
- \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
- As the sample size (n) increases, the standard deviation of sample means shrinks, improving the estimate of the overall mean.
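As a quick check of the formula above, the following sketch tabulates the standard deviation of sample means for a few sample sizes. The helper name `standard_error` and the choice of a population standard deviation of 1 are assumptions made purely for illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of sample means for samples of size n: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sigma / math.sqrt(n)

sigma = 1.0  # assumed population standard deviation, for illustration only
for n in (1, 4, 16, 64):
    print(f"n = {n:>2}: standard deviation of sample means = {standard_error(sigma, n):.3f}")
```

Because n sits under a square root, each fourfold increase in sample size only halves the standard deviation of the sample means.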

The powerful aspect of the central limit theorem (CLT) is that it applies to a random variable with any distribution (as long as its mean and standard deviation are finite), provided the sample size is large enough; a common rule of thumb is \( n \geq 30 \).
Regardless of the original distribution's shape (uniform, skewed, and so on), the sampling distribution of the mean approaches a normal distribution as the sample size increases.
This allows normal distribution properties to be used for probability calculations and statistical inference about sample means.

Consider a standard normal distribution with mean = 0 and standard deviation = 1.
With samples of size one, each sample mean is simply a single value drawn from the population, so the sampling distribution reproduces the population's bell curve.
Observations from empirical data:
- Mean of the sampling distribution is close to zero but not exactly zero due to randomness.
- The standard deviation remains close to one.

Sample Size = 2:
- Sample means shift inward and begin clustering around the population mean.
- Tail values occur less often; a sample mean far out in the tail now requires both values to be extreme in the same direction (or one value to be far more extreme still), which is much rarer than a single extreme value.
Sample Size = 3:
- Clustering around the mean tightens further, making extreme sample means increasingly rare.
Sample Size = 5 and beyond:
- The distribution keeps narrowing and its peak grows taller.
- The distribution keeps its normal shape, consistent with the CLT.
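The progression through sample sizes 1, 2, 3, and 5 can be reproduced with a small simulation. This is a minimal sketch, not the tool used in the lecture: the helper names, the 20,000 samples per size, the fixed seed, and the cutoff of 2 used to count "tail" sample means are all assumptions.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Draw `trials` samples of size n from a standard normal population; return their means."""
    return [statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]

def tail_fraction(values, cutoff=2.0):
    """Fraction of values lying more than `cutoff` away from zero."""
    return sum(abs(v) > cutoff for v in values) / len(values)

rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable
for n in (1, 2, 3, 5):
    means = sample_means(n, 20_000, rng)
    print(f"n = {n}: mean = {statistics.fmean(means):+.3f}, "
          f"sd = {statistics.stdev(means):.3f}, "
          f"share beyond +/-2 = {tail_fraction(means):.4f}")
```

The printed mean should hover near zero, while both the standard deviation and the share of sample means beyond 2 should fall as the sample size grows.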

For a sample size of 25:
- Expected standard deviation: \[ \sigma_{\bar{x}} = \frac{1}{\sqrt{25}} = 0.2 \]
- The empirical sampling distribution's standard deviation is about 0.19, close to the expected 0.2.
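The small gap between 0.19 and 0.2 is ordinary sampling noise. The sketch below repeats the size-25 experiment several times to show how the empirical value fluctuates around the expected one. The number of samples per run (500) is an assumption, since the notes do not say how many samples were drawn, and the function name is hypothetical.

```python
import math
import random
import statistics

def empirical_sd_of_means(n, trials, rng):
    """Standard deviation of `trials` sample means, each from n standard normal draws."""
    means = [statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
    return statistics.stdev(means)

expected = 1 / math.sqrt(25)  # sigma / sqrt(n) with sigma = 1, n = 25
rng = random.Random(7)        # fixed seed (an assumption) for repeatability
for run in range(1, 6):
    # 500 samples per run is an assumed value, not taken from the notes
    observed = empirical_sd_of_means(25, 500, rng)
    print(f"run {run}: empirical sd = {observed:.3f} (expected {expected:.3f})")
```

Drawing more samples per run narrows the run-to-run spread, but no finite simulation should be expected to hit 0.2 exactly.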

Transition to Uniform Distribution
- Flat shape; values range from 0 to 9, each equally likely.
- At size = 2: Averaging two values produces middle-range results more often than extreme ones, so clustering begins.
Further increasing sample size:
- At size = 5: More clustering occurs with continued averaging.
- At size = 15: The distribution begins to resemble a normal distribution; at size = 30 it shows a clear bell curve.
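One way to put a number on "resembling a normal distribution" is to compare how much of the sampling distribution falls within one and two standard deviations of its mean against the normal reference values. The sketch below assumes the population is the integers 0 through 9 (the notes do not say whether the values are integers or continuous); the helper names and the 10,000 samples per size are likewise assumptions.

```python
import random
import statistics

def digit_sample_means(n, trials, rng):
    """Means of `trials` samples of size n drawn from the integers 0..9, equally likely."""
    return [statistics.fmean([rng.randint(0, 9) for _ in range(n)]) for _ in range(trials)]

def share_within(values, k):
    """Fraction of values within k standard deviations of their own mean."""
    centre = statistics.fmean(values)
    spread = statistics.stdev(values)
    return sum(abs(v - centre) <= k * spread for v in values) / len(values)

# Reference shares for an exactly normal distribution.
normal = statistics.NormalDist()
ref_1sd = normal.cdf(1) - normal.cdf(-1)
ref_2sd = normal.cdf(2) - normal.cdf(-2)

rng = random.Random(0)  # fixed seed (an assumption) for repeatability
print(f"normal reference: within 1 sd = {ref_1sd:.3f}, within 2 sd = {ref_2sd:.3f}")
for n in (1, 2, 5, 15, 30):
    means = digit_sample_means(n, 10_000, rng)
    print(f"size = {n:>2}: within 1 sd = {share_within(means, 1):.3f}, "
          f"within 2 sd = {share_within(means, 2):.3f}")
```

At size 1 the flat population puts every value within two standard deviations, which is not normal behaviour; by size 30 both shares should sit close to the reference values.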

Exponential distributions assign high probability to low values and very low probability to large values (right-skewed).
At sample sizes of 5 and beyond, sample means still cluster toward the lower values, but the sampling distribution increasingly approximates a normal distribution as the sample size grows.
The approximation is best at sample sizes of about 30 or more, where the shape is close to normal.
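For a skewed population, a convenient way to watch the shape improve is to track the skewness of the sample means, which is 0 for a perfectly symmetric distribution. The sketch below is illustrative only: the rate of 1.0, the list of sample sizes, the 20,000 samples per size, and the helper names are assumptions.

```python
import random
import statistics

def exponential_sample_means(n, trials, rng, rate=1.0):
    """Means of `trials` samples of size n from an exponential population with the given rate."""
    return [statistics.fmean([rng.expovariate(rate) for _ in range(n)]) for _ in range(trials)]

def skewness(values):
    """Sample skewness: the average cubed z-score; positive values indicate right skew."""
    centre = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - centre) / spread) ** 3 for v in values])

rng = random.Random(1)  # fixed seed (an assumption) for repeatability
for n in (1, 5, 15, 30, 100):
    means = exponential_sample_means(n, 20_000, rng)
    print(f"size = {n:>3}: mean = {statistics.fmean(means):.3f}, "
          f"skewness = {skewness(means):.2f}")
```

For an exponential population the skewness of the sample mean is 2 / sqrt(n) in theory, so it shrinks steadily but is still roughly 0.37 at size 30: the sampling distribution is approximately, not exactly, normal at that point.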

The central limit theorem allows estimation of means in populations with unknown distributions (e.g., weights of animals).
With a large enough sample of weights, the sample average gives a reasonable estimate of the population mean, and probabilities about that average can be assessed with the normal distribution.
This principle is foundational for confidence intervals and hypothesis testing, topics forthcoming in the course.
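To make the animal-weight example concrete, here is a minimal sketch of a probability calculation for a sample mean. Every number in it (population mean of 50 kg, standard deviation of 12 kg, sample of 36 animals, threshold of 53 kg) is hypothetical and chosen only so the arithmetic is easy to follow; the population mean and standard deviation are assumed known for the sake of the calculation.

```python
import math
from statistics import NormalDist

# All numbers below are hypothetical, chosen only to illustrate the calculation.
population_mean = 50.0   # kg
population_sd = 12.0     # kg
n = 36                   # animals per sample, above the n >= 30 rule of thumb

# By the CLT the sample mean is approximately normal with the same mean
# as the population and standard deviation sigma / sqrt(n).
sd_of_mean = population_sd / math.sqrt(n)   # 12 / 6 = 2 kg
sampling_dist = NormalDist(mu=population_mean, sigma=sd_of_mean)

# Probability that a sample of 36 animals averages more than 53 kg.
p_above_53 = 1 - sampling_dist.cdf(53.0)
print(f"sd of the sample mean: {sd_of_mean:.2f} kg")
print(f"P(sample mean > 53 kg) is about {p_above_53:.4f}")
```

Here 53 kg lies 1.5 standard deviations of the sample mean above 50 kg, so the probability comes out near 0.067, even though nothing was assumed about the shape of the weight distribution itself.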