M4 5 Central Limit Theorem

Central Limit Theorem Overview

  • If X is a normally distributed random variable, sample means will also be normally distributed.

  • Sample means will have the same mean as the population but a smaller standard deviation, calculated as:

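    • \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \] where \sigma is the population standard deviation, n is the sample size, and \sigma_{\bar{x}} is the standard deviation of the sample means.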

    • As the sample size (n) increases, the standard deviation of sample means shrinks, improving the estimate of the overall mean.
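
The formula above follows from a short variance calculation. The sketch below assumes the n draws X_1, ..., X_n are independent and share a finite variance \sigma^2, and writes \bar{x} for the sample mean; these symbols are introduced here for illustration.

```latex
\[
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\,\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
\qquad\Longrightarrow\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\]
```

Because n sits under a square root, quadrupling the sample size only halves the standard deviation of the sample means.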

Key Insights

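  • The powerful aspect of the central limit theorem (CLT): it applies to a random variable with almost any distribution (the population only needs a finite variance), provided the sample size is large enough. A common rule of thumb is n ≥ 30, though strongly skewed populations can need more.

  • Regardless of the original distribution's shape (flat, skewed, and so on), the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.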

  • This allows the application of normal distribution properties for probability calculations and statistical inferences.

Case Study: Standard Normal Distribution

  • Consider a standard normal distribution with mean = 0 and standard deviation = 1.

  • With samples of size one, each sample mean is simply the single value drawn, so the sampling distribution matches the population distribution and the graph resembles a bell curve.

  • Observations from empirical data:

    • Mean of the sampling distribution is close to zero but not exactly zero due to randomness.

    • The standard deviation remains close to one.

Impact of Increasing Sample Size

  • Sample Size = 2:

    • Observations shift inward; distributions begin clustering towards the mean.

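    • Tail values occur less often: an extreme sample mean now needs both draws to land far from the mean in the same direction (or one draw to be extraordinarily far out), which is much rarer than a single extreme draw.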

  • Sample Size = 3:

    • Further clustering begins to occur around the mean, making far outlier occurrences increasingly rare.

  • Sample Size = 5 and beyond:

    • The distribution continues to squeeze inward and grow taller around the mean.

    • Distributions maintain a normal shape, illustrating CLT's effectiveness.
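
The thinning of the tails described above can be checked with a small simulation. This is a minimal sketch using only the Python standard library; the trial count, the seed, the ±2 cutoff, and the helper names `sample_means` and `tail_fraction` are illustrative choices, not values from the lecture.

```python
import random

def sample_means(n, trials=20_000, seed=0):
    """Draw `trials` samples of size n from a standard normal; return their means."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) for _ in range(n)) / n for _ in range(trials)]

def tail_fraction(means, cutoff=2.0):
    """Fraction of sample means that lie farther than `cutoff` from zero."""
    return sum(abs(m) > cutoff for m in means) / len(means)

# Sample sizes 1, 2, 3 and 5 mirror the ones discussed above.
tail_by_n = {n: tail_fraction(sample_means(n)) for n in (1, 2, 3, 5)}

for n, frac in tail_by_n.items():
    print(f"n={n}: fraction of sample means beyond ±2 = {frac:.4f}")
```

The printed fractions should drop quickly as n grows, which is the "clustering towards the mean" seen in the histograms.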

Calculation of Standard Deviation for Sample Sizes

  • For a sample size of 25 drawn from the standard normal population (\sigma = 1):

    • Expected standard deviation: \[ \sigma_{\bar{x}} = \frac{1}{\sqrt{25}} = 0.2 \]

    • The empirical sampling distribution's standard deviation was about 0.19, closely matching the expected 0.2.
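
The comparison above can be reproduced approximately in a few lines. This is a sketch under assumed settings (10,000 simulated samples and a fixed seed, neither taken from the lecture), so the empirical value will be close to, but not necessarily equal to, the 0.19 reported above.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable (arbitrary choice)
n, trials = 25, 10_000    # trials is an assumed simulation size
sigma = 1.0               # standard normal population

# Theoretical standard deviation of the sample means: 1 / sqrt(25) = 0.2
expected_sd = sigma / math.sqrt(n)

# Simulate many samples of size n and record each sample's mean.
means = [sum(rng.gauss(0, sigma) for _ in range(n)) / n for _ in range(trials)]
empirical_sd = statistics.stdev(means)

print(f"expected:  {expected_sd:.3f}")
print(f"empirical: {empirical_sd:.3f}")
```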

Non-Normal Distributions

  • Transition to Uniform Distribution

    • Flat shape; values range from 0 to 9, equally likely.

    • With a sample size of 2, each plotted value is the average of two draws; averages tend to land in the middle of the range, so initial clustering appears.

  • Further increasing sample size:

    • At size = 5: More clustering occurs with continued averaging.

    • At size = 15: The distribution begins to resemble a normal distribution; at size = 30 it shows a clear bell curve.
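
One way to put a number on "resembling a normal distribution" is excess kurtosis, which is 0 for a normal distribution and clearly negative for a flat one. The sketch below simulates sample means from equally likely integers 0 to 9; the trial count, the seed, and the helper names are assumptions made for illustration, and excess kurtosis is only one of several possible normality measures.

```python
import random

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    mu = sum(values) / len(values)
    m2 = sum((v - mu) ** 2 for v in values) / len(values)
    m4 = sum((v - mu) ** 4 for v in values) / len(values)
    return m4 / m2 ** 2 - 3.0

def uniform_sample_means(n, trials=20_000, seed=1):
    """Means of `trials` samples of size n from equally likely integers 0..9."""
    rng = random.Random(seed)
    return [sum(rng.randint(0, 9) for _ in range(n)) / n for _ in range(trials)]

kurtosis_by_n = {
    n: excess_kurtosis(uniform_sample_means(n)) for n in (1, 2, 5, 15, 30)
}

for n, k in kurtosis_by_n.items():
    print(f"n={n:2d}: excess kurtosis = {k:+.3f}")
```

The value for n = 1 should be strongly negative (a flat shape), and the values should move toward 0 as n grows.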

Exponential Distribution Analysis

  • Exponential distributions tend to have higher probabilities for lower values and extremely low probabilities for larger values (right-skewed).

  • With sample sizes of 5 and beyond, the sample means cluster around the population mean, although the sampling distribution is still visibly right-skewed at small sizes; larger sizes approximate a normal distribution increasingly well.

  • Best approximations occur around sample sizes of 30 or more, improving normality in shape.
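
The fading skew can be measured directly. For an exponential population the skewness is 2, and the skewness of the mean of n independent draws is 2/\sqrt{n}, so it shrinks slowly toward 0. The sketch below checks this by simulation; the rate of 1.0, the trial count, the seed, and the helper names are illustrative assumptions.

```python
import random

def skewness(values):
    """Third standardized moment (0 for a symmetric distribution)."""
    mu = sum(values) / len(values)
    m2 = sum((v - mu) ** 2 for v in values) / len(values)
    m3 = sum((v - mu) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

def exponential_sample_means(n, trials=20_000, seed=2):
    """Means of `trials` samples of size n from an exponential with rate 1."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]

skew_by_n = {n: skewness(exponential_sample_means(n)) for n in (1, 5, 30)}

for n, s in skew_by_n.items():
    print(f"n={n:2d}: simulated skewness = {s:.3f}, theory = {2 / n ** 0.5:.3f}")
```

Even at n = 30 some right skew remains, which is why the n ≥ 30 guideline is a rule of thumb rather than a guarantee.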

Significance of CLT in Real-World Applications

  • The central limit theorem allows the mean of a population to be estimated even when the population's distribution is unknown (e.g., weights of animals).

  • By taking a large enough sample of weights, the average can help establish reasonable population mean estimates and assess probability based on normal distribution.

  • This principle is foundational for confidence intervals and hypothesis testing, topics forthcoming in the course.
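
As a sketch of the kind of probability calculation the CLT makes possible, the example below treats the sample mean as approximately normal. Every number in it is hypothetical (a population mean of 50 kg, a standard deviation of 12 kg, a sample of 36 animals, and a 54 kg threshold); none comes from the lecture, and in practice the population mean and standard deviation would themselves be estimated.

```python
import math
from statistics import NormalDist

# Hypothetical values chosen only to illustrate the calculation.
pop_mean = 50.0   # assumed population mean weight (kg)
pop_sd = 12.0     # assumed population standard deviation (kg)
n = 36            # assumed sample size (at least 30, per the rule of thumb)

# Standard deviation of the sample mean: 12 / sqrt(36) = 2.0
standard_error = pop_sd / math.sqrt(n)

# By the CLT, the sample mean is approximately Normal(pop_mean, standard_error).
sampling_dist = NormalDist(mu=pop_mean, sigma=standard_error)

# Probability that a sample of 36 animals averages more than 54 kg.
p_above_54 = 1 - sampling_dist.cdf(54.0)
print(f"P(sample mean > 54 kg) = {p_above_54:.4f}")   # about 0.0228
```

A sample mean of 54 kg sits two standard errors above the assumed population mean, so it would be unusual under these assumptions; this is the reasoning that confidence intervals and hypothesis tests formalize.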
