Central Limit Theorem - Sampling Distribution of Sample Means - Stats & Probability

Central Limit Theorem (CLT)

  • Definition: The Central Limit Theorem states that if we repeatedly draw random samples of size n from a population and calculate the mean of each sample, the distribution of those means (for example, plotted on a histogram) will approximate a normal distribution when n is sufficiently large.

Key Concepts

  • Given a population of size N (e.g., N = 100,000 people), repeatedly select random samples of size n (e.g., n = 30).

  • Example of Mean Calculations:

    • Sample 1 Mean Age = 41.8

    • Sample 2 Mean Age = 39.6

    • Sample 3 Mean Age = 40.5

  • When plotted, the means (x̄) from samples will create a distribution that resembles a normal distribution even if the population distribution is not normal.

  • Critical Sample Size:

    • If the sample size is n ≥ 30 (a common rule of thumb), the sampling distribution will usually be close to normal regardless of the population distribution shape; strongly skewed or heavy-tailed populations may need a larger n.
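
The idea above can be checked with a small simulation. This is a minimal sketch, not part of the original notes: the exponential population with mean 40 and the helper names `sample_means` and `draw_age` are assumptions chosen for illustration. It draws many samples of size 30 from a clearly non-normal population and looks at how the sample means behave.

```python
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Return the means of num_samples random samples, each of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]

def draw_age(rng):
    # Hypothetical, strongly skewed population: exponential with mean 40.
    return rng.expovariate(1 / 40)

means = sample_means(draw_age, n=30, num_samples=5000)
center = statistics.fmean(means)
spread = statistics.stdev(means)

# For a normal distribution, about 68% of values lie within one SD of the mean.
within_one_sd = sum(abs(m - center) < spread for m in means) / len(means)

print(f"mean of sample means: {center:.2f}")    # expected to be near 40
print(f"SD of sample means:   {spread:.2f}")    # expected to be near 40 / sqrt(30)
print(f"share within 1 SD:    {within_one_sd:.3f}")
```

If the sample means are close to normal, the last figure should land near 0.68 even though the individual values come from a heavily skewed distribution.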

Illustrative Examples

  • Population Distribution: If the individual observations x come from a normal population with mean μ and standard deviation σ, then the sampling distribution of sample means is also normal, for any sample size n.

    • Notation:

      • x = Individual observation from the population

      • x̄ = Mean of a sample collected from the population

  • Alternative Shapes:

    • When the population distribution has a non-normal shape, such as uniform or exponential, and the sample size is large enough (rule of thumb: n ≥ 30), the sampling distribution of x̄ will still be approximately normal.

Sampling Distribution

  • Definition: The probability distribution of the sample mean x̄ over all possible samples of size n; on its graph, sample means are plotted on the x-axis.

  • Because this distribution is approximately normal, the Z-table can be used for probability calculations.

    • To calculate P(a < x̄ < b), convert a and b to Z-scores and use the Z-table to find the area under the curve between them.
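
As a rough illustration of that lookup, the sketch below uses Python's `statistics.NormalDist` in place of a printed Z-table. The function name `prob_mean_between` and all numeric inputs are hypothetical; the standard error is taken as σ/√n, the usual formula for the standard deviation of the sample mean.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(a, b, mu, sigma, n):
    """Approximate P(a < x̄ < b) using the normal model for sample means."""
    standard_error = sigma / sqrt(n)
    z_a = (a - mu) / standard_error
    z_b = (b - mu) / standard_error
    standard_normal = NormalDist()  # stands in for the printed Z-table
    # Area under the standard normal curve between z_a and z_b.
    return standard_normal.cdf(z_b) - standard_normal.cdf(z_a)

# Hypothetical inputs: mu = 40, sigma = 12, n = 36, so the standard error is 2.
p = prob_mean_between(a=38, b=42, mu=40, sigma=12, n=36)
print(f"P(38 < x̄ < 42) ≈ {p:.4f}")
```

With these inputs the bounds sit one standard error on either side of the mean, so the result should match the familiar "about 68%" figure.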

Essential Symbols and Variables

  • μ = Population mean

  • x̄ = Sample mean

  • μₓ̄ = Mean of the sampling distribution

  • σ = Standard deviation of the population

  • s = Standard deviation of the sample

  • σₓ̄ = Standard deviation of the sampling distribution (standard error)

  • n = Size of the sample

Law of Large Numbers

  • Definition: As the size of the sample (n) increases, the sample mean (x̄) tends to get closer to the population mean (μ).

  • Example:

    • Population mean age = 36

    • Sample 1: n = 10 → Mean age 32

    • Sample 2: n = 50 → Mean age 38.5

    • Sample 3: n = 100 → Mean age 35.3

    • Sample 4: n = 1000 → Mean age 36.1

  • Conclusion: As n increases, x̄ approaches the true population mean μ.
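
The same pattern can be reproduced with simulated data. This is only a sketch: the population (ages uniform on [18, 54], which gives a mean of 36) and the function name `sample_mean` are assumptions made for the example, not data from the notes.

```python
import random
import statistics

def sample_mean(n, rng):
    """Mean age of n people drawn from a hypothetical population."""
    # Assumption: ages are uniform on [18, 54], so the population mean is 36.
    return statistics.fmean(rng.uniform(18, 54) for _ in range(n))

rng = random.Random(1)
results = {n: sample_mean(n, rng) for n in (10, 50, 100, 1000, 100_000)}

for n, xbar in results.items():
    print(f"n = {n:>6}: x̄ = {xbar:.2f}, |x̄ - μ| = {abs(xbar - 36):.2f}")
```

Any single run can wobble for small n, but the gap between x̄ and μ should shrink markedly by the largest sample size.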

Z-Score Equations

  • For normal distribution: Z = \frac{x - \mu}{\sigma}

  • For sampling distribution: Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}

  • Standard deviation of the sampling distribution: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

  • Important Concept: The mean of the sampling distribution equals the population mean (μₓ̄ = μ) for any sample size n, and σₓ̄ = σ/√n, so the Z-equation becomes:
    Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
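
To see how the two Z-score equations differ in practice, here is a short sketch with hypothetical numbers (μ = 40, σ = 12, n = 36); the function names are illustrative. The same value 42 is unremarkable for a single observation but sits a full standard error above the mean when it is the mean of 36 observations.

```python
from math import sqrt

def z_individual(x, mu, sigma):
    """Z-score of a single observation x."""
    return (x - mu) / sigma

def z_sample_mean(xbar, mu, sigma, n):
    """Z-score of a sample mean x̄ from a sample of size n."""
    return (xbar - mu) / (sigma / sqrt(n))

# Hypothetical values, for illustration only.
print(f"single observation: Z = {z_individual(42, mu=40, sigma=12):.3f}")
print(f"mean of 36 values:  Z = {z_sample_mean(42, mu=40, sigma=12, n=36):.3f}")
```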

Effect of Sample Size on Standard Deviation

  • Observation: Increasing sample size (n) decreases the standard error (σₓ̄).

    • \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} (σₓ̄ is inversely proportional to √n, so quadrupling n halves the standard error)

    • Graphical Implication: An increase in sample size leads to a narrower and taller sampling distribution shape.
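
A quick numerical sketch of "narrower and taller", assuming a hypothetical σ = 12: as n grows, the standard error shrinks and the peak height of the corresponding normal curve rises. The name `standard_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)

sigma = 12  # hypothetical population standard deviation
for n in (9, 36, 144):
    se = standard_error(sigma, n)
    # Height of the normal curve at its center; it grows as the curve narrows.
    peak = NormalDist(mu=0, sigma=se).pdf(0)
    print(f"n = {n:>3}: standard error = {se:.2f}, peak height = {peak:.3f}")
```

Each step multiplies n by 4, so each step should cut the standard error in half and double the peak height.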

Formulas for Distribution Evaluations

  • Uniform Distribution: On the interval a ≤ x ≤ b, the density is a rectangle of constant height f(x) = \frac{1}{b - a}

    • Mean: \mu = \frac{a + b}{2}

    • Standard Deviation: \sigma = \frac{b - a}{\sqrt{12}}

  • Exponential Distribution: The density is highest at the y-intercept, f(0) = λ, and decays toward zero as x increases, characterized by:

    • f(x) = \lambda e^{-\lambda x} for x ≥ 0

    • Mean = Standard Deviation = \frac{1}{\lambda}
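
The formulas above feed directly into the sampling distribution of the mean. The sketch below is illustrative only: the function names and the inputs (a uniform distribution on [10, 22], an exponential distribution with λ = 0.1, and n = 36) are hypothetical.

```python
from math import sqrt

def uniform_mean_sd(a, b):
    """Mean and standard deviation of a uniform distribution on [a, b]."""
    return (a + b) / 2, (b - a) / sqrt(12)

def exponential_mean_sd(lam):
    """Mean and standard deviation of an exponential distribution with rate lam."""
    return 1 / lam, 1 / lam

def sampling_distribution(mu, sigma, n):
    """Mean and standard error of x̄ for random samples of size n."""
    return mu, sigma / sqrt(n)

# Hypothetical populations, for illustration only.
for label, (mu, sigma) in {
    "uniform[10, 22]": uniform_mean_sd(10, 22),
    "exponential(0.1)": exponential_mean_sd(0.1),
}.items():
    mean_xbar, se = sampling_distribution(mu, sigma, n=36)
    print(f"{label}: μ = {mu:.2f}, σ = {sigma:.2f}, "
          f"mean of x̄ = {mean_xbar:.2f}, standard error = {se:.3f}")
```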

Additional Areas of Consideration

Normal Distribution
  • Population: X \sim N(\mu, \sigma), where the second parameter is the standard deviation

  • Sampling: \bar{x} \sim N(\mu, \frac{\sigma}{\sqrt{n}})

Probability Calculations for Normal Distribution
  • For probabilities: P(X < a) = P\left(Z < \frac{a - \mu}{\sigma}\right)

Example Problem Applications

Problem 1
  • Entrance exam scores have mean μ = 74 and standard deviation σ = 6.8.

  • Part A: Probability of a score less than 65, P(X < 65), found stepwise from the Z-score Z = \frac{65 - 74}{6.8} \approx -1.32.
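
Part A can be worked through numerically as follows. This is a sketch that assumes the scores are normally distributed, which the problem statement here does not spell out, and it uses `statistics.NormalDist` in place of a printed Z-table.

```python
from statistics import NormalDist

mu, sigma = 74, 6.8   # values given in Problem 1
cutoff = 65

# Step 1: convert the cutoff score to a Z-score.
z = (cutoff - mu) / sigma

# Step 2: look up the area to the left of z under the standard normal curve.
# Assumption: exam scores follow a normal distribution.
p = NormalDist().cdf(z)

print(f"Z = {z:.2f}")
print(f"P(X < 65) ≈ {p:.4f}")
```

Under that assumption, roughly 9% of scores fall below 65.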

Problem 2
  • Carbohydrate content of snack bars, modeled with a uniform distribution.

  • The mean and standard deviation were calculated both for a single bar (from the uniform formulas) and for the sample mean (from the CLT, with standard error σ/√n).

Problem 3
  • Car lifespan modeled with an exponential distribution: since σ = 1/λ, the standard error of the sample mean is \sigma_{\bar{x}} = \frac{1}{\lambda \sqrt{n}}.

Extensions to Practical Applications
  • Probability estimates and tests of statistical significance often rely on Z-scores computed under a normal approximation; the Central Limit Theorem is what justifies that approximation for sample means, and so underpins the validity of the resulting statistical inference.

Central Limit Theorem (CLT)

  • Definition: The Central Limit Theorem states that if means of samples of size n are collected from a population and plotted, their distribution will approximate a normal distribution.

Key Concepts
  • The sampling distribution of sample means will approximate a normal distribution even if the population distribution is not normal.

  • As a rule of thumb, this approximation is considered adequate when the sample size n \geq 30, regardless of the population's original shape.

  • The sampling distribution plots sample means (\bar{x}) on the x-axis and is used with Z-tables for probability calculations.

Law of Large Numbers
  • As the sample size (n) increases, the sample mean (\bar{x}) approaches the population mean (\mu).

Essential Formulas
  • Standard Error (Standard deviation of the sampling distribution): \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

  • Z-Score for Sampling Distribution: Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}

  • Effect of Sample Size: Increasing sample size (n) decreases the standard error (\sigma_{\bar{x}}), leading to a narrower and taller sampling distribution.

Distribution Applications
  • The CLT applies to various population distributions, including uniform and exponential: for sufficiently large n, the sampling distribution of means is approximately normal.