Central Limit Theorem - Sampling Distribution of Sample Means - Stats & Probability
Central Limit Theorem (CLT)
Definition: The Central Limit Theorem states that if we collect samples of size n from a population and calculate the mean of each of those samples, a histogram of those means will approximate a normal distribution, and the approximation improves as n grows.
Key Concepts
Given a population of size N (e.g., 100,000 people), select random samples of size n (e.g., 30).
Example of Mean Calculations:
Sample 1 Mean Age = 41.8
Sample 2 Mean Age = 39.6
Sample 3 Mean Age = 40.5
When plotted, the sample means (x̄) form a distribution that resembles a normal distribution, even if the population distribution is not normal.
Critical Sample Size:
As a common rule of thumb, if the sample size n ≥ 30, the sampling distribution approximates a normal distribution regardless of the shape of the population distribution (strongly skewed populations may need a larger n).
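The idea of repeatedly sampling and averaging can be checked by simulation. The sketch below assumes a hypothetical exponential population with mean 40 (so its standard deviation is also 40), which is clearly not normal; the helper names are illustrative only. The mean of the simulated sample means should land near μ, and their spread near σ/√n.

```python
import random
import statistics

MU = 40.0     # hypothetical population mean
SIGMA = 40.0  # for an exponential population, sd equals the mean

def draw_value(rng):
    """One observation from the hypothetical exponential population."""
    return rng.expovariate(1 / MU)

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw_value(rng) for _ in range(n))
        for _ in range(num_samples)
    ]

means = simulate_sample_means(n=30, num_samples=5000)
print(f"mean of sample means: {statistics.fmean(means):.2f}  (mu = {MU})")
print(f"sd of sample means:   {statistics.stdev(means):.2f}  "
      f"(sigma / sqrt(n) = {SIGMA / 30 ** 0.5:.2f})")
```

Plotting `means` as a histogram would show the roughly bell-shaped curve the CLT predicts, even though individual draws are heavily right-skewed.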
Illustrative Examples
Population Distribution: If the population distribution of the random variable x is normal with mean μ, then the sampling distribution of sample means is also normal, for any sample size n.
Notation:
x = Individual observation from the population
x̄ = Mean of a sample collected from the population
Alternative Shapes:
When the population distribution has another shape, such as uniform or exponential, and the sample size n is large enough (n ≥ 30), the sampling distribution is still approximately normal.
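One way to watch a non-normal shape "wash out" is to track the skewness of the sample means as n grows. This sketch assumes an exponential population with rate 1 (an illustrative choice); for that population the mean of n draws has skewness 2/√n, so the simulated values should fall from about 2 toward the normal distribution's 0.

```python
import random
import statistics

def skewness(data):
    """Sample skewness: the average cubed z-score, using population-style moments."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

def sample_means(n, num_samples=10000, seed=1):
    """Means of num_samples samples of size n from an exponential(rate=1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

for n in (1, 5, 30):
    print(f"n = {n:2d}: skewness of sample means = {skewness(sample_means(n)):.2f}"
          f"  (theory: {2 / n ** 0.5:.2f})")
```

The skewness does not reach exactly 0 at n = 30, which is why n ≥ 30 is best read as a rule of thumb rather than a hard threshold.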
Sampling Distribution
Definition: The probability distribution of the sample mean x̄; the sample means are plotted on the x-axis.
Because this distribution is approximately normal, the Z-table can be used for probability calculations.
To calculate a probability such as P(a < x̄ < b), use the Z-table to find the area under the curve between a and b.
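The Z-table lookup for P(a < x̄ < b) can be reproduced with Python's standard-library `statistics.NormalDist`, whose `cdf` gives the area to the left of a point. The function name and the numbers below (μ = 40, σ = 12, n = 36) are hypothetical, chosen so the standard error is 2 and the endpoints sit at Z = ±1.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """P(a < x̄ < b) under the CLT normal approximation N(mu, sigma / sqrt(n))."""
    se = sigma / sqrt(n)          # standard error of the mean
    z_a = (a - mu) / se           # standardize the lower endpoint
    z_b = (b - mu) / se           # standardize the upper endpoint
    std_normal = NormalDist()     # N(0, 1) plays the role of the Z-table
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)

print(round(prob_sample_mean_between(38, 42, mu=40, sigma=12, n=36), 4))  # → 0.6827
```

The result matches the familiar "about 68% within one standard deviation" rule, here measured in standard errors.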
Essential Symbols and Variables
μ = Population mean
x̄ = Sample mean
μₓ̄ = Mean of the sampling distribution
σ = Standard deviation of the population
s = Standard deviation of the sample
σₓ̄ = Standard deviation of the sampling distribution (standard error)
n = Size of the sample
Law of Large Numbers
Definition: As the size of the sample (n) increases, the sample mean (x̄) tends to get closer to the population mean (μ).
Example:
Population mean age = 36
Sample 1: n = 10 → Mean age 32
Sample 2: n = 50 → Mean age 38.5
Sample 3: n = 100 → Mean age 35.3
Sample 4: n = 1000 → Mean age 36.1
Conclusion: As n increases, x̄ approaches true population mean μ.
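The Law of Large Numbers can be illustrated with a simulation in the spirit of the age example above. The population here is a hypothetical one, ages uniform on [0, 72], picked only because its mean is 36; the function name is likewise illustrative.

```python
import random
import statistics

MU = 36.0  # population mean age from the example above

def sample_mean(n, rng):
    """Mean of n ages drawn from a hypothetical uniform(0, 72) population (mean 36)."""
    return statistics.fmean(rng.uniform(0, 72) for _ in range(n))

rng = random.Random(42)
for n in (10, 50, 100, 1000, 100_000):
    x_bar = sample_mean(n, rng)
    print(f"n = {n:>6}: x̄ = {x_bar:6.2f}, |x̄ - μ| = {abs(x_bar - MU):.2f}")
```

The error tends to shrink as n grows, though a single run need not decrease at every step; the law describes the long-run behavior.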
Z-Score Equations
For normal distribution: Z = \frac{x - \mu}{\sigma}
For sampling distribution: Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}
Standard deviation of the sampling distribution: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
Important Concept: The mean of the sampling distribution equals the population mean (μₓ̄ = μ) for any n, and σₓ̄ = σ/√n, so the Z-equation simplifies to:
Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
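The simplified Z-equation translates directly into code. As a hypothetical illustration, reuse Problem 1's μ = 74 and σ = 6.8 with an assumed sample of n = 36 and x̄ = 76; the function name is illustrative.

```python
from math import sqrt

def z_for_sample_mean(x_bar, mu, sigma, n):
    """Z = (x̄ - μ) / (σ / √n): how many standard errors x̄ lies from μ."""
    standard_error = sigma / sqrt(n)
    return (x_bar - mu) / standard_error

print(round(z_for_sample_mean(76, 74, 6.8, 36), 2))  # → 1.76
```

For comparison, the same 2-point gap for a single score gives Z = 2 / 6.8 ≈ 0.29, which shows why a sample mean far from μ is much less likely than a single observation equally far from μ.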
Effect of Sample Size on Standard Deviation
Observation: Increasing sample size (n) decreases the standard error (σₓ̄).
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} (σₓ̄ is inversely proportional to √n, so quadrupling n halves the standard error)
Graphical Implication: An increase in sample size leads to a narrower and taller sampling distribution shape.
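The "narrower and taller" effect can be quantified: the peak height of a normal density is 1/(σₓ̄√(2π)), so halving the standard error doubles the peak. This sketch reuses Problem 1's σ = 6.8 purely as an example value.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n):
    """sigma_x̄ = sigma / sqrt(n)."""
    return sigma / sqrt(n)

SIGMA = 6.8  # example population sd (borrowed from Problem 1)
for n in (1, 4, 16, 64):
    se = standard_error(SIGMA, n)
    peak = NormalDist(0, se).pdf(0)  # height of the sampling distribution at its center
    print(f"n = {n:2d}: SE = {se:.2f}, peak height = {peak:.3f}")
```

Each fourfold increase in n halves the standard error (6.80, 3.40, 1.70, 0.85) and doubles the peak height.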
Formulas for Distribution Evaluations
Uniform Distribution: Shape defined by a constant height f(x) = \frac{1}{b - a} for a \leq x \leq b
Mean: \text{Mean} = \frac{a + b}{2}
Standard Deviation: \sigma = \frac{b - a}{\sqrt{12}}
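The three uniform-distribution formulas fit in one small helper. The interval [0, 12] in the usage line is a hypothetical example; for it the height is 1/12, the mean is 6, and the standard deviation is 12/√12 = √12 ≈ 3.464.

```python
from math import sqrt

def uniform_summary(a, b):
    """Height, mean, and standard deviation of a continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("need a < b")
    height = 1 / (b - a)     # f(x) for a <= x <= b
    mean = (a + b) / 2
    sd = (b - a) / sqrt(12)
    return height, mean, sd

height, mean, sd = uniform_summary(0, 12)  # hypothetical interval
print(f"height = {height:.4f}, mean = {mean}, sd = {sd:.4f}")
```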
Exponential Distribution: The density starts at its maximum, the y-intercept f(0) = \lambda, and decays toward zero; it is characterized by:
f(x) = \lambda e^{-\lambda x}, for x \geq 0
Mean = Standard Deviation = \frac{1}{\lambda}
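The exponential formulas can be checked the same way, including the fact that the density's y-intercept equals λ. The rate λ = 0.25 below is hypothetical (giving mean = sd = 4), and the simulation uses Python's `random.Random.expovariate`, which takes the rate λ as its argument.

```python
import math
import random
import statistics

def exponential_pdf(x, lam):
    """f(x) = lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_summary(lam):
    """Return (mean, sd) of an exponential distribution with rate lam; both equal 1/lam."""
    if lam <= 0:
        raise ValueError("rate must be positive")
    return 1 / lam, 1 / lam

lam = 0.25                       # hypothetical rate, so mean = sd = 4
print(exponential_pdf(0, lam))   # y-intercept equals lam: 0.25
print(exponential_summary(lam))  # (4.0, 4.0)

# Simulation check: the mean and sd of many draws should both be close to 4.
rng = random.Random(7)
draws = [rng.expovariate(lam) for _ in range(200_000)]
print(round(statistics.fmean(draws), 2), round(statistics.stdev(draws), 2))
```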
Additional Areas of Consideration
Normal Distribution
Population: N(\mu, \sigma)
Sampling: N(\mu, \frac{\sigma}{\sqrt{n}})
Probability Calculations for Normal Distribution
For probabilities: P(X < a) = P(Z < \frac{a - \mu}{\sigma})
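The population model N(μ, σ) and the sampling model N(μ, σ/√n) use the same formula, differing only in the spread plugged in. This sketch contrasts P(X < 70) for a single observation with P(x̄ < 70) for a sample mean, using Problem 1's μ and σ and a hypothetical n = 16.

```python
from math import sqrt
from statistics import NormalDist

def prob_less_than(a, mu, sigma):
    """P(X < a) for X ~ N(mu, sigma), via Z = (a - mu) / sigma and the standard normal CDF."""
    z = (a - mu) / sigma
    return NormalDist().cdf(z)

mu, sigma, n = 74, 6.8, 16  # mu and sigma from Problem 1; n = 16 is hypothetical
p_single = prob_less_than(70, mu, sigma)          # population: N(mu, sigma)
p_mean = prob_less_than(70, mu, sigma / sqrt(n))  # sampling: N(mu, sigma / sqrt(n))
print(f"P(X < 70)  = {p_single:.4f}")
print(f"P(x̄ < 70) = {p_mean:.4f}")
```

The sample-mean probability is far smaller because the sampling distribution is four times narrower when n = 16.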
Example Problem Applications
Problem 1
Entrance exam scores have mean μ = 74 and standard deviation σ = 6.8.
Part A: Find the probability that a single score is less than 65, computing Z step by step.
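Part A concerns one score, so it uses σ rather than σ/√n. The steps can be written out as below; rounding Z to two decimals mimics reading a printed Z-table, and `NormalDist().cdf` stands in for the table lookup.

```python
from statistics import NormalDist

mu, sigma = 74, 6.8  # entrance-exam mean and standard deviation (Problem 1)
x = 65               # score of interest

z = (x - mu) / sigma           # step 1: standardize, Z = (65 - 74) / 6.8
z_table = round(z, 2)          # step 2: round to two decimals, as a Z-table is indexed
p = NormalDist().cdf(z_table)  # step 3: area to the left of z under N(0, 1)

print(f"z = {z:.4f} -> table z = {z_table}")
print(f"P(X < 65) ≈ {p:.4f}")  # P(X < 65) ≈ 0.0934
```

Using the unrounded Z gives about 0.0928; the small difference comes only from the two-decimal table lookup.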
Problem 2
Carbohydrate content of a snack bar (uniform distribution)
The mean and standard deviation of a single bar's carbohydrate content, and of the distribution of sample means, were calculated using the CLT.
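The original problem's bounds are not given here, so this sketch uses hypothetical values: carbohydrates uniform between 10 g and 30 g, and samples of n = 36 bars. It computes the single-bar parameters, the CLT standard error, and one example probability for the sample mean.

```python
from math import sqrt
from statistics import NormalDist

def uniform_clt(a, b, n):
    """Single-observation mean and sd, plus the sample mean's sd, for a uniform(a, b) population."""
    mu = (a + b) / 2            # mean of one observation (and of the sample mean)
    sigma = (b - a) / sqrt(12)  # sd of one observation
    se = sigma / sqrt(n)        # sd of the sample mean (CLT)
    return mu, sigma, se

# Hypothetical values: carbs uniform between 10 g and 30 g, samples of n = 36 bars.
mu, sigma, se = uniform_clt(10, 30, 36)
p_above_21 = 1 - NormalDist(mu, se).cdf(21)  # P(x̄ > 21 g), normal approximation
print(f"mu = {mu}, sigma = {sigma:.3f}, SE = {se:.3f}, P(x̄ > 21) ≈ {p_above_21:.4f}")
```

A single bar is uniform, not normal, so a question about one bar would use the uniform density directly; only the sample mean gets the normal approximation.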
Problem 3
Car lifespan modeled by an exponential distribution (finding λ and connecting it to the standard error)
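The link between λ and the standard error follows from mean = σ = 1/λ. The numbers below are hypothetical (mean lifespan 10 years, samples of 40 cars), since the problem's actual values are not given. The single-car probability uses the exact exponential CDF, 1 − e^{−λx}, while the sample-mean probability uses the CLT normal approximation.

```python
import math
from statistics import NormalDist

def exponential_clt(mean_lifespan, n):
    """Rate, sd, and standard error for an exponential population with the given mean."""
    lam = 1 / mean_lifespan      # mean = 1/lambda, so lambda = 1/mean
    sigma = 1 / lam              # for the exponential, sd = mean = 1/lambda
    se = sigma / math.sqrt(n)    # standard error of the sample mean
    return lam, sigma, se

MEAN_LIFESPAN = 10  # hypothetical mean lifespan in years
lam, sigma, se = exponential_clt(MEAN_LIFESPAN, 40)  # hypothetical n = 40 cars
p_one_car = 1 - math.exp(-lam * 9)            # exact P(X < 9) for a single car
p_mean = NormalDist(MEAN_LIFESPAN, se).cdf(9) # CLT approximation for P(x̄ < 9)
print(f"lambda = {lam}, sigma = {sigma}, SE = {se:.4f}")
print(f"P(one car < 9 yr) = {p_one_car:.4f}, P(x̄ < 9 yr) ≈ {p_mean:.4f}")
```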
Extensions to Practical Applications
Because the CLT justifies a normal approximation for the sampling distribution of x̄, Z-scores computed from it can be used to estimate probabilities and assess statistical significance. This is the foundation of much of statistical inference.
Central Limit Theorem (CLT)
Definition: The Central Limit Theorem states that if the means of samples of size n are collected from a population and plotted, their distribution approximates a normal distribution for sufficiently large n.
Key Concepts
The sampling distribution of sample means will approximate a normal distribution even if the population distribution is not normal.
As a rule of thumb, this approximation holds when the sample size n \geq 30, regardless of the population's original shape.
The sampling distribution plots sample means (\bar{x}) on the x-axis and is used with Z-tables for probability calculations.
Law of Large Numbers
As the sample size (n) increases, the sample mean (\bar{x}) tends to approach the population mean (\mu).
Essential Formulas
Standard Error (Standard deviation of the sampling distribution): \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
Z-Score for Sampling Distribution: Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
Effect of Sample Size: Increasing the sample size (n) decreases the standard error (\sigma_{\bar{x}}), leading to a narrower and taller sampling distribution.
Distribution Applications
The CLT applies to many population distributions with finite variance, including uniform and exponential: for sufficiently large n, the sampling distribution of the mean is approximately normal.