The Central Limit Theorem (CLT) is a fundamental concept in statistics that describes how sample means behave.
Definition: The sampling distribution is the distribution of a sample statistic, such as the mean, computed from many samples of the same size drawn from a population.
The CLT states that:
For sufficiently large samples of size n, the sample means approximately follow a normal distribution, regardless of the shape of the population's distribution.
Key elements:
Samples are drawn from a population with mean μ and finite standard deviation σ; these values need not be known.
As the sample size n increases, the histogram of sample means trends toward a normal bell shape.
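A quick simulation can make this concrete. The sketch below assumes an exponential population with μ = σ = 1, a strongly right-skewed shape chosen only for illustration. The names `sample_means` and `skewness`, the seed, and the 5,000 repetitions are all illustrative choices, not part of these notes.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population (mean 1, SD 1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

rng = random.Random(42)  # fixed seed for a reproducible run
results = {}
for n in (1, 5, 30, 100):
    means = sample_means(n, 5000, rng)
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    mean_of_means, sd_of_means, skew = results[n]
    print(f"n={n:3d}  mean={mean_of_means:.3f}  sd={sd_of_means:.3f}  skew={skew:.2f}")
```

Under these assumptions the mean column stays near 1, the sd column shrinks roughly like 1/√n, and the skewness falls toward 0 (for this population, theory gives 2/√n), which is the histogram "trending toward a bell shape" in numbers.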
Important Note:
The population distribution does not need to be known.
A sample size of at least 30 is typically seen as "large enough."
The adequacy of sample size (n) for applying the CLT depends on the underlying population distribution:
If the population is normal, the sample mean is exactly normal for any n, so a small n is acceptable.
If the population distribution is unknown or non-normal, n should be at least 30.
Sample size n=50, μ=45, σ=8: Can the CLT be applied? Yes, since n ≥ 30.
Sample size n=10, population distribution unknown: Can the CLT be applied? No, since n < 30.
Sample size n=50, normal population: Can the CLT be applied? Yes.
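The rule of thumb above can be written as a tiny check. `clt_applicable` is a hypothetical helper that encodes only the n ≥ 30 guideline from these notes; it is not a formal test of normality.

```python
def clt_applicable(n, population_normal=False):
    """Hypothetical helper encoding the rule of thumb in these notes.

    A normal population gives a normal sampling distribution of the mean
    for any n; otherwise the notes ask for n >= 30.
    """
    if population_normal:
        return True
    return n >= 30

print(clt_applicable(50))                          # n=50, mu=45, sigma=8 -> True
print(clt_applicable(10))                          # n=10, shape unknown -> False
print(clt_applicable(50, population_normal=True))  # n=50, normal population -> True
```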
Students will use CLT properties to find the mean and standard deviation of the sampling distribution of sample means.
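If X is a random variable with mean μX and standard deviation σX, then the means X̄ of samples of size n have mean μX and standard deviation σX/√n.
As n increases, the distribution of sample means becomes approximately normal.
Notation: X̄ ~ N(μX, σX/√n), where the second parameter is the standard deviation of X̄, not its variance.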
σX/√n is termed the Standard Error of the Mean (SEM).
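As a worked example, the sketch below computes the SEM for the n=50, μ=45, σ=8 case listed earlier and builds the approximate sampling distribution with Python's `statistics.NormalDist`, which takes a mean and a standard deviation. The helper name `standard_error` and the interval 44 to 46 are illustrative choices.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma, n = 45, 8, 50  # the n=50, mu=45, sigma=8 example from these notes
se = standard_error(sigma, n)
sampling_dist = NormalDist(mu, se)  # NormalDist(mean, standard deviation)

# Probability that a sample mean lands within 1 unit of mu (interval chosen for illustration)
p_44_46 = sampling_dist.cdf(46) - sampling_dist.cdf(44)
print(f"SEM = {se:.4f}")  # → SEM = 1.1314
print(f"P(44 < sample mean < 46) = {p_44_46:.4f}")
```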
Definition: The standard error is the standard deviation of a sample statistic across repeated samples; it measures the variability that arises from random sampling.
"Error" denotes variability, not mistakes.
Example: When studying behavior problems in children, results vary from sample to sample because the children selected differ at random.
One sample may contain mostly well-behaved children, while another may include more children with behavior problems.
Drawing 10,000 samples and recording each sample mean produces a distribution of means; the spread of these averages reflects chance alone.
Most sample means cluster near the true population mean (in this example, between 45 and 55), so a typical sample gives a reasonable estimate of that mean.
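The 10,000-sample experiment can be simulated directly. The population here is entirely hypothetical: normal behavior-problem scores with mean 50 and SD 15, sampled in groups of 36 children. These numbers are invented so that the standard error is 15/√36 = 2.5, which puts the 45 to 55 range at about ±2 SEM.

```python
import random
import statistics

# Hypothetical population and sample size, invented for illustration only.
POP_MEAN, POP_SD, N, REPS = 50, 15, 36, 10_000

rng = random.Random(7)  # fixed seed for a reproducible run
means = [statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(N))
         for _ in range(REPS)]

sem = POP_SD / N ** 0.5  # theoretical standard error: 15 / 6 = 2.5
share_45_55 = sum(45 <= m <= 55 for m in means) / REPS

print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"SD of sample means:   {statistics.stdev(means):.2f} (theory: {sem:.2f})")
print(f"share between 45 and 55: {share_45_55:.3f}")
```

Under these assumptions the SD of the 10,000 means should land close to the theoretical SEM, and roughly 95% of the means fall between 45 and 55.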
Scenario: Evaluating a game-development strategy aimed at 29-35-year-olds, based on the average age of gamers.
The mean age of strategy-game players is 28 (SD = 4.8). For a sample of 100 players, the probability that the sample mean age falls between 29 and 35 is 0.0186.
Question: Is the development strategy viable? Interpret this probability to decide.
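The 0.0186 figure can be checked with the CLT: with n = 100 the standard error is 4.8/√100 = 0.48, and the sample mean is treated as approximately N(28, 0.48). The sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

mu, sigma, n = 28, 4.8, 100
se = sigma / n ** 0.5        # standard error of the mean: 0.48
xbar = NormalDist(mu, se)    # approximate sampling distribution of the mean age
p = xbar.cdf(35) - xbar.cdf(29)
print(f"P(29 < mean age < 35) = {p:.4f}")  # → P(29 < mean age < 35) = 0.0186
```

A probability this small means a sample mean age between 29 and 35 would be unusual if the population mean really is 28.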
Scenario: A cola manufacturer claims its cans contain 16 ounces.
Sample n = 34 cans, sample mean = 16.01 oz, claimed μ = 16.00 oz, σ = 0.143 oz.
Questions:
Do the results indicate that cans are filled with more than 16 ounces?
How would a consumer and a manufacturer each view this result?
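One way to approach the first question is to ask how unusual a sample mean of 16.01 oz would be if the true mean were exactly 16.00 oz. The sketch below standardizes the sample mean using the standard error σ/√n and the standard normal `NormalDist()`; treating "at least 16.01" as the event of interest is an assumption about how the question is framed.

```python
from statistics import NormalDist

mu, sigma, n, xbar = 16.00, 0.143, 34, 16.01
se = sigma / n ** 0.5              # standard error of the mean
z = (xbar - mu) / se               # how many SEs the sample mean sits above mu
p_at_least = 1 - NormalDist().cdf(z)
print(f"SE = {se:.4f}, z = {z:.2f}, P(mean >= 16.01) = {p_at_least:.3f}")
```

A probability of roughly one in three means a sample mean of 16.01 oz is quite ordinary when μ = 16.00 oz.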
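Data: Females aged 18-24 have an average systolic blood pressure of 114.8 mmHg (SD = 13.1).
For a single randomly selected female, P(BP > 120) ≈ 0.3457. For a sample of 40 females, the standard error is 13.1/√40 ≈ 2.07, so P(sample mean BP > 120) ≈ 0.0060.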
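The sketch below computes the probability both for one randomly selected female and for the mean of a sample of 40, using only μ = 114.8, σ = 13.1, and n = 40. The two differ sharply: 0.3457 is the single-female probability, while the much smaller standard error of the sample mean makes a mean above 120 far rarer.

```python
from statistics import NormalDist

mu, sigma, n, cutoff = 114.8, 13.1, 40, 120
one_female = 1 - NormalDist(mu, sigma).cdf(cutoff)               # individual value
sample_mean = 1 - NormalDist(mu, sigma / n ** 0.5).cdf(cutoff)   # mean of 40 (CLT)
print(f"P(one female > 120)         = {one_female:.4f}")
print(f"P(mean of 40 females > 120) = {sample_mean:.4f}")
```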
Questions:
Interpret the probability outcome.
If the sample had only 4 females and the population distribution were unknown, could the CLT be applied?
Answer: No; n = 4 is too small to rely on the CLT when the population's shape is unknown.