Chapter 7: Central Limit Theorem Study Notes

One of the primary goals in statistics is to approximate parameters that describe a population, such as averages or proportions.
- Example of a parameter: the average height of a population.
- Sampling involves using a sample to estimate the population parameter.
The Central Limit Theorem (CLT) helps us understand how close sample estimates are to the true population parameter.
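The significance of the CLT:
- Averages from sufficiently large samples tend to land close to the actual population parameter. The CLT describes how those averages are distributed, so we can quantify how close they are likely to be.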

By taking samples of a predetermined size, we can make educated guesses about population characteristics, such as:
- Average height nationally.
- Proportion of people with certain hair colors.
- Voting intentions in upcoming elections.

When we take multiple samples, each of size $n$, we examine their means.
If multiple random samples are taken, their means form a distribution of their own. For example:
- Sample 1 mean = 25
- Sample 2 mean = 26
- Sample 3 mean = 24
- Sample 4 mean = 25
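For a sufficiently large sample size $n$, this distribution of sample means approximately follows a normal distribution (bell curve), even when the population itself is not normal.
- The mean of this distribution ($\mu_{\bar{X}}$) equals the population mean ($\mu$).
- The larger the sample size $n$, the more tightly the sample means cluster around the population mean. Taking more samples fills in the shape of the distribution but does not make it narrower.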
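A small simulation can make this concrete. The sketch below assumes a hypothetical, deliberately skewed (exponential) population with mean 25, chosen only to echo the sample means listed above; it is an illustration, not real data. It draws 2000 samples of size 5 and 2000 samples of size 100, then compares the average and the spread of the resulting sample means.

```python
import random
import statistics

# Hypothetical skewed population: exponential with mean 25 (deliberately not normal).
MU = 25.0

def draw(rng):
    """Draw one value from the hypothetical population."""
    return rng.expovariate(1 / MU)

def sample_means(n, num_samples, rng):
    """Return the means of `num_samples` independent samples, each of size n."""
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]

rng = random.Random(42)  # fixed seed so the run is reproducible
small = sample_means(5, 2000, rng)
large = sample_means(100, 2000, rng)

for label, means in (("n=5", small), ("n=100", large)):
    print(f"{label:>5}: average of means = {statistics.fmean(means):.2f}, "
          f"spread (stdev) = {statistics.stdev(means):.2f}")
```

Both averages should land near 25, while the spread of the means for $n = 100$ should be noticeably smaller than for $n = 5$, even though the underlying population is skewed.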

This distribution also has a standard deviation, which measures how dispersed the sample means are.
The standard deviation of this distribution ($\sigma_{\bar{X}}$), often called the standard error, is calculated with the formula: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
- Where:
  - $\sigma$ = standard deviation of the population,
  - $n$ = sample size.
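As a rough check of the formula, the sketch below simulates many samples from an assumed normal population with $\mu = 12$ and $\sigma = 2$ (the same values used in the example below) and compares the observed spread of the sample means with $\frac{\sigma}{\sqrt{n}}$. The normal population is only a convenience; the formula applies to any population with a finite standard deviation. The helper name `standard_error` is a hypothetical, illustrative choice.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Assumed population: normal with mean 12 and standard deviation 2.
MU, SIGMA = 12.0, 2.0
rng = random.Random(7)  # fixed seed for reproducibility

results = {}
for n in (4, 25, 100):
    # 3000 samples of size n; keep only each sample's mean.
    means = [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(3000)
    ]
    results[n] = (standard_error(SIGMA, n), statistics.stdev(means))

for n, (formula, simulated) in results.items():
    print(f"n={n:>3}  formula={formula:.3f}  simulated={simulated:.3f}")
```

For each $n$, the simulated spread should sit close to the formula value, and both shrink as $n$ grows.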

Example Values:
- Population mean (μ) = 12,
- Population standard deviation (σ) = 2.
Consider a sample size of $100$:
- To find the probability of obtaining a sample mean greater than $12.5$, calculate:
  - the standard deviation of the sampling distribution: $\sigma_{\bar{X}} = \frac{2}{\sqrt{100}} = \frac{2}{10} = 0.2$;
  - the z-score: $z = \frac{12.5 - 12}{0.2} = 2.5$.
Using the z-score table:
- Corresponding cumulative probability = $0.9938$
- Thus, the area to the right (probability for exceeding $12.5$) = $1 - 0.9938 = 0.0062$, or $0.62\%$.
- Interpretation: there is only a small chance (about 0.62%) of obtaining a sample mean greater than 12.5 when the true population mean is 12.
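The same calculation can be done with the Python standard library's `statistics.NormalDist` (Python 3.8+) in place of a printed z-table. This is a minimal sketch using the example values from these notes ($\mu = 12$, $\sigma = 2$, $n = 100$).

```python
import math
from statistics import NormalDist

mu, sigma, n = 12.0, 2.0, 100
threshold = 12.5

se = sigma / math.sqrt(n)          # standard deviation of the sample mean: 0.2
z = (threshold - mu) / se          # z-score: 2.5
p_above = 1 - NormalDist().cdf(z)  # area to the right of z under the standard normal curve

print(f"SE={se:.2f}, z={z:.2f}, P(sample mean > {threshold}) = {p_above:.4f}")
```

`NormalDist().cdf(z)` plays the role of the table's cumulative probability, so subtracting it from 1 gives the right-tail area of about $0.0062$.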

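If population parameters such as the mean and standard deviation are unknown, we have to estimate them from the sample itself (for example, using the sample standard deviation $s$ in place of $\sigma$).
The Central Limit Theorem reassures us that, although individual sample means vary from sample to sample, their distribution is predictable (approximately normal), so probabilities about them can still be calculated.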

Similar to the distribution of sample means, we can analyze the distribution of sample proportions.
Example:
- Let's say the proportion of people with brown hair in the population is $p = 0.42$.
For a sample size of $n = 1000$, we can calculate:
- Distribution mean for sample proportions = $p = 0.42$.
- Standard deviation of this distribution is calculated with the formula:
  $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$
Plugging in the numbers:
- $\sigma_{\bar{p}} = \sqrt{\frac{0.42 \times (1 - 0.42)}{1000}} = \sqrt{0.0002436} \approx 0.0156$
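To see that this formula describes actual sampling behavior, the sketch below simulates 2000 hypothetical polls of 1000 people each, assuming every person independently has brown hair with probability $0.42$, and compares the mean and spread of the simulated proportions with the values above. The simulated polls are illustrative only, not real survey data.

```python
import math
import random
import statistics

p, n = 0.42, 1000
formula_sd = math.sqrt(p * (1 - p) / n)  # about 0.0156

# Each hypothetical poll: n people, each counted as brown-haired with probability p.
rng = random.Random(2024)  # fixed seed for reproducibility
proportions = [
    sum(rng.random() < p for _ in range(n)) / n
    for _ in range(2000)
]

print(f"formula:   mean={p:.3f}, sd={formula_sd:.4f}")
print(f"simulated: mean={statistics.fmean(proportions):.3f}, "
      f"sd={statistics.stdev(proportions):.4f}")
```

The simulated mean should be close to $0.42$ and the simulated standard deviation close to $0.0156$.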

If asked for the probability of obtaining a sample proportion of $40\%$ or less:
- Calculate the z-score:
  - Mean of the distribution = $0.42$, standard deviation = $0.0156$.
  - Convert $40\%$ to a decimal: $0.40$.
  - $z = \frac{0.40 - 0.42}{0.0156} \approx -1.28$
From the z-score table, this corresponds to a cumulative probability of $0.1003$.
- Therefore, the probability of getting $40\%$ or less is approximately $10.03\%$.
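The same probability can be computed without a table using `statistics.NormalDist`. This sketch keeps $\sigma_{\bar{p}}$ unrounded, so $z \approx -1.281$ and the result comes out at about $0.100$; the small gap from the table's $0.1003$ comes only from rounding $z$ to $-1.28$ before the lookup.

```python
import math
from statistics import NormalDist

p, n = 0.42, 1000
observed = 0.40

sd = math.sqrt(p * (1 - p) / n)  # about 0.0156
z = (observed - p) / sd          # about -1.28
prob = NormalDist().cdf(z)       # area to the left of z under the standard normal curve

print(f"z = {z:.2f}, P(sample proportion <= {observed}) = {prob:.4f}")
```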

The Central Limit Theorem allows statisticians to make reliable predictions about population parameters based on sample data.
The concepts of sample means and proportions are foundational in understanding how to make inferences in statistics, helping interpret polling data and various surveys effectively.
In practice, estimates from reasonably large samples typically land close to the true population percentage, which is what makes conclusions drawn from surveys trustworthy.
Understanding how sample size influences statistical reliability is critical in practical applications such as surveys and polls, which often use a sample of about $1000$ people for percentage estimates and report a margin of error of roughly $\pm 3\%$ (at 95% confidence), a figure that follows from Central Limit Theorem principles.
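The $\pm 3\%$ figure can be reproduced with a short calculation. The sketch below assumes a 95% confidence level, simple random sampling, and the conservative choice $p = 0.5$ (which gives the largest margin when the true proportion is unknown). `margin_of_error` is a hypothetical helper written for these notes, not a library function.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Approximate margin of error for a sample proportion.

    Uses the CLT normal approximation: z* times sqrt(p(1-p)/n).
    p = 0.5 is the most conservative (largest) choice when p is unknown.
    """
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z_star * math.sqrt(p * (1 - p) / n)

for n in (100, 1000, 4000):
    print(f"n={n:>4}: ±{100 * margin_of_error(n):.1f}%")
```

Under these assumptions, $n = 1000$ gives about $\pm 3.1\%$. Because the margin scales with $\frac{1}{\sqrt{n}}$, quadrupling the sample size only halves it.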