Chapter 7: Central Limit Theorem Study Notes

Overview of Central Limit Theorem (CLT)

  • One of the primary goals in statistics is to estimate parameters that describe a population, such as averages or proportions.

    • Example of a parameter: the average height of a population.

    • Sampling involves using a sample to estimate the population parameter.

  • The Central Limit Theorem (CLT) helps us understand how close sample estimates are to the true population parameter.

  • The significance of the CLT:

    • For sufficiently large samples, the sample mean is approximately normally distributed around the true population mean, which lets us quantify how close a sample estimate is likely to be.

Importance of Sampling

  • By taking samples of a predetermined size, we can make educated guesses about population characteristics, such as:

    • Average height nationally.

    • Proportion of people with certain hair colors.

    • Voting intentions in upcoming elections.

Distribution of Sample Means

  • Suppose we take many random samples, each of size $n$, and compute the mean of each one.

  • For example, four random samples might give:

    • Sample 1 mean = 25

    • Sample 2 mean = 26

    • Sample 3 mean = 24

    • Sample 4 mean = 25

  • Taken together, the means from many such samples form a distribution, called the sampling distribution of the sample mean.

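  • When $n$ is sufficiently large, this distribution of sample means approximately follows a normal distribution (bell curve), even if the population itself is not normal.

    • The mean of this distribution, $\mu_{\bar{X}}$, equals the population mean $\mu$.

    • Averaging over more samples brings the average of the sample means closer to $\mu$, and a larger sample size $n$ makes the individual sample means cluster more tightly around $\mu$.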
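A minimal simulation sketch of this behavior in Python (standard library only). The population below is a hypothetical, deliberately skewed one that does not appear in these notes, and `simulate_sample_means` is an illustrative helper name. It draws many random samples (with replacement) for two sample sizes and compares the average and spread of the resulting sample means.

```python
import random
import statistics

def simulate_sample_means(population, n, num_samples, seed=0):
    """Return the means of num_samples random samples of size n (drawn with replacement)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]

# Hypothetical skewed population: mostly 1s, some 5s, a few 20s (illustrative only).
population = [1] * 700 + [5] * 200 + [20] * 100
mu = statistics.fmean(population)  # population mean

# Compare two sample sizes: the spread of the sample means shrinks as n grows.
results = {n: simulate_sample_means(population, n, num_samples=5000) for n in (10, 50)}
for n, means in results.items():
    print(f"n = {n:>2}: mean of sample means = {statistics.fmean(means):.3f}, "
          f"spread (std) = {statistics.stdev(means):.3f}, population mean = {mu:.3f}")
```

For both sample sizes the mean of the sample means should sit close to the population mean (3.7 for this made-up population), while the spread should be noticeably smaller for n = 50.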

Standard Deviation of Sample Means

  • The sampling distribution also has a standard deviation, which measures how spread out the sample means are.

  • The standard deviation of this distribution, $\sigma_{\bar{X}}$ (also called the standard error of the mean), is given by: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

    • Where:

    • $\sigma$ = standard deviation of the population,

    • $n$ = sample size.
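The formula translates directly into code. This is a small sketch; the function name `standard_error` is our own choice, not part of any library.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the spread of the sample means.
print(standard_error(2, 25))   # 0.4
print(standard_error(2, 100))  # 0.2
```

Because $n$ sits under a square root, halving the spread requires four times as many observations.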

Example of Distribution of Sample Means

  • Example Values:

    • Population mean (μ) = 12,

    • Population standard deviation (σ) = 2.

  • Consider a sample size of $100$:

    • If you want to find the probability of obtaining a sample mean greater than $12.5$, calculate:

    • The standard deviation of the sampling distribution:

    • $\sigma_{\bar{X}} = \frac{2}{\sqrt{100}} = \frac{2}{10} = 0.2$

    • Compute the z-score:

    • $z = \frac{\bar{x} - \mu}{\sigma_{\bar{X}}} = \frac{12.5 - 12}{0.2} = 2.5$

  • Using the z-score table:

    • Corresponding cumulative probability = $0.9938$

    • Thus, the area to the right (the probability of exceeding $12.5$) = $1 - 0.9938 = 0.0062$, or $0.62\%$.

    • Interpretation: if the true population mean is 12, there is only a 0.62% chance that a sample of 100 has a mean greater than 12.5.
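The same calculation can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of a printed z-table. A sketch, using the example's values μ = 12, σ = 2, n = 100:

```python
import math
from statistics import NormalDist

mu, sigma, n = 12, 2, 100
se = sigma / math.sqrt(n)                # standard deviation of the sample means, 0.2
sampling_dist = NormalDist(mu=mu, sigma=se)

z = (12.5 - mu) / se                     # z-score, about 2.5
p_above = 1 - sampling_dist.cdf(12.5)    # area to the right of 12.5
print(f"z = {z:.2f}, P(sample mean > 12.5) = {p_above:.4f}")
```

Evaluating the CDF of `NormalDist(mu, se)` at 12.5 is equivalent to evaluating the standard normal CDF at z.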

Working without Known Population Parameters

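  • If the population mean and standard deviation are unknown, we must estimate them from the sample itself, for example by using the sample standard deviation $s$ in place of $\sigma$.

  • The Central Limit Theorem still applies: for large samples, the sample mean remains approximately normally distributed around $\mu$, so probability calculations like the one above remain approximately valid.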
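When $\sigma$ is unknown, a common approach is to plug in the sample standard deviation $s$ and estimate the standard error as $s/\sqrt{n}$ (for small samples, a t-distribution is usually used instead of the normal). The sketch below assumes hypothetical data: 100 values simulated from a normal population with μ = 12 and σ = 2, chosen only to mirror the earlier example. `estimated_standard_error` is an illustrative helper name.

```python
import math
import random
import statistics

def estimated_standard_error(sample):
    """Plug-in estimate of sigma / sqrt(n), using the sample standard deviation s for sigma."""
    s = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(sample))

# Hypothetical data mirroring the worked example; in practice mu and sigma are unknown.
rng = random.Random(42)
sample = [rng.gauss(12, 2) for _ in range(100)]

x_bar = statistics.fmean(sample)
se_hat = estimated_standard_error(sample)
print(f"sample mean = {x_bar:.3f}, estimated standard error = {se_hat:.3f}")
```

The estimate should land near the true value of 0.2, even though the estimator itself never uses σ.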

Distribution of Sample Proportions

  • Similar to the distribution of sample means, we can analyze the distribution of sample proportions.

  • Example:

    • Let's say the proportion of people with brown hair in the population is $p = 0.42$.

  • For a sample size of $n = 1000$, we can calculate:

    • Mean of the distribution of sample proportions: $\mu_{\bar{p}} = p = 0.42$.

    • The standard deviation of this distribution is calculated with the formula: $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$

  • Plugging in the numbers:

    • $\sigma_{\bar{p}} = \sqrt{\frac{0.42 \times (1 - 0.42)}{1000}} = \sqrt{0.0002436} \approx 0.0156$
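A sketch of the proportion formula in Python, plus a quick simulation check: it repeatedly draws hypothetical samples of 1000 people, each with probability 0.42 of having brown hair, and measures how much the sample proportions vary. `proportion_standard_error` is an illustrative name.

```python
import math
import random
import statistics

def proportion_standard_error(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of a sample proportion."""
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return math.sqrt(p * (1 - p) / n)

se_p = proportion_standard_error(0.42, 1000)
print(f"formula:   {se_p:.4f}")  # about 0.0156

# Simulation check: 2000 hypothetical samples of n = 1000, each person
# counted as brown-haired with probability 0.42.
rng = random.Random(1)
props = [sum(rng.random() < 0.42 for _ in range(1000)) / 1000 for _ in range(2000)]
print(f"simulated: {statistics.stdev(props):.4f}")
```

The simulated spread should agree with the formula to within sampling noise.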

Probability of Sample Proportions

  • If asked for the probability of obtaining a sample proportion of $40\%$ or less:

    • Mean of the distribution = $0.42$, standard deviation = $0.0156$.

    • Convert $40\%$ to a decimal: $0.40$.

    • Compute the z-score:

    • $z = \frac{0.40 - 0.42}{0.0156} \approx -1.28$

  • From the z-score table, this corresponds to a cumulative probability of $0.1003$.

    • Therefore, the probability of getting $40\%$ or less is approximately $10.03\%$.
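The same probability computed with `statistics.NormalDist` (a sketch, standard library only). Because the code does not round z to -1.28 before evaluating the CDF, its result can differ from the table value in the last decimal places (it comes out at about 0.100 rather than 0.1003).

```python
import math
from statistics import NormalDist

p, n = 0.42, 1000
se = math.sqrt(p * (1 - p) / n)          # about 0.0156
z = (0.40 - p) / se                      # about -1.28
prob = NormalDist().cdf(z)               # standard normal CDF: area to the left of z
print(f"z = {z:.2f}, P(sample proportion <= 0.40) = {prob:.4f}")
```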

Conclusion on CLT Application

  • The Central Limit Theorem allows statisticians to make reliable, quantifiable predictions about population parameters from sample data.

  • The concepts of sample means and proportions are foundational in understanding how to make inferences in statistics, helping interpret polling data and various surveys effectively.

  • With adequately large samples, estimates of public opinion typically fall close to the true population percentage, which is what makes such statistical conclusions trustworthy.

  • Understanding how sample size affects statistical reliability is critical in practice: surveys and polls often use a sample size of about $1000$ for percentage estimates, which, by Central Limit Theorem principles, gives a margin of error of roughly $\pm 3\%$ (at 95% confidence).
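The ±3% figure can be reproduced with the usual normal-approximation margin of error, $z^* \sqrt{p(1-p)/n}$. This sketch assumes 95% confidence and the worst case p = 0.5, which maximizes $p(1-p)$; `margin_of_error` is an illustrative name.

```python
import math
from statistics import NormalDist

def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error for a sample proportion: z* x sqrt(p(1-p)/n).
    The default p = 0.5 gives the largest (most conservative) margin."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return z_star * math.sqrt(p * (1 - p) / n)

for n in (100, 1000, 4000):
    print(f"n = {n:>4}: margin of error = ±{margin_of_error(n):.1%}")
```

For n = 1000 this gives about ±3.1%, consistent with the ±3% quoted above; quadrupling the sample size to 4000 halves the margin.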