Lecture 6 Notes: Central Limit Theorem and Applications

Agenda and setup

  • Module 6: Lecture 6 with PowerPoint and dataset opened by students; two announcements to follow the lecture.

  • Announcements mentioned:

    • HTAP and AT Plus-related items; quiz number two.

  • Instructor notes on class progress:

    • Snapshot taken last night at 9 PM to track who started early/late.

    • Currently three students still missing quiz 2.

Central idea of Lecture 6: storyline and terminology

  • Three-slide content focus:

    • Slide 1: Work with a link to an app that generates four graphs (only 1–3 used in class).

    • Slide 2: Central Limit Theorem (CLT): formal definition and implications.

    • Slide 3: Application to numerical data first, then to categorical data.

  • Rough storyline: start with a population link, view the app, summarize the theory (CLT), then apply to numerical data and to categorical data.

App walkthrough and what it demonstrates

  • The app has four graphs (only 1–3 used):

    • Graph 1: Population data with mean, median, standard deviation, skewness, and kurtosis.

    • Population distribution shown is normal in this example.

    • Population parameters in the demo: a Normal distribution with mean 16 and SD 5 (the on-screen mean was misread at first in the narration; 16 and 5 are the values actually used).

    • Graph 2: Sampling process and the accumulation of sample means (x̄).

    • Example with sample size n = 5; five values drawn from the population, then x̄ is calculated and plotted.

    • Repeated sampling yields different x̄ values; the app tracks the number of repetitions (wraps).

    • Graph 3: Summary of repeated samples; the x̄ distribution begins to resemble a normal curve as sampling continues.

    • Graph 4: Not used in class.

  • Workflow described:

    • Start with a population (mean μ, SD σ).

    • Click to generate samples of size n; compute x̄ for each sample; observe the distribution of x̄.

    • The more samples you draw (e.g., 10,000 at a time), the closer the x̄ distribution gets to normal with a smaller spread.

  • Key observations from the demo:

    • With a normal population (mean μ = 16, SD σ = 5), the x̄ distribution centers near μ and becomes tighter as n increases.

    • For a uniform distribution, the x̄ distribution also tends toward normal as sample size grows.

    • With a skewed distribution, the x̄ distribution becomes more normal as n grows, but you may need larger n for a perfect normal shape.

    • Custom distributions (user-drawn) show the same CLT behavior: as you increase the number of samples, the x̄ distribution centers near the population mean and the spread shrinks.

  • Important quantitative notes from the demo:

    • For the normal population example, n = 5 yields an x̄ distribution with center near μ and a smaller spread; using many samples (e.g., 10,000) yields a very close-to-normal blue x̄ distribution with mean around μ and SE ≈ σ/√n.

    • When switching distributions (uniform, skewed, custom), the center of the x̄ distribution remains near μ, and the SE shrinks with n, illustrating the CLT in practice.

    • The term "standard error" is introduced as the standard deviation of the sampling distribution of the sample mean:
      SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}

    • Formal CLT summary from this section: for any population with mean μ and standard deviation σ, the sampling distribution of the mean for samples of size n is approximately Normal with

    • mean μ̄ = μ

    • standard deviation σ̄ = σ / √n

    • i.e.,
      \bar{X} \sim \mathcal{N}\big(\mu, \frac{\sigma^{2}}{n}\big)

  • Notation and terminology:

    • The Normal distribution is written with a capital (script) N, \mathcal{N} ("big N" in the narration).

    • The sample mean is x̄; the population mean is μ; population standard deviation is σ; sample size is n.

    • Standard error of the mean is the standard deviation of the sampling distribution of x̄: SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
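
The app's behavior can be reproduced in a few lines. A minimal sketch (standard-library Python, using the lecture's demo values: Normal population with μ = 16, σ = 5, samples of size n = 5, 10,000 repetitions):

```python
import random
import statistics

# Reproduce the app's demo: population Normal(mu=16, sigma=5),
# repeated samples of size n = 5, many repetitions.
random.seed(1)
mu, sigma, n, reps = 16, 5, 5, 10_000

sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

center = statistics.mean(sample_means)   # should land near mu = 16
spread = statistics.stdev(sample_means)  # should land near sigma/sqrt(n) ~ 2.24

print(f"mean of x-bar: {center:.2f}")
print(f"SD of x-bar:   {spread:.2f}  (theory: {sigma / n**0.5:.2f})")
```

The printed spread matching σ/√n rather than σ is exactly the "smaller spread" observation from graphs 2 and 3.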

Step 3: Rough and formal CLT summary (definition and implications)

  • Rough summary (from the app):

    • The distribution of sample means x̄ is normal, even if the population distribution is messy, provided you have a mean μ and a standard deviation σ, and you sample enough times.

    • Center: the mean of the sampling distribution equals the population mean: \mu_{\bar{X}} = μ.

    • Spread: the standard deviation of the sampling distribution (the standard error) is smaller than the population SD, specifically \sigma_{\bar{X}} = \frac{σ}{\sqrt{n}}.

  • Formal CLT statement (as presented):

    • If X is any random variable with mean μ and SD σ, then the sampling distribution of the sample mean X̄ for samples of size n is approximately Normal with
      \,\bar{X} \sim \mathcal{N}\left(μ, \frac{σ^{2}}{n}\right).

    • The center equality: μ_{\bar{X}} = μ.

    • The standard deviation of the sampling distribution (the standard error): σ_{\bar{X}} = \frac{σ}{\sqrt{n}}.

    • The standard error is the quantity you use for inference about the sample mean when applying the CLT.

  • Practical guidance on n:

    • Conventional classroom rule of thumb: n ≥ 30 for a robust normal approximation when the population distribution is arbitrary.

    • In practice, with a normal population, smaller n (even as low as 5) can yield a reasonable normal approximation.

    • The instructor emphasizes two memorize-worthy numbers for exams: 30 (threshold for robust use) and a second value to be announced later.

  • Important caveat discussed:

    • If the population is already normal, the sampling distribution of the mean is exactly Normal for any n (not just large n).

    • If the population is not normal, larger n improves the normal approximation; CLT does not guarantee perfect normality for small n in non-normal populations.

Numerical data example: insurance claims dataset

  • Data setup:

    • Population (X) is the access payment column with 90 values in the dataset; treated as the population.

    • Given statistics (from the instructor’s example):

    • Population mean: μ = 319.2

    • Population SD: σ = 873.6

  • Sample setup:

    • Sample size: n = 36

    • Sampling distribution of the mean: \bar{X} \,\sim \, \mathcal{N}\left(μ, \frac{σ^{2}}{n}\right)

    • Compute standard error:
      σ_{\bar{X}} = \frac{σ}{\sqrt{n}} = \frac{873.6}{\sqrt{36}} = \frac{873.6}{6} = 145.6.

  • Target probability:

    • What is P(\bar{X} > 380)?

    • Standardize:
      Z = \frac{380 - μ}{σ_{\bar{X}}} = \frac{380 - 319.2}{145.6} \approx \frac{60.8}{145.6} \approx 0.42.

    • Therefore: P(\bar{X} > 380) = P(Z > 0.42) \approx 0.34.

  • How this is interpreted and validated in the lecture:

    • The instructor uses Minitab to illustrate the normal approximation for the sampling distribution of the mean.

    • In Minitab, set distribution to Normal with mean μ = 319.2 and SE = 145.6; for the right-tail probability P(\bar{X} > 380), input the X value 380 and select the right tail.

  • Additional numerical exercise shown:

    • Example: Probability that the sample mean exceeds 315 (another calculation). If μ = 319.2 and SE ≈ 145.6, Z ≈ (315 - 319.2)/145.6 ≈ -0.029; P(\bar{X} > 315) ≈ 0.511.
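
Both tail probabilities above can be double-checked without Minitab. A minimal sketch in standard-library Python (the helper name `normal_tail` is illustrative, not from the lecture):

```python
from math import erf, sqrt

def normal_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), via the error function."""
    z = (x - mean) / sd
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 319.2, 873.6, 36
se = sigma / sqrt(n)                    # 873.6 / 6 = 145.6

p_380 = normal_tail(380, mu, se)        # P(x-bar > 380)
p_315 = normal_tail(315, mu, se)        # P(x-bar > 315)
print(f"SE = {se:.1f}")
print(f"P(x-bar > 380) = {p_380:.3f}")  # about 0.34
print(f"P(x-bar > 315) = {p_315:.3f}")  # about 0.51
```

This mirrors the Minitab workflow: a Normal with mean 319.2 and standard deviation 145.6, right tail at the given x value.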

Categorical data example: CLT for proportions

  • Data setup:

    • Population proportion p is estimated from a Yes/No categorical variable.

    • From the example: Yes count = 51, No = 39 (out of 90), so
      \pi = p = \frac{51}{90} \approx 0.567.

  • Sample setup:

    • Sample size: n = 49 (chosen because it's a convenient, easily square-rootable number).

    • Sample proportion: \hat{p} \sim \mathcal{N}\left(p, \frac{p(1-p)}{n}\right).

    • Compute the standard error of the proportion:
      SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.567\times(1-0.567)}{49}}.

  • Practical CLT for proportions:

    • For the normal approximation to be reasonable, you check the rule of thumb: np \ge 5 \quad \text{and} \quad n(1-p) \ge 5.

    • In the example: with p ≈ 0.567 and n = 49, both conditions hold (np ≈ 27.8, n(1-p) ≈ 21.2).

  • Question worked in class:

    • What is the probability that the sample proportion exceeds 0.60, i.e., P(\hat{p} > 0.60)?

    • Use the Normal approximation (mean = p, SE = sqrt(p(1-p)/n)) and interpret via the X value approach in Minitab (x value perspective):

    • Mean μ_p = p ≈ 0.567; SE = sqrt(0.567×0.433/49) ≈ 0.071.

    • In Minitab: right tail for x value corresponding to 0.60.

    • Result: P(\hat{p} > 0.60) ≈ 0.32 (using SE ≈ 0.071; the exact value depends on rounding).
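
Recomputing directly from the stated p = 51/90 and n = 49 (a standard-library Python sketch; small differences from in-class rounding are expected):

```python
from math import erf, sqrt

p, n = 51 / 90, 49                         # p ~ 0.567 from the Yes/No column

# Rule-of-thumb check for the normal approximation
assert n * p >= 5 and n * (1 - p) >= 5

se = sqrt(p * (1 - p) / n)                 # standard error of p-hat
z = (0.60 - p) / se
p_tail = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # P(p-hat > 0.60)

print(f"SE(p-hat) = {se:.3f}")
print(f"P(p-hat > 0.60) = {p_tail:.2f}")
```
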

Worked practice examples and in-class polling format (instructor’s method)

  • Demonstrates in-class random polling, where students answer a question about the CLT:

    • Question example: In the insurance example, with 10,007 samples but per-sample size n = 5, what is the value of n? Options were a) 10,007 or b) 5; correct answer is b) 5, since n refers to the number of observations in each sample, not the number of samples.

    • The instructor uses a real poll format, calling names (e.g., Palima, Yola) to answer; emphasizes that the two numbers to memorize are 30 (for the general rule) and a second, to be announced.

  • Additional in-class example (numerical):

    • Population proportion p = 0.65; sample size n = 32; SE = sqrt(p(1-p)/n) = sqrt(0.65×0.35/32) ≈ 0.084; probability P(p̂ > 0.68) computed via the normal approximation and Minitab as described.

  • Key takeaway from these exercises:

    • The CLT provides a way to approximate the distribution of sample means (or sample proportions) with a Normal distribution regardless of the underlying population, given enough sample size.

    • The mean of the sampling distribution equals the population parameter; the spread is reduced by a factor related to the square root of the sample size.

Practical thresholds, caveats, and guidance for exam-ready knowledge

  • Thresholds and rules of thumb:

    • n ≥ 30 is a common rule of thumb for the CLT to ensure a good normal approximation when the population distribution is not normal.

    • When the population is already normal, the normal approximation holds well for small n (e.g., n as low as 5).

    • The CLT for proportions requires both np ≥ 5 and n(1-p) ≥ 5 to justify normal approximation.

  • Two memorized values for the semester (as stated by the instructor):

    • Value 1: 30 (the usual threshold for numerical data CLT applicability).

    • Value 2: The second value will be announced in a future class.

  • Important terminology recap:

    • Population distribution vs. sampling distribution: the former is the distribution of X; the latter is the distribution of X̄ (or p̂).

    • Standard error (SE): the standard deviation of the sampling distribution of X̄, given by SE(\bar{X}) = \frac{σ}{\sqrt{n}}; for proportions, SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}.

  • The bottom line for CLT applicability:

    • If you know μ and σ of the population, you can approximate the distribution of the sample mean with a Normal distribution with mean μ and standard deviation σ/√n, provided the sample size is sufficiently large (or the population is normal).

    • For categorical data, approximate the binomial with a Normal for p̂ when np and n(1-p) meet the 5-rule.

Announcements and course logistics (two items mentioned in class)

  • Announcement 1: Reading table updates with new columns

    • A new column called AT (and related columns AC/AP) will be added; AT represents absence-related counts. If AT ≥ 3, each additional absence costs 1% from the 10% assignment component; AP and AC should remain as zero to avoid penalties; Sunday 9 PM snapshots will capture those values.

    • If you have not engaged with AP/AC tasks, missingness will cost up to 0.5% per absence as a completion-based penalty.

  • Announcement 2: Quiz 2 policy

    • Quiz 2 is the last quiz with extensions allowed; extensions are up to 9 PM. After that, any missing quiz receives zero.

  • Attendance and check-in code:

    • Check-in code: 0305 (example shown in class).

  • Additional practical notes:

    • Instructor encourages frequent email checks for timely responses; individual meetings are available on request (brief sessions, ~15 minutes) to discuss questions.

  • Exam preparation notes:

    • Cheat sheet: allowed; two pages, double-sided; the instructor will explain the format beforehand.

    • You will be expected to memorize two numbers (30 and the second to be announced) and the key CLT formulas; other equations may be provided on the cheat sheet.

Quick recap for study and exam prep

  • CLT core result: For any population with mean μ and SD σ, the sampling distribution of the mean from samples of size n is approximately Normal with
    \bar{X} \sim \mathcal{N}\left(μ, \frac{σ^{2}}{n}\right) and
    μ_{\bar{X}} = μ, \quad σ_{\bar{X}} = \frac{σ}{\sqrt{n}}.

  • For proportions:
    \hat{p} \sim \mathcal{N}\left(p, \frac{p(1-p)}{n}\right), \quad SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}.

  • Practical thresholds: use n ≥ 30 for general cases; normal population allows smaller n; always check np ≥ 5 and n(1-p) ≥ 5 for proportions.

  • Real-data examples used in lecture:

    • Insurance claims dataset: μ ≈ 319.2, σ ≈ 873.6, n = 36; SE ≈ 145.6; P( X̄ > 380 ) ≈ 0.34.

    • Proportions example: p ≈ 0.567, n = 49; SE = sqrt(p(1-p)/n) ≈ 0.071; P( p̂ > 0.60 ) ≈ 0.32.

  • Exam and class logistics to remember:

    • Cheat sheet policy; new attendance columns; quiz 2 due by 9 PM; two numbers to memorize (30 and a second one to be announced).

The Central Limit Theorem (CLT) is a fundamental concept in statistics that explains the behavior of sample means and proportions, even when the original population distribution is unknown or not normal. It provides a crucial bridge for performing statistical inference.

How to Interpret the CLT

The core idea is that if you take many random samples from any population (with a finite mean μ and standard deviation σ), and you calculate the mean (x̄) or proportion (p̂) for each sample, the distribution of these sample means (or proportions) will tend to be Normal, regardless of the original population's shape. This is true as long as your sample size (n) is sufficiently large.

  1. The "Normal Tendency": Even if the population data is skewed, uniform, or has a custom shape, the distribution of its sample means (the sampling distribution of x̄) will start to look like a bell curve (Normal distribution) as you increase the number of observations in each sample.

  2. Center Approximates Population Mean: The mean of this sampling distribution of x̄ will be approximately equal to the population mean (μ). That is, \mu_{\bar{X}} = \mu. For proportions, the mean of the sampling distribution of p̂ will be approximately equal to the population proportion (p). That is, \mu_{\hat{p}} = p.

  3. Spread Decreases with Sample Size: The standard deviation of this sampling distribution, known as the standard error (SE), will be smaller than the population standard deviation (σ). It decreases as the sample size (n) increases. This means the sample means cluster more tightly around the population mean for larger samples. For numerical data, SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}. For categorical data, SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}.
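
The shrinking-spread claim is easy to verify by simulation. A sketch (standard-library Python; a uniform population is chosen only to show the effect on non-normal data): quadrupling n should roughly halve the standard error.

```python
import random
import statistics

# Spread of x-bar shrinks like 1/sqrt(n): quadrupling n should roughly
# halve the standard error, here with a (non-normal) uniform population.
random.seed(7)

def xbar_sd(n, reps=20_000):
    """Empirical SD of the sample mean for samples of size n."""
    return statistics.stdev(
        statistics.mean(random.uniform(0, 1) for _ in range(n))
        for _ in range(reps)
    )

sd_10, sd_40 = xbar_sd(10), xbar_sd(40)
print(f"SD of x-bar at n=10: {sd_10:.4f}")  # theory: 0.2887/sqrt(10) ~ 0.0913
print(f"SD of x-bar at n=40: {sd_40:.4f}")  # theory: 0.2887/sqrt(40) ~ 0.0456
print(f"ratio: {sd_10 / sd_40:.2f}")        # close to 2
```
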

Visualize It: The App Walkthrough Analogy

Imagine an app with three graphs:

  1. Graph 1 (Population): Shows the distribution of your original population (e.g., a normal distribution with μ=16, σ=5, or a skewed one). This is your starting point.

  2. Graph 2 (Sampling Process): You draw a small sample (e.g., n=5) from Graph 1, calculate its mean (x̄), and plot that single x̄ on a new plot. You repeat this many times. Each repetition adds one x̄ to Graph 2.

  3. Graph 3 (Sampling Distribution): As you continue drawing samples and plotting their x̄ values, Graph 2 starts to form a shape. Critically, no matter what Graph 1 looked like initially, Graph 2 will progressively take on the smooth, bell-like shape of a Normal distribution. The center of this bell curve will align with the mean of Graph 1, and its spread will be narrower, illustrating SE = \frac{\sigma}{\sqrt{n}}. This visual transformation is the CLT in action.

How to Identify and Categorize CLT Applications

Look for scenarios where you are dealing with:

  1. A sample from a larger population (or a process that can be thought of as infinite): The goal is to infer something about the population based on the sample.

  2. Repeated sampling (or conceptual repeated sampling): Even if you only have one sample, the CLT allows you to imagine what the distribution of all possible sample means (or proportions) would look like.

  3. Focus on Sample Statistics: You are interested in the distribution of sample means (x̄) for numerical data or sample proportions (p̂) for categorical data, not the distribution of individual data points.

Categorization Rules for Robust Approximation:

  • Numerical Data (Sample Means): The CLT states that the sampling distribution of the mean is approximately Normal with \,\bar{X} \sim \mathcal{N}\big(\mu, \frac{\sigma^{2}}{n}\big). Here's how to apply it:

    • Rule of Thumb: For the approximation to be robust when the population is not normal, you generally need a sample size of n \ge 30.

    • Special Case: If the population itself is already Normal, the sampling distribution of the mean is exactly Normal for any sample size n (even small ones like n = 5).

  • Categorical Data (Sample Proportions): The CLT also applies to sample proportions, where \,\hat{p} \sim \mathcal{N}\left(p, \frac{p(1-p)}{n}\right). Here's how to apply it:

    • Rule of Thumb: To ensure a reasonable Normal approximation for proportions, you need to check two conditions: np \ge 5 \quad \text{and} \quad n(1-p) \ge 5.
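
The two rules of thumb above can be collected into a small helper (a study-aid sketch; `clt_applies` is a hypothetical name, not part of any course material):

```python
def clt_applies(kind, n, p=None, population_normal=False):
    """Rule-of-thumb check for whether the CLT normal approximation is safe."""
    if kind == "mean":
        # Numerical data: n >= 30, or any n if the population is already Normal
        return population_normal or n >= 30
    if kind == "proportion":
        # Categorical data: both expected counts must be at least 5
        return n * p >= 5 and n * (1 - p) >= 5
    raise ValueError("kind must be 'mean' or 'proportion'")

print(clt_applies("mean", 36))                         # True: n >= 30
print(clt_applies("mean", 5, population_normal=True))  # True: normal population
print(clt_applies("proportion", 49, p=51 / 90))        # True: np, n(1-p) >= 5
```
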

Function and Reasoning: Why the CLT Exists

The CLT is essential for statistical inference. In most real-world situations, we cannot measure an entire population; we can only take samples. The CLT allows us to use the properties of the Normal distribution (which are well-understood) to make educated guesses or draw conclusions about unknown population parameters (μ or p) based on our sample statistics (x̄ or p̂). Without the CLT, making inferences from samples to populations would be far more difficult, often requiring specific knowledge of the population's exact distribution, which is usually unavailable.

Connection to Other Relationships
  • Standard Error (SE): The CLT directly introduces the concept of standard error, which is the cornerstone for constructing confidence intervals (estimating population parameters with a range) and performing hypothesis tests (testing claims about population parameters).

  • Hypothesis Testing & Confidence Intervals: These advanced statistical techniques rely heavily on the Normal distribution approximation provided by the CLT when dealing with sample means and proportions, enabling us to quantify uncertainty in our estimates.

  • Big Picture: The CLT shows that while individual samples might vary greatly, the average behavior of many samples is highly predictable and follows a Normal pattern, making statistical analysis robust and powerful.

Imagine you want to know something about a huge group of things, but you can't check every single one. Like wanting to know the average weight of all the candies in a giant jar, but you can't weigh them all. The Central Limit Theorem (CLT) is like a superpower in math that helps us learn about that huge group by just looking at smaller pieces (called 'samples').

What does this 'superpower' tell us?
  1. It Becomes a Bell Curve! No matter how messy or weird the original numbers in your huge group are (like if some candies are super heavy and some are super light), if you keep taking samples (handfuls of candies) and finding their averages, those averages will start to line up in a beautiful, predictable 'bell curve' shape. Think of it like a neatly organized pile of toys, even if the toys themselves are all different.

  2. The Average of Averages is the Real Average! If you take the average of all your sample averages (the average of your handfuls), it will be very, very close to the true average of the entire huge group (all the candies in the giant jar!). We use \,\mu_{\bar{X}} to mean "average of sample averages" and \,\mu for "real average of the big group." They are about equal!

  3. Bigger Samples, Tighter Bell Curve! If you pick bigger handfuls of candies (bigger 'n'), your average numbers will be even closer to the true average. So, the bell curve of your sample averages will look much skinnier and taller – meaning less spread out. The 'standard error' (SE) is a fancy name for how spread out this bell curve of averages is. It gets smaller when 'n' (your sample size) gets bigger. Think of it like a skilled archer: the more practice shots they take, the closer their arrows land to the bullseye every time.

    • For numbers (like average height): SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} (where \sigma is how spread out the big group is, and n is your sample size).

    • For Yes/No questions (like how many red candies): SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} (where p is the 'Yes' percentage in the big group, and n is your sample size).

See It in Your Mind: The Candy Jar Game!

Imagine an app (like a game) with three graphs:

  1. Graph 1 (The Jar): This shows all the candies in our giant jar. Maybe some candies have big numbers, some have small, some are even zero! It can look messy or organized. This is your starting point – the whole population.

  2. Graph 2 (Your Handful Averages): You reach into the jar, pull out 5 candies (that's your 'n' = 5). You add up their numbers and divide by 5 to get the average for that handful (that's \,\bar{X}). You put the candies back, mix them up, and do it again and again! Each average number you get goes onto this graph as a dot.

  3. Graph 3 (The Magic Bell Curve): After you've done this many times, you look at all the dots on Graph 2. Wow! Even if Graph 1 was bumpy (the original candies were all over the place), this graph of averages looks like a beautiful, smooth bell curve! The center of this bell curve will be right near the real average of all the candies in the jar, and it will be much narrower than Graph 1. This shrinking and shaping into a bell curve is the Central Limit Theorem working its magic!

How to Spot the CLT in a Problem (Exam Tip!)

When you read a math problem, how do you know if the CLT superpower can help you?

  1. Are you taking a small group (a 'sample') to learn about a BIG group (the 'population')? Like checking a few toy boxes to know about all toys in the store.

  2. Are you interested in the average (\,\bar{X}) or the proportion (\,\hat{p}) from these samples? Not just looking at a single number, but what happens to the average or percentage if you keep taking samples. For example, the problem might ask, "What's the average height of a sample of 10 kids?" not "What's the height of one kid?"

When can you use the CLT's bell curve magic? (Important rules!)
  1. For Numbers (like average height, average candy weight): If you're looking at the average of numbers (\,\bar{X}) from your samples:

    • Rule #1 to Remember (very important!): Your sample size (n) should be 30 or more. If you pick at least 30 candies in each handful, the averages will definitely make a nice bell curve.

    • Special Trick: If the original big group (Graph 1) already looks like a perfect bell curve, then your sample averages will always make a perfect bell curve, even if your sample size (n) is small (like n = 5 candies!).

  2. For Yes/No Questions (like proportion of red candies, proportion of 'yes' answers): If you're looking at the percentage (\,\hat{p}) of 'Yes' or 'Red' from your samples:

    • Rule #2 to Remember (also very important!): You need to check two things:

      • How many 'Yes' answers do you expect? (n \times p) This number must be 5 or more.

      • How many 'No' answers do you expect? (n \times (1-p)) This number must also be 5 or more.

      • If both are true, then your percentages will make a nice bell curve!

Why is the CLT so awesome? (Its super-purpose!)

The CLT is super important because it helps us tell smart stories about huge groups (populations) even when we only have small bits of information (samples). Imagine trying to guess how many red candies are in the giant jar without counting them all. If you just take one handful, you might guess wrong! But with the CLT, if you take lots of handfuls and find their averages, you can make a really good guess about the whole jar because those averages will follow that predictable bell curve. Without the CLT, it would be like trying to guess the weather for the whole year by only looking outside for one minute!

How CLT connects to other math tricks
  1. Standard Error (SE): This is the 'spread' of your bell curve of averages. It tells you how much your sample averages usually jump around. The CLT helps us figure out this spread so we can be more sure about our guesses.

  2. Making Smart Guesses (Confidence Intervals & Hypothesis Tests): Because the CLT makes a bell curve out of our sample averages/percentages, we can use that bell curve to make really good guesses about the true average or percentage of the big group. We can say things like, 'I'm 95% sure the true average candy weight is between 5 grams and 7 grams!'

  3. The Big Idea! Even though each little handful of candies might be different, the averages of many handfuls act in a very steady, predictable way. This makes our math superpowers really strong for understanding big groups without checking every single thing!