Chapter 7: Sampling Distributions and the Central Limit Theorem
This material is based on the textbook Statistics: The Art and Science of Learning from Data, 5th edition, by Agresti & Franklin. It covers how statistics collected from samples vary and the theoretical distributions that describe this variation.
1. How Sample Proportions Vary Around the Population Proportion
When conducting an exit poll or survey, we often want to know whether the sample proportion is a reliable estimate of the population proportion. The sampling distribution tells us how close a sample proportion is likely to fall to the true population parameter.
Definition: Sampling Distribution
A sampling distribution is the probability distribution of a statistic. Conceptually, it is constructed by considering all possible distinct samples of a fixed size that could be taken from a population and recording the statistic (such as a proportion) for each one. The distribution of these values across all possible samples is the sampling distribution.
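To make the definition concrete, the sketch below enumerates every distinct sample of size 2 from a tiny, hypothetical population of 5 voters (3 of whom voted for Candidate A) and tabulates the sample proportion. The population, the sample size, and the variable names are illustrative assumptions, not values from the textbook; sampling is assumed to be without replacement, with every distinct sample equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical population of 5 voters: 1 = voted for Candidate A, 0 = did not.
population = [1, 1, 1, 0, 0]
n = 2  # fixed sample size

# Every distinct sample of size n (without replacement, order ignored).
samples = list(combinations(range(len(population)), n))

# Record the statistic (the sample proportion) for each sample.
proportions = [Fraction(sum(population[i] for i in s), n) for s in samples]

# Every distinct sample is equally likely, so relative frequencies
# across all samples give the exact sampling distribution.
counts = Counter(proportions)
sampling_dist = {phat: Fraction(c, len(samples)) for phat, c in sorted(counts.items())}

for phat, prob in sampling_dist.items():
    print(f"p-hat = {phat}: probability {prob}")
```

Under these assumptions the sample proportion takes the values 0, 1/2, and 1 with probabilities 1/10, 3/5, and 3/10, and the mean of this distribution is 3/5, which equals the population proportion.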
Example 1: Election Exit Polls
Imagine an election where Candidate A runs against Candidate B.
A random sample of voters is taken to estimate the winner.
For this specific sample, the proportion who voted for Candidate A is recorded.
If you repeated this process for every possible distinct sample of voters, the proportion voting for Candidate A would vary from sample to sample.
The collection of these values and their frequency constitutes the sampling distribution of the sample proportion.
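Listing every possible sample is infeasible for a real electorate, so a common alternative is to simulate many polls. The following Python sketch assumes a hypothetical true proportion p = 0.55 for Candidate A, polls of n = 100 voters, and independent draws (a reasonable model when the population is much larger than the sample); the function name `simulate_exit_polls` is illustrative.

```python
import random
import statistics
from collections import Counter

def simulate_exit_polls(p, n, num_polls, seed=0):
    """Return the sample proportion for Candidate A from each simulated poll.

    Each of the n voters in a poll is an independent draw that favors
    Candidate A with probability p (a hypothetical true proportion).
    """
    rng = random.Random(seed)
    phats = []
    for _ in range(num_polls):
        votes_for_a = sum(1 for _ in range(n) if rng.random() < p)
        phats.append(votes_for_a / n)
    return phats

phats = simulate_exit_polls(p=0.55, n=100, num_polls=10_000)
freq = Counter(phats)  # simulated sampling distribution of p-hat
print("smallest and largest p-hat:", min(phats), max(phats))
print("mean of the p-hats:", round(statistics.fmean(phats), 3))
print("most common values:", freq.most_common(3))
```

The simulated proportions spread out around 0.55 rather than all equaling it; the tabulated frequencies approximate the sampling distribution.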
1.1 Describing the Sampling Distribution of a Sample Proportion
For the sampling distribution of a sample proportion, the descriptive measures (mean and standard deviation) are determined by the sample size $n$ and the population proportion $p$.
Mathematical Properties
For a random sample of size $n$ from a population with proportion $p$ of outcomes in a specific category, the sampling distribution of the sample proportion $\hat{p}$ in that category has the following properties:
Mean: The mean of the sampling distribution equals the population proportion: $\mu_{\hat{p}} = p$.
Standard Deviation: The standard deviation (often called the standard error) measures the spread of the sample proportions: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
Shape of the Sampling Distribution
The shape of the distribution follows from the Central Limit Theorem: the sampling distribution of a sample proportion is approximately normal provided the sample size is sufficiently large. "Sufficiently large" means the expected numbers of outcomes in and not in the category are both at least 15:
$np \ge 15$ and $n(1-p) \ge 15$
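The mean, standard deviation, and normality check can be bundled into one helper. This Python sketch is illustrative: the function name `proportion_sampling_distribution` is hypothetical, and the normality check uses the expected-count cutoff np ≥ 15 and n(1 − p) ≥ 15.

```python
import math

def proportion_sampling_distribution(p, n):
    """Return (mean, standard deviation, approximately_normal) for the
    sampling distribution of a sample proportion p-hat.

    p: population proportion (0 < p < 1); n: sample size.
    The normality check uses the rule of thumb np >= 15 and n(1 - p) >= 15.
    """
    if not 0 < p < 1 or n < 1:
        raise ValueError("need 0 < p < 1 and n >= 1")
    mean = p                                # center equals the population proportion
    sd = math.sqrt(p * (1 - p) / n)         # standard error of p-hat
    approx_normal = n * p >= 15 and n * (1 - p) >= 15
    return mean, sd, approx_normal

mean, sd, ok = proportion_sampling_distribution(0.1, 50)
print(mean, round(sd, 3), ok)
```

For example, with p = 0.1 and n = 50 the standard deviation is about 0.042, but the normality check fails because np = 5 falls short of 15.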
Example 2: Calculation Practice
Given a population proportion $p$ and a sample size $n$, find the mean and standard deviation of the sampling distribution of $\hat{p}$ and check whether its shape is approximately normal.
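As an illustration only, here is the calculation with hypothetical values $p = 0.4$ and $n = 150$, chosen for easy arithmetic rather than taken from the textbook:

```latex
\begin{aligned}
\mu_{\hat{p}} &= p = 0.4 \\
\sigma_{\hat{p}} &= \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.4 \times 0.6}{150}} = \sqrt{0.0016} = 0.04 \\
np &= 150 \times 0.4 = 60 \ge 15, \qquad n(1-p) = 150 \times 0.6 = 90 \ge 15
\end{aligned}
```

Both expected counts are at least 15, so under these assumed values the sampling distribution of $\hat{p}$ would be approximately normal with mean 0.4 and standard deviation 0.04.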
Example 3: Baseball Batting Averages
A major league baseball player typically has about $n$ at-bats (opportunities to hit) in a single season. Suppose a player has a true probability $p$ of getting a hit in any single at-bat.
Batting Average Definition: The batting average is the total number of hits divided by the total number of at-bats. This is fundamentally a sample proportion.
(a) Descriptive parameters for $n$ and $p$: Mean: $\mu_{\hat{p}} = p$. Standard deviation: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. Shape: since $np \ge 15$ and $n(1-p) \ge 15$, the shape is approximately normal.
(b) Comparative Analysis: Batting averages somewhat above or below $p$ would not be considered especially unusual for this player, because of the natural variation described by the sampling distribution. One should not conclude that a player who hits for a higher average one year is definitively a better hitter than one with a lower average, since both values could be common fluctuations around the same true probability $p$.
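The comparison in (b) can be checked numerically. This Python sketch assumes hypothetical values of 500 at-bats and a hit probability of 0.300, plus two hypothetical season averages, 0.280 and 0.320; none of these numbers come from the text. It computes how many standard deviations each average lies from the true probability and the corresponding two-sided normal tail probability.

```python
import math
from statistics import NormalDist

# Hypothetical inputs, not from the text.
N_AT_BATS = 500      # at-bats in a season
P_HIT = 0.300        # true probability of a hit in one at-bat

SE = math.sqrt(P_HIT * (1 - P_HIT) / N_AT_BATS)   # std. deviation of the batting average

def z_score(avg):
    """How many standard deviations a season average lies from P_HIT."""
    return (avg - P_HIT) / SE

for avg in (0.280, 0.320):                        # hypothetical season averages
    z = z_score(avg)
    # Two-sided normal probability of a deviation at least this large.
    tail = 2 * (1 - NormalDist().cdf(abs(z)))
    print(f"average {avg:.3f}: z = {z:+.2f}, two-sided tail probability = {tail:.2f}")
```

Under these assumptions both averages lie within one standard deviation (about 0.020) of 0.300, so neither is unusual and their difference says little about true ability.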
2. How Sample Means Vary Around the Population Mean
Statistical analysis often focuses on the behavior of the sample mean $\bar{x}$ and how much it deviates from the population mean $\mu$.
2.1 Describing the Sampling Distribution of a Sample Mean
For a random sample of size $n$ drawn from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ is characterized by:
Mean: The center of the sampling distribution equals the population mean: $\mu_{\bar{x}} = \mu$.
Standard Deviation (Standard Error): $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, so the spread of the sample means decreases as the sample size increases.
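A quick simulation can check the formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. This Python sketch assumes a hypothetical normal population with mean 50 and standard deviation 10; `draw` and `simulated_se` are illustrative helpers, not library functions. Quadrupling $n$ should roughly halve the simulated standard deviation of the sample means.

```python
import math
import random
import statistics

MU, SIGMA = 50.0, 10.0   # hypothetical population parameters

def draw(rng):
    """One observation from the hypothetical normal population."""
    return rng.gauss(MU, SIGMA)

def simulated_se(n, reps, rng):
    """Standard deviation of `reps` simulated sample means of size n."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(1)
results = {n: simulated_se(n, 5000, rng) for n in (4, 16, 64)}
for n, se in results.items():
    print(f"n = {n:2d}: simulated {se:.2f}, theory {SIGMA / math.sqrt(n):.2f}")
```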
Shape of the Sampling Distribution for Means
The shape of the distribution depends on the original population distribution and the sample size:
Normal Population: If the original population distribution is normal, the sampling distribution of the sample mean is exactly normal for any sample size.
The Central Limit Theorem (CLT): If the population distribution is not normal (e.g., skewed), the sampling distribution of the sample mean still approaches a normal distribution as the sample size increases. In practice, $n \ge 30$ is typically considered sufficient for the sampling distribution to be approximately normal.
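The Central Limit Theorem can be seen in a simulation. The Python sketch below assumes a hypothetical right-skewed population, an exponential distribution with mean 1 and standard deviation 1, and measures the skewness of simulated sample means for several sample sizes. Skewness near 0 indicates a roughly symmetric, normal-looking distribution; the helper names are illustrative.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential(rate=1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Moment skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(7)
skew_by_n = {n: skewness(sample_means(n, 20_000, rng)) for n in (1, 5, 30)}
for n, sk in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {sk:.2f}")
```

For exponential data the theoretical skewness of the sample mean is $2/\sqrt{n}$ (about 2.0, 0.89, and 0.37 for these sample sizes), so the simulated values should shrink toward 0 as $n$ grows even though the population itself stays skewed.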
Example 4: Education Levels
According to recent Current Population Reports, the number of years of education for self-employed individuals in the U.S. has mean $\mu$ and standard deviation $\sigma$.
(a) Random Variable Identification: The random variable $X$ represents the number of years of education for a single self-employed individual in the United States.
(b) Sampling Distribution Parameters (random sample of size $n$): Mean: $\mu_{\bar{x}} = \mu$. Standard deviation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
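For illustration only, take a hypothetical standard deviation of $\sigma = 3$ years and a sample of $n = 36$ self-employed individuals (these are not the figures from the report):

```latex
\mu_{\bar{x}} = \mu, \qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = \frac{3}{6} = 0.5 \text{ years}
```

Under these assumed values, means of samples of 36 people would typically vary by about half a year around $\mu$, far less than the 3-year spread among individuals.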
Example 5: Restaurant Business Analysis
A restaurant charges customers a flat rate per meal. Management estimates that the expense per customer (based on food consumption and labor) follows a right-skewed distribution with mean $\mu$ and standard deviation $\sigma$.
(a) Parameters for $n$ Customers: If the $n$ customers constitute a random sample, the sampling distribution of the mean expense per customer, $\bar{x}$, has mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
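(b) Certainty Interval: Assuming $n$ is large enough for the Central Limit Theorem to apply, the sampling distribution of $\bar{x}$ is approximately normal, so nearly all (about 99.7%) sample means fall within 3 standard deviations of its mean. Management can therefore be almost certain that $\bar{x}$ falls in the interval $\mu \pm 3\sigma/\sqrt{n}$.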
(c) Profit Probability: The restaurant makes a profit when the mean expense per customer is less than the flat rate $c$ charged per meal. Using the normal approximation, this probability is $P(\bar{x} < c) = P\left(Z < \frac{c - \mu}{\sigma/\sqrt{n}}\right)$.
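Parts (a) through (c) can be computed together. This Python sketch uses hypothetical values (a price of 12.00 per meal, mean expense 10.50, standard deviation 4.00, and 100 customers) that are not from the text, and it relies on the CLT normal approximation; the function name `restaurant_analysis` is illustrative.

```python
import math
from statistics import NormalDist

def restaurant_analysis(price, mu, sigma, n):
    """Sampling-distribution summary for the mean expense per customer.

    price: flat rate charged per meal; mu, sigma: mean and standard deviation
    of the (right-skewed) expense per customer; n: number of customers,
    assumed to be a random sample. Uses the CLT normal approximation.
    """
    se = sigma / math.sqrt(n)                  # standard deviation of x-bar
    interval = (mu - 3 * se, mu + 3 * se)      # x-bar almost certainly falls here
    # Profit occurs when the mean expense per customer is below the price.
    p_profit = NormalDist(mu, se).cdf(price)
    return se, interval, p_profit

se, interval, p_profit = restaurant_analysis(price=12.00, mu=10.50, sigma=4.00, n=100)
print(f"SE = {se:.2f}, interval = ({interval[0]:.2f}, {interval[1]:.2f}), P(profit) = {p_profit:.4f}")
```

With these assumed values the standard deviation of $\bar{x}$ is 0.40, the 3-standard-deviation interval runs from about 9.30 to 11.70, and the profit probability is roughly 0.9999, because 12.00 lies 3.75 standard deviations above 10.50.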