Sampling Variability and Sampling Distributions

Chapter 8: Sampling Variability and Sampling Distributions

Introduction to Sampling

We often want to estimate a population parameter (e.g., the mean fat content $\mu$ of hamburgers).
We take a sample of size $n$ (e.g., $n = 50$ hamburgers) and calculate a sample statistic (e.g., the sample mean).
Key questions:
- Is the sample mean a good estimate of $\mu$?
- How close is the sample mean to $\mu$?
- Will other samples of $n = 50$ have the same sample mean?

Sampling Distribution

The sampling distribution describes the long-run behavior of a sample statistic.
The sample mean (denoted as $\bar{x}$ ) is a statistic.

Statistics and Sampling Variability

Statistic: A number computed from sample data.
Examples of statistics:
- $\bar{x}$ - sample mean
- $s$ - sample standard deviation
- $\hat{p}$ - sample proportion
The value of a statistic depends on the specific sample selected.
Sampling Variability: The variability of a statistic from sample to sample.

Illustrative Example: Fish Pond

Consider a pond with 20 fish.
Fish lengths (in inches): 4.5, 5.4, 10.3, 7.9, 8.5, 6.6, 11.7, 8.9, 2.2, 9.8, 6.3, 4.3, 9.6, 8.7, 13.3, 4.6, 10.7, 13.4, 7.7, 5.6
True population mean: $\mu = 8$ inches
Example Samples:
- Sample 1: 6.3, 2.2, 13.3 inches; $\bar{x} = 7.27$ inches
- Sample 2: 8.5, 4.6, 5.6 inches; $\bar{x} = 6.23$ inches
- Sample 3: 10.3, 8.9, 13.4 inches; $\bar{x} = 10.87$ inches
These samples demonstrate that the sample mean varies from sample to sample (sampling variability).
Some sample means are closer to the true mean, some are farther; some are above, and some are below the true mean.

Sampling Distribution of $\bar{x}$ (Fish Pond Continued)

There are ${}_{20}C_3 = 1140$ possible samples of size 3 from the fish population.
If we calculate the mean length of each possible sample, we would have the sampling distribution of $\bar{x}$ .
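
The enumeration described above is small enough to carry out directly. The following is a minimal sketch using only the Python standard library; the variable names (`lengths`, `sample_means`) are illustrative choices, and the fish lengths are copied from the list given earlier.

```python
from itertools import combinations
from math import comb
from statistics import fmean

# Fish lengths (inches) for the 20-fish pond, copied from the text.
lengths = [4.5, 5.4, 10.3, 7.9, 8.5, 6.6, 11.7, 8.9, 2.2, 9.8,
           6.3, 4.3, 9.6, 8.7, 13.3, 4.6, 10.7, 13.4, 7.7, 5.6]

# Every possible sample of size 3 (order ignored, no fish chosen twice).
sample_means = [fmean(sample) for sample in combinations(lengths, 3)]

# The number of samples, counted two ways: by listing and by formula.
print(len(sample_means), comb(20, 3))

# The population mean and the mean of all the sample means.
print(round(fmean(lengths), 2), round(fmean(sample_means), 2))

# The smallest and largest sample means show how far x-bar can stray.
print(round(min(sample_means), 2), round(max(sample_means), 2))
```

The list `sample_means` is the full sampling distribution of $\bar{x}$ for samples of size 3 from this pond; a histogram of it would show how the sample means cluster around the population mean.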

Definition: Sampling Distributions of $\bar{x}$

The distribution formed by considering the value of a sample statistic (like $\bar{x}$ ) for every possible sample of a given size from a population.

Fish Pond Revisited (Smaller Population)

Simplified scenario: Only 5 fish in the pond.
Lengths: 6.6, 11.7, 8.9, 2.2, 9.8
Population mean: $\mu_x = 7.84$
Population standard deviation: $\sigma_x = 3.262$
We keep the population size small to find all possible samples.

Sampling Distribution with Samples of Size 2

Consider all possible pairs (samples of size 2) from the 5 fish.
Possible pairs and their means $\bar{x}$ :
- 6.6 & 11.7: $\bar{x} = 9.15$
- 6.6 & 8.9: $\bar{x} = 7.75$
- 6.6 & 2.2: $\bar{x} = 4.4$
- 6.6 & 9.8: $\bar{x} = 8.2$
- 11.7 & 8.9: $\bar{x} = 10.3$
- 11.7 & 2.2: $\bar{x} = 6.95$
- 11.7 & 9.8: $\bar{x} = 10.75$
- 8.9 & 2.2: $\bar{x} = 5.55$
- 8.9 & 9.8: $\bar{x} = 9.35$
- 2.2 & 9.8: $\bar{x} = 6$
There are 10 possible samples.
Mean of sample means: $\mu_{\bar{x}} = 7.84$
Standard deviation of sample means: $\sigma_{\bar{x}} = 1.998$
These values define the sampling distribution of $\bar{x}$ for samples of size 2.
The mean of the sampling distribution equals the population mean.
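
The two summary values above can be checked by listing the ten pairs in code. This is a small sketch using the standard library; `pstdev` (the population standard deviation, dividing by the number of values) is used because the ten sample means are the entire sampling distribution, not a sample from it.

```python
from itertools import combinations
from statistics import fmean, pstdev

# Lengths (inches) of the 5 fish in the smaller pond, from the text.
pond = [6.6, 11.7, 8.9, 2.2, 9.8]

# Mean of every possible pair (sample of size 2).
pair_means = [fmean(pair) for pair in combinations(pond, 2)]

mu_xbar = fmean(pair_means)      # mean of the sampling distribution
sigma_xbar = pstdev(pair_means)  # standard deviation of the sampling distribution

# Prints: 10 7.84 1.998 (agreeing with the values listed above)
print(len(pair_means), round(mu_xbar, 2), round(sigma_xbar, 3))
```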

Sampling Distribution with Samples of Size 3

Consider all possible triplets (samples of size 3) from the 5 fish.
Possible triplets and their means $\bar{x}$ :
- 6.6, 11.7, 8.9: $\bar{x} = 9.067$
- 6.6, 11.7, 2.2: $\bar{x} = 6.833$
- 6.6, 11.7, 9.8: $\bar{x} = 9.367$
- 6.6, 8.9, 2.2: $\bar{x} = 5.9$
- 6.6, 8.9, 9.8: $\bar{x} = 8.433$
- 6.6, 2.2, 9.8: $\bar{x} = 6.2$
- 11.7, 8.9, 2.2: $\bar{x} = 7.6$
- 11.7, 8.9, 9.8: $\bar{x} = 10.133$
- 11.7, 2.2, 9.8: $\bar{x} = 7.9$
- 8.9, 2.2, 9.8: $\bar{x} = 6.967$
There are 10 possible samples.
Mean of sample means: $\mu_{\bar{x}} = 7.84$
Standard deviation of sample means: $\sigma_{\bar{x}} = 1.332$
These values determine the sampling distribution of $\bar{x}$ for samples of size 3.

Key Observations

The mean of the sampling distribution equals the mean of the population: $\mu_{\bar{x}} = \mu$
As the sample size $n$ increases, the standard deviation of the sampling distribution, $\sigma_{\bar{x}}$, decreases.

General Properties of Sampling Distributions of $\bar{x}$

Rule 1: The mean of the sampling distribution of $\bar{x}$ is equal to the population mean: $\mu_{\bar{x}} = \mu$
Rule 2: The standard deviation of the sampling distribution of $\bar{x}$ is given by:
- $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
- This is exact for infinite populations.
- Approximately correct if the population is finite and the sample size is no more than 10% of the population.
- (The fish pond examples didn't satisfy this condition, so the formula wasn't accurate there.)
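
To see how far off Rule 2 is for the five-fish pond, the sketch below compares $\frac{\sigma}{\sqrt{n}}$ with the exact standard deviation obtained by listing every sample. It also applies the standard finite population correction factor $\sqrt{\frac{N-n}{N-1}}$ for sampling without replacement; that factor is not part of the rules stated above and is included here only as an assumption-labeled illustration of why the uncorrected formula overshoots.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pstdev

pond = [6.6, 11.7, 8.9, 2.2, 9.8]   # the 5 fish lengths from the text
N = len(pond)                        # population size
sigma = pstdev(pond)                 # population standard deviation

comparison = {}
for n in (2, 3):
    # Exact value: list every sample of size n and take the SD of the means.
    exact = pstdev([fmean(s) for s in combinations(pond, n)])
    # Rule 2 as stated (exact only for an infinite population).
    rule2 = sigma / sqrt(n)
    # Rule 2 times the finite population correction (not stated in the text).
    corrected = rule2 * sqrt((N - n) / (N - 1))
    comparison[n] = (exact, rule2, corrected)
    print(n, round(exact, 3), round(rule2, 3), round(corrected, 3))
```

Here the samples take 40% and 60% of the population, far above the 10% guideline, so the uncorrected formula gives a value that is noticeably too large.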

Example: Platelet Volume

Study: “Mean Platelet Volume in Patients with Metabolic Syndrome and Its Relationship with Coronary Artery Disease” (Thrombosis Research, 2007).
Platelet volume of patients without metabolic syndrome is approximately normal.
Population mean: $\mu = 8.25$
Population standard deviation: $\sigma = 0.75$
Minitab was used to generate 500 random samples for each of the following sample sizes: $n = 5, 10, 20, 30$ .
Density histograms of the 500 sample means were created.
Observations:
- The means of the histograms are approximately the population mean $\mu = 8.25$.
- The standard deviation of the histograms decreases as $n$ increases (consistent with Rule 2).
- The shape of the histograms is approximately normal (consistent with Rule 3).
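
The simulation was run in Minitab, but a comparable experiment can be sketched in Python. The code below is an approximation of that setup, not a reproduction of it: the seed and the names (`results`, `NUM_SAMPLES`) are arbitrary choices, and the population is taken to be exactly normal with the stated mean and standard deviation.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

MU, SIGMA = 8.25, 0.75     # population mean and SD from the text
NUM_SAMPLES = 500          # samples per sample size, as in the text
rng = random.Random(8)     # arbitrary seed so the run is repeatable

results = {}
for n in (5, 10, 20, 30):
    # Draw 500 samples of size n and record each sample mean.
    means = [fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
             for _ in range(NUM_SAMPLES)]
    # Store: mean of the sample means, their SD, and the Rule 2 value.
    results[n] = (fmean(means), pstdev(means), SIGMA / sqrt(n))
    print(n, [round(v, 3) for v in results[n]])
```

For each $n$, the first value should land near 8.25 and the second near $\frac{\sigma}{\sqrt{n}}$, with the spread shrinking as $n$ grows.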

Rule 3: Normality

When the population distribution is normal, the sampling distribution of $\bar{x}$ is also normal for any sample size $n$ .

Example: NHL Overtime Game Length

Study: “Is the Overtime Period in an NHL Game Long Enough?” (American Statistician, 2008).
Data: Time (in minutes) from the start of the game to the first goal scored in overtime for 281 games.
The distribution is strongly positively skewed.
Population mean: $\mu = 13$ minutes.
Population median: 10 minutes.
Minitab was used to generate 500 samples of sizes $n = 5, 10, 20, 30$ .
Observations:
- Histograms are centered approximately at $\mu = 13$.
- Standard deviations decrease as $n$ increases.
- Shapes of histograms become more normal as $n$ increases, even though the population is skewed.

Rule 4: Central Limit Theorem (CLT)

When $n$ is sufficiently large, the sampling distribution of $\bar{x}$ is well approximated by a normal curve, even when the population distribution is not normal.
A common guideline: the CLT can be applied if $n > 30$.
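
The effect can be illustrated with a simulated skewed population. The sketch below is an assumption-heavy stand-in: the overtime data are not reproduced here, so an exponential distribution with mean 13 is used as a hypothetical positively skewed population, and 5000 samples per sample size (rather than 500) are drawn so that the skewness estimates are stable. The helper `skewness` is defined locally.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(2008)   # arbitrary seed so the run is repeatable
MEAN_TIME = 13.0            # matches mu = 13; the exponential shape is an assumption
REPS = 5000                 # samples per sample size (more than the 500 in the text)

skew_by_n = {}
for n in (5, 10, 20, 30):
    # expovariate takes the rate, which is 1 / mean.
    means = [fmean([rng.expovariate(1 / MEAN_TIME) for _ in range(n)])
             for _ in range(REPS)]
    skew_by_n[n] = skewness(means)
    print(n, round(fmean(means), 2), round(skew_by_n[n], 2))
```

The skewness of the simulated sample means should shrink toward 0 as $n$ increases, which is the pattern the Central Limit Theorem describes.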

Example: Soft-Drink Bottler

Claim: Cans contain an average of 12 oz of soda.
Let $x$ = actual volume of soda in a randomly selected can.
$x$ is normally distributed with $\sigma = 0.16$ oz.
$n = 16$ cans are randomly selected, and $\bar{x}$ is calculated.
If the claim is correct, the sampling distribution of $\bar{x}$ is normal with:
- Mean: $\mu_{\bar{x}} = \mu = 12$ oz
- Standard Deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.16}{\sqrt{16}} = 0.04$ oz

Soda Problem Continued

What is $P(11.96 < \bar{x} < 12.08)$?
Standardize the endpoints:
- $z_1 = \frac{11.96 - 12}{0.04} = -1$
- $z_2 = \frac{12.08 - 12}{0.04} = 2$
$P(-1 < Z < 2) = P(Z < 2) - P(Z < -1) = 0.9772 - 0.1587 = 0.8185$
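
The same probability can be computed without a normal table. This is a short sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 0.16, 16          # values from the bottler example

# Sampling distribution of x-bar: normal with mean mu and SD sigma / sqrt(n).
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# P(11.96 < x-bar < 12.08) as a difference of two cumulative probabilities.
prob = xbar_dist.cdf(12.08) - xbar_dist.cdf(11.96)
print(round(prob, 4))
```

The software value is about 0.8186; the small difference from 0.8185 comes from the table entries being rounded to four decimal places before subtracting.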

Example: Hot Dog Manufacturer

Claim: Average fat content of 18 grams per hot dog with $\sigma = 1$ gram.
Consumers would be unhappy if the mean exceeded 18 grams.
An independent testing organization analyzes a random sample of $n = 36$ hot dogs.
Suppose the sample mean is $\bar{x} = 18.4$ grams. Does this suggest the claim is incorrect?
Because $n = 36 > 30$, the Central Limit Theorem applies, and the distribution of $\bar{x}$ is approximately normal.

Hot Dogs Continued

Mean: $\mu_{\bar{x}} = \mu = 18$ grams
Standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{36}} = \frac{1}{6} \approx 0.167$ grams
$P(\bar{x} > 18.4) = P\left(Z > \frac{18.4 - 18}{0.167}\right) = P(Z > 2.40) = 1 - 0.9918 = 0.0082$
Values of $\bar{x}$ at least as large as 18.4 would be observed only about 0.82% of the time if the claim were true.
A sample mean this large would be so unlikely under the claim that the claim may well be incorrect.
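
The upper-tail probability can be reproduced in a few lines. This is a minimal sketch with the standard library's `NormalDist`; it relies on the normal approximation justified by the Central Limit Theorem, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18, 1, 36      # claimed mean, population SD, sample size
xbar_observed = 18.4          # observed sample mean (grams)

sigma_xbar = sigma / sqrt(n)                  # 1/6, about 0.167
z = (xbar_observed - mu) / sigma_xbar         # standardized sample mean

# P(x-bar > 18.4) if the claim mu = 18 is true.
tail_prob = 1 - NormalDist(mu, sigma_xbar).cdf(xbar_observed)
print(round(z, 2), round(tail_prob, 4))
```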

Sampling Distribution of $\hat{p}$

Illustrative experiment: Toss a penny 20 times, record the number of heads, and calculate the sample proportion of heads.
Mark the proportion on a dot plot.
Repeat many times to create a partial graph of the sampling distribution of the sample proportion ($\hat{p}$).
The sample proportion $\hat{p}$ is a statistic.
What would happen if we flipped the penny 50 times?

Definition: Sampling Distribution of $\hat{p}$

The distribution formed by considering the value of the sample statistic $\hat{p}$ for every possible sample of a given size from a population.
$\hat{p}$ represents the sample proportion.

Example: Students

Population: Six students (Alice, Ben, Charles, Denise, Edward, & Frank).
Parameter of interest: Proportion of females.
Population proportion of females: $p = \frac{1}{3}$
Select samples of two from this population.
Number of possible samples: ${}_{6}C_2 = 15$

Finding All Possible Samples and Sample Proportions

List all 15 possible samples and the sample proportion of females in each:
- Alice & Ben: $\hat{p} = 0.5$
- Alice & Charles: $\hat{p} = 0.5$
- Alice & Denise: $\hat{p} = 1$
- Alice & Edward: $\hat{p} = 0.5$
- Alice & Frank: $\hat{p} = 0.5$
- Ben & Charles: $\hat{p} = 0$
- Ben & Denise: $\hat{p} = 0.5$
- Ben & Edward: $\hat{p} = 0$
- Ben & Frank: $\hat{p} = 0$
- Charles & Denise: $\hat{p} = 0.5$
- Charles & Edward: $\hat{p} = 0$
- Charles & Frank: $\hat{p} = 0$
- Denise & Edward: $\hat{p} = 0.5$
- Denise & Frank: $\hat{p} = 0.5$
- Edward & Frank: $\hat{p} = 0$
Calculate the mean and standard deviation of these sample proportions.
How does the mean of the sampling distribution compare to the population parameter ( $p$ )?
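
One way to carry out this calculation is to list the 15 pairs in code. The sketch below assumes, consistent with the sample proportions listed above, that Alice and Denise are the two female students; the dictionary `students` and its "F"/"M" labels are an illustrative encoding.

```python
from itertools import combinations
from statistics import fmean, pstdev

# Assumed coding of the six students: "F" = female, "M" = male.
students = {"Alice": "F", "Ben": "M", "Charles": "M",
            "Denise": "F", "Edward": "M", "Frank": "M"}

# Sample proportion of females in every possible sample of two students.
p_hats = [sum(students[name] == "F" for name in pair) / 2
          for pair in combinations(students, 2)]

p = 2 / 6                       # population proportion of females
mean_p_hat = fmean(p_hats)      # mean of the sampling distribution
sd_p_hat = pstdev(p_hats)       # SD of the sampling distribution

print(len(p_hats), round(mean_p_hat, 4), round(sd_p_hat, 4), round(p, 4))
```

The mean of the 15 sample proportions comes out equal to the population proportion, which is the pattern the next section states as a general rule.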

General Properties for Sampling Distributions of $\hat{p}$

Rule 1: $\mu_{\hat{p}} = p$
Rule 2: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
- This rule is exact if the population is infinite.
- This rule is approximately correct if the population is finite and no more than 10% of the population is included in the sample.

Example: Cal Poly Students

Fall 2008: 18,516 students enrolled at Cal Poly SLO.
8091 (43.7%) were female ( $p = 0.437$ ).
Statistical software simulates sampling from this population.
500 samples are generated for each of the following sample sizes: $n = 10, 25, 50, 100$ .
Histograms display the distribution of sample proportions for each sample size.

Cal Poly Students Continued

The histograms are centered around the true proportion ( $p = 0.437$ ).
What do you notice about the standard deviation of these distributions?
What about the shape of these distributions?
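
These questions can be explored with a simulation similar to the one described. The sketch below is an approximation of that setup: it builds a population of 18,516 zeros and ones with 8091 ones (females), draws 500 samples without replacement for each sample size, and summarizes the sample proportions. The seed and variable names are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(2008)                        # arbitrary seed
population = [1] * 8091 + [0] * (18516 - 8091)   # 1 = female, 0 = male
p = 8091 / 18516                                 # population proportion, about 0.437

summary = {}
for n in (10, 25, 50, 100):
    # 500 samples of size n, drawn without replacement.
    p_hats = [sum(rng.sample(population, n)) / n for _ in range(500)]
    # Store: mean of p-hat, SD of p-hat, and the value sqrt(p(1-p)/n).
    summary[n] = (fmean(p_hats), pstdev(p_hats), sqrt(p * (1 - p) / n))
    print(n, [round(v, 3) for v in summary[n]])
```

Comparing the second and third values in each row shows how closely the simulated standard deviation tracks $\sqrt{\frac{p(1-p)}{n}}$ and how it shrinks as $n$ increases.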

Example: Viral Hepatitis after Blood Transfusions

Study reported that hepatitis occurs in 7% of patients who receive blood transfusions during heart surgery ( $p = 0.07$ ).
Simulate sampling from a population of blood recipients.
Generate 500 samples for each of the following sample sizes: $n = 10, 25, 50, 100$ .
Histograms show sample proportion distributions for each sample size.

Blood Transfusions Continued

The histograms are centered around the true proportion ( $p = 0.07$ ).
What happens to the shape of these histograms as the sample size increases?

Rule 3: Normality for Proportions

When $n$ is large and $p$ is not too near 0 or 1, the sampling distribution of $\hat{p}$ is approximately normal.
The further $p$ is from 0.5, the larger $n$ must be for the sampling distribution of $\hat{p}$ to be approximately normal.
A conservative rule of thumb: If $np > 10$ and $n(1 - p) > 10$, then a normal distribution provides a reasonable approximation to the sampling distribution of $\hat{p}$.
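
The rule of thumb is easy to express as a small check. The function name `normal_approx_ok` and its `threshold` parameter are hypothetical, introduced only for this sketch; the values of $p$ are the two proportions used in the examples (0.437 and 0.07).

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both np and n(1 - p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# Compare the two example populations across several sample sizes.
for n in (10, 25, 50, 100, 200):
    print(n, normal_approx_ok(n, 0.437), normal_approx_ok(n, 0.07))
```

With $p = 0.437$ the condition is already met at $n = 25$, while with $p = 0.07$ it fails at $n = 100$ and is met at $n = 200$, illustrating that a proportion far from 0.5 needs a larger sample.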

Blood Transfusions Revisited

$p$ = proportion of patients who contract hepatitis after a blood transfusion = 0.07
Suppose a new blood screening procedure is believed to reduce the incidence rate of hepatitis.
Blood screened using this procedure is given to $n = 200$ blood recipients.
Only 6 of the 200 patients contract hepatitis ( $\hat{p} = \frac{6}{200} = 0.03$ ).
Does this indicate that the true proportion is less than 7%?
To answer this, consider the sampling distribution of $\hat{p}$.

Checking Conditions for Normality

First, is the sampling distribution approximately normal?
Check the conditions:
- $np = 200(0.07) = 14 > 10$
- $n(1 - p) = 200(0.93) = 186 > 10$
Yes, we can use a normal approximation.

Calculating the Probability

Assume the screening procedure is not effective and $p = 0.07$ .
Calculate $P(\hat{p} < 0.03)$.
The standard deviation is $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.07(1-0.07)}{200}} \approx 0.018$
The z-score is $\frac{0.03-0.07}{0.018} = -2.22$
Then $P(Z < -2.22) = 0.0132$
This small probability tells us that a sample proportion of 0.03 or smaller would be unlikely if the screening procedure were ineffective.
This screening procedure appears to yield a smaller incidence rate for hepatitis.
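
The calculation can be repeated without rounding the standard deviation first. This is a minimal sketch using the standard library's `NormalDist`; it assumes the normal approximation checked above, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p = 0.07            # hepatitis rate if the screening is not effective
n = 200             # number of blood recipients
p_hat = 6 / 200     # observed sample proportion, 0.03

sigma_p_hat = sqrt(p * (1 - p) / n)    # about 0.018
z = (p_hat - p) / sigma_p_hat          # standardized sample proportion

# P(p-hat < 0.03) under the normal approximation.
prob = NormalDist().cdf(z)
print(round(sigma_p_hat, 4), round(z, 2), round(prob, 4))
```

Without intermediate rounding the probability is about 0.013, in line with the table-based value of 0.0132, so the conclusion is unchanged.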
