Central Limit Theorem for Proportions - Comprehensive Notes

The proportion of a population with a certain characteristic is the population proportion, denoted by P. It is central to statistical inference and hypothesis testing.

In a simple random sample of n individuals, let X be the number in the sample with the characteristic.
The sample proportion, denoted by \hat{P}, is calculated as \hat{P} = \frac{X}{n}. It is an estimate of the population proportion P.

A retailer surveys 100 people and finds 35 own laptops.
The sample proportion is \hat{P} = \frac{35}{100} = 0.35.
The population proportion, P, is the proportion of all people in the city who own laptops.

If several samples are drawn, the values of \hat{P} are likely to vary.
\hat{P} is a random variable and has a probability distribution.
The probability distribution of \hat{P} is called the sampling distribution of \hat{P}.
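
To see this variation concretely, the sketch below simulates several samples of 100 people. The population proportion of 0.35, the seed, and the helper name sample_proportion are assumptions made for illustration only; the true population proportion in the retailer example is unknown.

```python
import random

def sample_proportion(n, p, rng):
    """Draw n individuals, each having the characteristic with probability p,
    and return the sample proportion X / n."""
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n

rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable
# Hypothetical population proportion; it is not given in the notes.
p_hats = [sample_proportion(100, 0.35, rng) for _ in range(8)]
print(p_hats)  # the values vary from sample to sample
```

Each printed value is one realization of the random variable \hat{P}.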

Toss a fair coin five times (sample size n = 5).
The proportion of times the coin lands on heads is the sample proportion \hat{P}.
The probability of heads is 0.5, so the population proportion is P = 0.5.
There are 2^5 = 32 possible samples.
The table displays all possible samples of size five and their sample proportion \hat{P}.
The mean of all values of \hat{P} is \mu_{\hat{P}} = 0.5.
The standard deviation of all values of \hat{P} is \sigma_{\hat{P}} = 0.2236.
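
The two values above can be checked by brute force. The following is a minimal sketch that enumerates all 32 equally likely toss sequences and computes the mean and standard deviation of \hat{P} directly; the variable names are illustrative.

```python
from itertools import product
from math import sqrt

n = 5
# Every possible sequence of five tosses: 1 = heads, 0 = tails.
samples = list(product([0, 1], repeat=n))
p_hats = [sum(s) / n for s in samples]

# For a fair coin each of the 32 sequences is equally likely,
# so plain (population) averages give the mean and standard deviation.
mean = sum(p_hats) / len(p_hats)
sd = sqrt(sum((v - mean) ** 2 for v in p_hats) / len(p_hats))

print(len(samples), round(mean, 4), round(sd, 4))
```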

The mean of the sampling distribution, \mu_{\hat{P}}, equals the population proportion P: \mu_{\hat{P}} = P
The standard deviation of the sampling distribution, \sigma_{\hat{P}}, is given by: \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}

The proportion of winning tickets is P = 0.25.
A sample of n = 70 people purchase soft drinks.
The mean of \hat{P} is \mu_{\hat{P}} = P = 0.25.
The standard deviation of \hat{P} is \sigma_{\hat{P}} = \sqrt{\frac{0.25(1-0.25)}{70}} = 0.0518.
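
As a quick check of this arithmetic, the two formulas can be wrapped in a small helper. The function name sampling_mean_sd is hypothetical and chosen only for this sketch.

```python
from math import sqrt

def sampling_mean_sd(p, n):
    """Return (mean, standard deviation) of the sampling distribution of p-hat."""
    return p, sqrt(p * (1 - p) / n)

mu, sigma = sampling_mean_sd(0.25, 70)
print(mu, round(sigma, 4))
```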

The probability histogram for the sampling distribution of \hat{P} for the proportion of heads in five tosses of a fair coin is presented.
The distribution is reasonably well approximated by a normal curve.
As the number of tosses increases, the sampling distribution of \hat{P} is more closely approximated by a normal curve.
When P = 0.5, the sampling distribution of \hat{P} is somewhat close to normal even for a small sample size like n = 5.
When P is close to 0 or 1, a larger sample size is needed before the distribution of \hat{P} is close to normal.
A common rule of thumb is that the distribution may be approximated with a normal curve whenever n \times P \geq 10 and n \times (1-P) \geq 10.
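
One way to make "close to normal" concrete is to compare the exact distribution of X (binomial) with the normal curve that has the same mean and standard deviation. The sketch below uses the largest gap between the two cumulative distributions as the measure. That choice of measure, the lack of a continuity correction, and the name max_cdf_gap are assumptions of this illustration, not part of the rule of thumb. Because \hat{P} = X / n, the gap for \hat{P} is the same as the gap for X.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest gap, over x = 0..n, between the exact binomial CDF of X
    and the normal CDF with the same mean and standard deviation."""
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    cumulative = 0.0
    gap = 0.0
    for x in range(n + 1):
        cumulative += comb(n, x) * p**x * (1 - p) ** (n - x)
        gap = max(gap, abs(cumulative - normal.cdf(x)))
    return gap

# Same P with growing n, then a P close to 0 with small and larger n.
for n, p in [(5, 0.5), (20, 0.5), (100, 0.5), (20, 0.05), (200, 0.05)]:
    print(n, p, round(max_cdf_gap(n, p), 3))
```

The gap shrinks as n grows, and for the same n it is larger when P is near 0 than when P = 0.5.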

Let \hat{P} be the sample proportion for a sample of size n from a population with population proportion P.
If n \times P \geq 10 and n \times (1-P) \geq 10, then the distribution of \hat{P} is approximately normal with:
- Mean: \mu_{\hat{P}} = P
- Standard Deviation: \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}

A sample of size 20 is drawn from a population with population proportion P = 0.7. Is it appropriate to use the normal distribution to find probabilities for \hat{P}?
- n \times P = 20 \times 0.7 = 14
- n \times (1-P) = 20 \times 0.3 = 6
- Since n \times (1-P) is not at least 10, we cannot be certain that the distribution of \hat{P} is approximately normal.

A sample of size 55 is drawn from a population with population proportion P = 0.8. Is it appropriate to use the normal distribution to find probabilities for \hat{P}?
- n \times P = 55 \times 0.8 = 44
- n \times (1-P) = 55 \times 0.2 = 11
- These are both at least 10, so the distribution of \hat{P} is approximately normal.
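
The check used in both examples can be sketched as a small function. The name normal_approx_ok and the tiny tolerance, which guards against floating-point rounding when a product is exactly 10, are assumptions of this sketch.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both n*p and n*(1-p) must be at least the threshold."""
    eps = 1e-9  # tolerance for rounding, e.g. 50 * (1 - 0.8) evaluates just below 10
    return n * p >= threshold - eps and n * (1 - p) >= threshold - eps

print(normal_approx_ok(20, 0.7))  # n*(1-p) is about 6, so the check fails
print(normal_approx_ok(55, 0.8))  # about 44 and 11, so the check passes
```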

According to a Harris Poll, chocolate is the favorite ice cream flavor for 27% of Americans.
If a sample of 100 Americans is taken, what is the probability that the sample proportion of those who prefer chocolate is greater than 0.3?
Here we are asked to find a probability involving a sample proportion. We check to make sure that the central limit theorem for proportions applies.

The Central Limit Theorem for proportions states that if \hat{P} is the sample proportion for a sample of size n from a population with proportion P, and n \times P and n \times (1-P) are both at least 10, then the distribution of \hat{P} is approximately normal.
The mean of this distribution is \mu_{\hat{P}} = P, the population proportion, and the standard deviation is \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}.
In this example, n = 100, and the population proportion is P = 0.27. Since n \times P = 27 and n \times (1-P) = 73, and both of these are at least 10, we know that the normal curve can be used to find the probability that the sample proportion is greater than 0.3.
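
The notes stop at confirming that the normal curve applies. A minimal sketch of the remaining step, using Python's statistics.NormalDist for the normal curve, might look like this; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.27
sigma = sqrt(p * (1 - p) / n)            # standard deviation of p-hat
p_hat_dist = NormalDist(mu=p, sigma=sigma)

prob_greater = 1 - p_hat_dist.cdf(0.30)  # area to the right of 0.30
print(round(sigma, 4), round(prob_greater, 4))
```

Under the normal approximation the probability comes out at roughly one quarter.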

A simple random sample of size 80 is drawn from a population with population proportion P = 0.24. We wish to find the probability that \hat{P} is between 0.20 and 0.23.
- n = 80, P = 0.24
- n \times P = 19.2
- n \times (1-P) = 60.8
- Since both of these are at least 10, we may use the normal distribution to find the probability.

A simple random sample of size 145 is drawn from a population with population proportion P = 0.05. We wish to find the probability that \hat{P} is between 0.03 and 0.08.
- n = 145, P = 0.05
- n \times P = 145 \times 0.05 = 7.25
- Note that this value is less than 10, which means that the assumptions are not satisfied. So we stop at this point.
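
Both cases can be handled by one routine that checks the assumptions first and only then computes the area between the two bounds. This is a sketch under the normal approximation; the function name prob_between and the choice to return None when the check fails are assumptions of the example.

```python
from math import sqrt
from statistics import NormalDist

def prob_between(n, p, low, high):
    """Normal-approximation probability that low < p-hat < high.
    Returns None when the rule of thumb is not met."""
    if n * p < 10 or n * (1 - p) < 10:
        return None
    dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))
    return dist.cdf(high) - dist.cdf(low)

first = prob_between(80, 0.24, 0.20, 0.23)    # assumptions satisfied
second = prob_between(145, 0.05, 0.03, 0.08)  # n * p = 7.25, so no answer
print(first, second)
```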

73% of teenagers own smartphones. A sample of 150 teenagers is drawn. Would it be unusual if less than 68% of the sampled teenagers own smartphones?
- To determine whether it would be unusual, we find the area under the normal curve and compare it to the standard cutoff value of 0.05.
- n = 150, P = 0.73
- n \times P = 109.5
- n \times (1-P) = 40.5
- Since both quantities are at least 10, the assumptions are satisfied and we may proceed.
- \mu_{\hat{P}} = P = 0.73
- \sigma_{\hat{P}} = \sqrt{\frac{0.73 \times (1-0.73)}{150}} = 0.036249
To find the probability that less than 68% of the sample teenagers own smartphones, we find the area under the normal curve to the left of 0.68.
The area is approximately 0.0839. Since this is greater than 0.05, we conclude that it would not be unusual for less than 68% of the sampled teenagers to own smartphones.
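
The smartphone calculation can be reproduced with a few lines. This is a minimal sketch using statistics.NormalDist; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 150, 0.73
sigma = sqrt(p * (1 - p) / n)
area = NormalDist(mu=p, sigma=sigma).cdf(0.68)  # area to the left of 0.68

unusual = area < 0.05  # cutoff used in the example above
print(round(sigma, 6), round(area, 4), unusual)
```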