Central Limit Theorem for Proportions - Comprehensive Notes
Population Proportion
- The proportion of a population that has a certain characteristic is called the population proportion, denoted by P. It is central to statistical inference and hypothesis testing.
Sample Proportion
- In a simple random sample of n individuals, let X be the number of individuals in the sample with the characteristic.
- The sample proportion, denoted by \hat{P}, is \hat{P} = \frac{X}{n}. It is an estimate of the population proportion P.
Example
- A retailer surveys 100 people in a city and finds that 35 of them own laptops.
- The sample proportion is \hat{P} = \frac{35}{100} = 0.35.
- The population proportion, P, is the proportion of all people in the city who own laptops.
Sampling Distribution of \hat{P}
- If several samples are drawn, the values of \hat{P} are likely to vary.
- \hat{P} is a random variable and has a probability distribution.
- The probability distribution of \hat{P} is called the sampling distribution of \hat{P}.
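To see why \hat{P} is a random variable, the sketch below draws five samples of size 100 and prints each sample proportion. It is a simulation under stated assumptions, not part of the original example: the population proportion P = 0.35 is hypothetical (chosen only because it matches the laptop survey's \hat{P}), and each person is assumed to have the characteristic independently with probability P, which approximates simple random sampling from a large population.

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

P = 0.35  # hypothetical population proportion, assumed for illustration only
n = 100   # sample size, as in the laptop survey

p_hats = []
for _ in range(5):
    # Assumption: each person has the characteristic independently with
    # probability P, a stand-in for sampling from a large population.
    x = sum(random.random() < P for _ in range(n))
    p_hats.append(x / n)

# Each sample gives its own p-hat; the values usually differ from sample to sample.
print(p_hats)
```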
Example: Tossing a Fair Coin
- Toss a fair coin five times (sample size n = 5).
- The proportion of times the coin lands on heads is the sample proportion \hat{P}.
- The probability of heads is 0.5, so the population proportion is P = 0.5.
- There are 2^5 = 32 possible samples.
- The table displays all possible samples of size five and their sample proportion \hat{P}.
- The mean of all values of \hat{P} is \mu_{\hat{P}} = 0.5.
- The standard deviation of all values of \hat{P} is \sigma_{\hat{P}} = 0.2236.
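The table of all 32 samples can be regenerated by enumeration. This sketch lists every sequence of five tosses, computes \hat{P} for each, and then takes the mean and standard deviation over all 32 equally likely values. It uses the population standard deviation (statistics.pstdev) because the 32 samples are the entire sampling distribution, not a sample from it.

```python
from itertools import product
from statistics import mean, pstdev

# Every possible sequence of five tosses of a fair coin (H = heads, T = tails).
samples = list(product("HT", repeat=5))

# Sample proportion of heads for each sequence.
p_hats = [s.count("H") / 5 for s in samples]

print(len(samples))              # 32
print(mean(p_hats))              # mean of all p-hat values
print(round(pstdev(p_hats), 4))  # SD over all 32 equally likely values
```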
Mean and Standard Deviation of the Sampling Distribution
- The mean of the sampling distribution, \mu_{\hat{P}}, equals the population proportion: \mu_{\hat{P}} = P.
- The standard deviation of the sampling distribution, \sigma_{\hat{P}}, is given by: \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}.
Example: Soft Drink Cups
- The proportion of winning tickets is P = 0.25.
- n = 70 people purchase soft drinks.
- The mean of \hat{P} is \mu_{\hat{P}} = P = 0.25.
- The standard deviation of \hat{P} is \sigma_{\hat{P}} = \sqrt{\frac{0.25(1-0.25)}{70}} = 0.0518.
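The two formulas for \mu_{\hat{P}} and \sigma_{\hat{P}} fit in a small helper. The function name phat_mean_sd is a hypothetical choice for illustration; the sketch reproduces the soft drink numbers.

```python
from math import sqrt

def phat_mean_sd(P, n):
    """Return (mean, SD) of the sampling distribution of p-hat."""
    mu = P                          # mu_{p-hat} = P
    sigma = sqrt(P * (1 - P) / n)   # sigma_{p-hat} = sqrt(P(1 - P) / n)
    return mu, sigma

mu, sigma = phat_mean_sd(0.25, 70)  # soft drink example
print(mu, round(sigma, 4))          # 0.25 0.0518
```

Calling phat_mean_sd(0.5, 5) gives a standard deviation of about 0.2236, matching the value obtained by listing all 32 coin-toss samples.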
Probability Histogram
- The probability histogram for the sampling distribution of \hat{P} for the proportion of heads in five tosses of a fair coin is presented.
- The distribution is reasonably well approximated by a normal curve.
- As the number of tosses increases, the sampling distribution of \hat{P} is more closely approximated by a normal curve.
- When P = 0.5, the sampling distribution of \hat{P} is somewhat close to normal even for a small sample size like n = 5.
- When P is close to 0 or 1, a larger sample size is needed before the distribution of \hat{P} is close to normal.
- A common rule of thumb is that the distribution may be approximated with a normal curve whenever n \times P \geq 10 and n \times (1-P) \geq 10.
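The probabilities behind the histogram can be computed exactly. This sketch uses the binomial probability C(n, k) P^k (1-P)^(n-k) for the chance of exactly k successes, a standard formula that these notes do not state; for the fair coin it reduces to C(5, k)/32, the number of the 32 samples with k heads divided by 32. The function name phat_distribution is hypothetical, and the "#" bars are only a rough text histogram.

```python
from math import comb

def phat_distribution(n, P):
    """Exact sampling distribution of p-hat: P(p-hat = k/n) for k = 0..n."""
    return {k / n: comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(n + 1)}

# Fair coin, five tosses: one '#' per 1/32 of probability.
for p_hat, prob in phat_distribution(5, 0.5).items():
    print(f"{p_hat:.1f}  {prob:.4f}  " + "#" * round(prob * 32))
```

Trying phat_distribution(5, 0.05) instead shows a lopsided distribution piled up at \hat{P} = 0, which is why a larger n is needed when P is close to 0 or 1.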
Central Limit Theorem for Proportions
- Let \hat{P} be the sample proportion for a sample of size n from a population with population proportion P.
- If n \times P \geq 10 and n \times (1-P) \geq 10, then the distribution of \hat{P} is approximately normal with:
- Mean: \mu_{\hat{P}} = P
- Standard Deviation: \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}
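A quick simulation can be used to check the theorem's mean and standard deviation. The sketch below draws many samples for the soft drink setting (P = 0.25, n = 70, which satisfies n \times P = 17.5 \geq 10 and n \times (1-P) = 52.5 \geq 10) and compares the simulated mean and SD of \hat{P} with P and \sqrt{P(1-P)/n}. The number of repetitions and the independent-draws model are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

P, n = 0.25, 70   # soft drink example
reps = 10_000     # number of simulated samples (an arbitrary choice)

def simulate_phat(P, n):
    """Draw one sample of size n and return its sample proportion.

    Assumption: outcomes are independent, each a success with probability P
    (a stand-in for a simple random sample from a large population).
    """
    successes = sum(random.random() < P for _ in range(n))
    return successes / n

p_hats = [simulate_phat(P, n) for _ in range(reps)]

print("simulated mean:", round(mean(p_hats), 4), " theory:", P)
print("simulated SD:  ", round(stdev(p_hats), 4), " theory:", round(sqrt(P * (1 - P) / n), 4))
```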
Examples
- A sample of size 20 is drawn from a population with population proportion P = 0.7. Is it appropriate to use the normal distribution to find probabilities for \hat{P}?
- n \times P = 20 \times 0.7 = 14
- n \times (1-P) = 20 \times 0.3 = 6
- Since n \times (1-P) = 6 is less than 10, the condition is not met. We cannot be confident that the distribution of \hat{P} is approximately normal, so it is not appropriate to use the normal distribution.
- A sample of size 55 is drawn from a population with population proportion P = 0.8. Is it appropriate to use the normal distribution to find probabilities for \hat{P}?
- n \times P = 55 \times 0.8 = 44
- n \times (1-P) = 55 \times 0.2 = 11
- These are both at least 10, so the distribution of \hat{P} is approximately normal and it is appropriate to use the normal distribution.
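The rule of thumb is easy to automate. In this sketch (normal_approx_ok is a hypothetical name), a tiny tolerance guards against floating-point round-off: in Python, 50 * (1 - 0.8) evaluates to 9.999999999999998, which would otherwise fail a check that should pass.

```python
def normal_approx_ok(n, P, threshold=10):
    """Rule of thumb: use the normal curve only if n*P >= 10 and n*(1-P) >= 10."""
    eps = 1e-9  # guards against float round-off, e.g. 50 * (1 - 0.8)
    return n * P >= threshold - eps and n * (1 - P) >= threshold - eps

print(normal_approx_ok(20, 0.7))  # False: n(1-P) = 6
print(normal_approx_ok(55, 0.8))  # True: 44 and 11
```

The same check returns False for n = 145, P = 0.05 (since n \times P = 7.25) and True for the other examples in these notes.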
Calculating Probabilities Using Excel
Example: Ice Cream Preference
- According to a Harris Poll, chocolate is the favorite ice cream flavor for 27% of Americans.
- If a sample of 100 Americans is taken, what is the probability that the sample proportion of those who prefer chocolate is greater than 0.3?
- Here we are asked to find a probability involving a sample proportion. We check to make sure that the central limit theorem for proportions applies.
Verifying the applicability of CLT
- The Central Limit Theorem for proportions states that if \hat{P} is the sample proportion for a sample of size n from a population with proportion P, and if n \times P and n \times (1-P) are both at least 10, then the distribution of \hat{P} is approximately normal.
- The mean of this distribution is \mu_{\hat{P}} = P, the population proportion, and its standard deviation is \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}.
- In this example, n = 100 and P = 0.27, so n \times P = 27 and n \times (1-P) = 73. Both are at least 10, so the normal curve can be used to find the probability that the sample proportion is greater than 0.3.
Calculations
- \mu_{\hat{P}} = P = 0.27
- \sigma_{\hat{P}} = \sqrt{\frac{0.27 \times (1-0.27)}{100}} = 0.0444
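- NORM.DIST(x, mean, standard_dev, TRUE) returns the area to the left of x, so subtracting it from 1 gives the area to the right of 0.3. In Excel we enter =1 - NORM.DIST(0.3, 0.27, 0.0444, TRUE).
- The result is 0.2496.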
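For readers without Excel, the same right-tail area can be computed with Python's standard library. This is a minimal sketch using statistics.NormalDist (Python 3.8 or later), whose cdf method plays the role of NORM.DIST with cumulative = TRUE.

```python
from statistics import NormalDist

mu, sigma = 0.27, 0.0444  # mean and SD of p-hat from the ice cream calculations

# Equivalent of Excel's =1 - NORM.DIST(0.3, 0.27, 0.0444, TRUE):
# the area under the normal curve to the right of 0.3.
prob = 1 - NormalDist(mu, sigma).cdf(0.3)
print(round(prob, 4))
```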
Additional examples
A simple random sample of size 80 is drawn from a population with population proportion P = 0.24. We wish to find the probability that \hat{P} is between 0.20 and 0.23.
- n = 80, P = 0.24
- n \times P = 19.2
- n \times (1-P) = 60.8
- Since both of these are at least 10, we may use the normal distribution to find the probability.
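The notes stop after checking the conditions. Following the same method as the ice cream example, a sketch of the remaining calculation finds \sigma_{\hat{P}} = \sqrt{0.24 \times 0.76 / 80} and then the area under the normal curve between 0.20 and 0.23. The resulting value (roughly 0.216) comes from this sketch, not from the original notes.

```python
from math import sqrt
from statistics import NormalDist

n, P = 80, 0.24
sigma = sqrt(P * (1 - P) / n)  # sigma_{p-hat}
dist = NormalDist(mu=P, sigma=sigma)

# Area under the normal curve between 0.20 and 0.23.
prob = dist.cdf(0.23) - dist.cdf(0.20)
print(round(sigma, 4), round(prob, 4))
```

In Excel the same area would be =NORM.DIST(0.23, 0.24, s, TRUE) - NORM.DIST(0.20, 0.24, s, TRUE), with s replaced by the computed standard deviation.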
A simple random sample of size 145 is drawn from a population with population proportion P = 0.05. We wish to find the probability that \hat{P} is between 0.03 and 0.08.
- n = 145, P = 0.05
- n \times P = 145 \times 0.05 = 7.25
- This value is less than 10, so the conditions for the normal approximation are not satisfied and we cannot use the normal distribution. We stop at this point.
Example: Smartphone ownership
- 73% of teenagers own smartphones. A sample of 150 teenagers is drawn. Would it be unusual if less than 68% of the sampled teenagers own smartphones?
- To determine whether it would be unusual, we find the probability (the area under the normal curve) and compare it to the standard cutoff value of 0.05.
- n = 150, P = 0.73
- n \times P = 109.5
- n \times (1-P) = 40.5
- Since both quantities are at least 10, the assumptions are satisfied and we may proceed.
- \mu_{\hat{P}} = P = 0.73
- \sigma_{\hat{P}} = \sqrt{\frac{0.73 \times (1-0.73)}{150}} = 0.036249
- To find the probability that less than 68% of the sampled teenagers own smartphones, we find the area under the normal curve to the left of 0.68, which corresponds to z = \frac{0.68 - 0.73}{0.036249} \approx -1.38.
- This area is approximately 0.0839. Since it is greater than 0.05, it would not be unusual for less than 68% of the sampled teenagers to own smartphones.
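The smartphone calculation and the comparison with the 0.05 cutoff can be sketched in a few lines, again using statistics.NormalDist as a stand-in for Excel's NORM.DIST.

```python
from math import sqrt
from statistics import NormalDist

n, P, cutoff = 150, 0.73, 0.05

sigma = sqrt(P * (1 - P) / n)           # about 0.036249
prob = NormalDist(P, sigma).cdf(0.68)   # area to the left of 0.68

print(round(prob, 4))
print("unusual" if prob < cutoff else "not unusual")
```

The Excel equivalent is =NORM.DIST(0.68, 0.73, 0.036249, TRUE).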