Central Limit Theorem for Proportions - Comprehensive Notes

Population Proportion

  • The proportion of a population with a certain characteristic is the population proportion, denoted by P. It is central to statistical inference and hypothesis testing.

Sample Proportion

  • In a simple random sample of n individuals, let X be the number in the sample with the characteristic.
  • The sample proportion, denoted by \hat{P}, is calculated as \hat{P} = \frac{X}{n}. It is an estimate of the population proportion.

Example

  • A retailer surveys 100 people in a city and finds that 35 own laptops.
  • The sample proportion is \hat{P} = \frac{35}{100} = 0.35.
  • The population proportion, P, is the proportion of all people in the city who own laptops.

Sampling Distribution of \hat{P}

  • If several samples are drawn, the values of \hat{P} are likely to vary.
  • \hat{P} is a random variable and has a probability distribution.
  • The probability distribution of \hat{P} is called the sampling distribution of \hat{P}.

Example: Tossing a Fair Coin

  • Toss a fair coin five times (sample size n = 5).
  • The proportion of times the coin lands on heads is the sample proportion \hat{P}.
  • The probability of heads is 0.5, so the population proportion is P = 0.5.
  • There are 2^5 = 32 possible samples.
  • The table displays all possible samples of size five and their sample proportion \hat{P}.
  • The mean of all values of \hat{P} is \mu_{\hat{P}} = 0.5.
  • The standard deviation of all values of \hat{P} is \sigma_{\hat{P}} = 0.2236.
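
The mean and standard deviation above can be checked by brute-force enumeration. The following is a minimal sketch in Python (the notes themselves contain no code, so the variable names `samples` and `proportions` are illustrative assumptions). It lists every equally likely sequence of five tosses and computes the mean and population standard deviation of \hat{P}.

```python
from itertools import product
from statistics import mean, pstdev

n = 5  # number of tosses per sample

# Every possible sequence of heads (1) and tails (0).
# For a fair coin, all 2**n sequences are equally likely.
samples = list(product([0, 1], repeat=n))

# Sample proportion of heads for each possible sample.
proportions = [sum(s) / n for s in samples]

mu_p_hat = mean(proportions)       # mean of the sampling distribution
sigma_p_hat = pstdev(proportions)  # population SD, since all samples are listed

print(len(samples), round(mu_p_hat, 4), round(sigma_p_hat, 4))
```

The population standard deviation (`pstdev`) is used rather than the sample standard deviation because the list contains every possible sample, not a subset of them.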

Mean and Standard Deviation of the Sampling Distribution

  • The mean of the sampling distribution, \mu_{\hat{P}}, equals the population proportion P: \mu_{\hat{P}} = P
  • The standard deviation of the sampling distribution, \sigma_{\hat{P}}, is given by: \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}

Example: Soft Drink Cups

  • The proportion of winning tickets is P = 0.25.
  • n = 70 people purchase soft drinks.
  • The mean of \hat{P} is \mu_{\hat{P}} = P = 0.25.
  • The standard deviation of \hat{P} is \sigma_{\hat{P}} = \sqrt{\frac{0.25(1-0.25)}{70}} = 0.0518.
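
As a quick check of the arithmetic in this example, the two formulas can be wrapped in a small helper. This is only a sketch; the function name `sampling_mean_sd` is a hypothetical name chosen for illustration.

```python
from math import sqrt

def sampling_mean_sd(p, n):
    """Return (mean, standard deviation) of the sampling distribution of p-hat."""
    return p, sqrt(p * (1 - p) / n)

# Soft drink example: P = 0.25, n = 70.
mu, sigma = sampling_mean_sd(0.25, 70)
print(round(mu, 4), round(sigma, 4))
```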

Probability Histogram

  • The probability histogram for the sampling distribution of \hat{P} for the proportion of heads in five tosses of a fair coin is presented.
  • The distribution is reasonably well approximated by a normal curve.
  • As the number of tosses increases, the sampling distribution of \hat{P} is more closely approximated by a normal curve.
  • When P = 0.5, the sampling distribution of \hat{P} is somewhat close to normal even for a small sample size like n = 5.
  • When P is close to 0 or 1, a larger sample size is needed before the distribution of \hat{P} is close to normal.
  • A common rule of thumb is that the distribution may be approximated with a normal curve whenever n \times P \geq 10 and n \times (1-P) \geq 10.

Central Limit Theorem for Proportions

  • Let \hat{P} be the sample proportion for a sample of size n from a population with population proportion P.
  • If n \times P \geq 10 and n \times (1-P) \geq 10, then the distribution of \hat{P} is approximately normal with:
    • Mean: \mu_{\hat{P}} = P
    • Standard Deviation: \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}

Examples

  • A sample of size 20 is drawn from a population with population proportion P = 0.7. Is it appropriate to use the normal distribution to find probabilities for \hat{P}?
    • n \times P = 20 \times 0.7 = 14
    • n \times (1-P) = 20 \times 0.3 = 6
    • Since n \times (1-P) is not at least 10, we cannot be certain that the distribution of \hat{P} is approximately normal.
  • A sample of size 55 is drawn from a population with population proportion P = 0.8. Is it appropriate to use the normal distribution to find probabilities for \hat{P}?
    • n \times P = 55 \times 0.8 = 44
    • n \times (1-P) = 55 \times 0.2 = 11
    • These are both at least 10, so the distribution of \hat{P} is approximately normal.
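
The rule-of-thumb check used in these two examples can be sketched as a short function. The name `normal_approx_ok` and the small tolerance are assumptions added for illustration. The tolerance guards against floating-point rounding when a product equals 10 exactly in theory but falls slightly below it in binary arithmetic.

```python
def normal_approx_ok(n, p, threshold=10, tol=1e-9):
    """Rule of thumb: both n*p and n*(1-p) must be at least the threshold."""
    # tol absorbs rounding error, e.g. 100 * (1 - 0.9) evaluates to 9.999999999999998.
    return n * p >= threshold - tol and n * (1 - p) >= threshold - tol

print(normal_approx_ok(20, 0.7))  # n*(1-P) = 6 is below 10
print(normal_approx_ok(55, 0.8))  # n*P = 44 and n*(1-P) = 11
```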

Calculating Probabilities Using Excel

Example: Ice Cream Preference

  • According to a Harris Poll, chocolate is the favorite ice cream flavor for 27% of Americans.
  • If a sample of 100 Americans is taken, what is the probability that the sample proportion of those who prefer chocolate is greater than 0.3?
  • Here we are asked to find a probability involving a sample proportion. We check to make sure that the central limit theorem for proportions applies.

Verifying the applicability of CLT

  • The Central Limit Theorem for proportions states that if \hat{P} is the sample proportion for a sample of size n from a population with proportion P, and n \times P and n \times (1-P) are both at least 10, then the distribution of \hat{P} is approximately normal.
  • The mean of this distribution is \mu_{\hat{P}} = P, the population proportion, and the standard deviation is \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}.
  • In this example, n = 100, and the population proportion is P = 0.27. Since n \times P = 27 and n \times (1-P) = 73, and both of these are at least 10, we know that the normal curve can be used to find the probability that the sample proportion is greater than 0.3.

Calculations

  • \mu_{\hat{P}} = P = 0.27
  • \sigma_{\hat{P}} = \sqrt{\frac{0.27 \times (1-0.27)}{100}} = 0.0444
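  • In Excel, we enter =1 - NORM.DIST(0.3, 0.27, 0.0444, TRUE). The final argument TRUE requests the cumulative probability, that is, the area to the left of 0.3.
  • The result is approximately 0.2496.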
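
For readers without Excel, the same calculation can be sketched in Python with the standard library's `statistics.NormalDist`. This is an alternative to the spreadsheet formula above, not the method used in these notes. It uses the unrounded standard deviation, so the last digit could differ slightly from a calculation based on 0.0444.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.27, 100
sigma = sqrt(p * (1 - p) / n)  # standard deviation of p-hat

# P(p-hat > 0.3) is the area to the right of 0.3 under the normal curve,
# which is 1 minus the cumulative probability at 0.3.
prob = 1 - NormalDist(mu=p, sigma=sigma).cdf(0.3)
print(round(prob, 4))
```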

Additional examples

  • A simple random sample of size 80 is drawn from a population with population proportion P = 0.24. We wish to find the probability that \hat{P} is between 0.20 and 0.23.

    • n = 80, P = 0.24
    • n \times P = 19.2
    • n \times (1-P) = 60.8
    • Since both of these are at least 10, we may use the normal distribution to find the probability.
  • A simple random sample of size 145 is drawn from a population with population proportion P = 0.05. We wish to find the probability that \hat{P} is between 0.03 and 0.08.

    • n = 145, P = 0.05
    • n \times P = 145 \times 0.05 = 7.25
    • Since this value is less than 10, the conditions for the normal approximation are not satisfied, so we stop at this point.
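
Both cases can be handled by one routine that checks the conditions first and only then computes the area between the two bounds. The sketch below is an illustration under the notes' rule of thumb. The function name `prob_between` and the choice to return `None` when the conditions fail are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist

def prob_between(p, n, low, high):
    """Approximate P(low < p-hat < high), or None if the rule of thumb fails."""
    if n * p < 10 or n * (1 - p) < 10:
        return None  # normal approximation not justified; stop here
    dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))
    # Area between the bounds = area left of high minus area left of low.
    return dist.cdf(high) - dist.cdf(low)

first = prob_between(0.24, 80, 0.20, 0.23)    # conditions satisfied
second = prob_between(0.05, 145, 0.03, 0.08)  # n*P = 7.25, conditions fail
print(first, second)
```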

Example: Smartphone ownership

  • 73% of teenagers own smartphones. A sample of 150 teenagers is drawn. Would it be unusual if less than 68% of the sampled teenagers own smartphones?
    • To determine whether it would be unusual, we find the corresponding area under the normal curve and compare it to the standard cutoff value of 0.05.
    • n = 150, P = 0.73
    • n \times P = 109.5
    • n \times (1-P) = 40.5
    • Since both quantities are at least 10, the assumptions are satisfied and we may proceed.
    • \mu_{\hat{P}} = P = 0.73
    • \sigma_{\hat{P}} = \sqrt{\frac{0.73 \times (1-0.73)}{150}} = 0.036249
  • To find the probability that less than 68% of the sampled teenagers own smartphones, we find the area under the normal curve to the left of 0.68.
  • The area is approximately 0.0839. Since this is greater than 0.05, we conclude that it would not be unusual for less than 68% of the sampled teenagers to own smartphones.
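
The left-tail area and the comparison with the 0.05 cutoff can be reproduced with a short Python sketch. The variable names are illustrative, and the 0.05 cutoff is the one stated in the example.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.73, 150
sigma = sqrt(p * (1 - p) / n)  # standard deviation of p-hat

# Area under the normal curve to the left of 0.68.
prob = NormalDist(mu=p, sigma=sigma).cdf(0.68)

# An event is treated as unusual when its probability is below 0.05.
unusual = prob < 0.05
print(round(prob, 4), unusual)
```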