There are two aspects to this concept:
First, there is the confidence interval, usually expressed in the form: estimate ± margin of error.
Second, there is the success rate for the method, called the confidence level, that is, the proportion of times repeated applications of this method would capture the true population parameter.
All of the above assume that certain conditions are met. For inference on population proportions, means, and slopes, we must check for independence in data collection methods and for selection of the appropriate sampling distribution.
The following are the two standard assumptions for our inference procedures and the "ideal" way they are met:
Individuals in a sample or an experiment must be independent of each other, and this is obtained through random sampling or random selection.
Independence across samples is obtained by selecting two (or more) separate random samples.
Always examine how the data were collected to check if the assumption of independence is reasonable.
Sample size can also affect independence. Because sampling is usually done without replacement, if the sample is too large, lack of independence becomes a concern.
So, we typically require that the sample size n be no larger than 10% of the population (the 10% Rule).
If we pick a simple random sample of size 80 from a large population, which of the following values of the population proportion p would allow use of the normal model for the sampling distribution of p̂?
Solution: (B)
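The normal-model check behind this question can be sketched in Python. This is only an illustration, assuming the usual conditions np ≥ 10 and n(1 − p) ≥ 10 used later in this section; the function name and the trial values of p are hypothetical and are not the answer choices from the question.

```python
def normal_model_ok(n, p, threshold=10):
    """Return True when both expected counts, np and n(1 - p), reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


n = 80
# With n = 80, both counts are at least 10 only when 0.125 <= p <= 0.875.
for p in (0.10, 0.125, 0.50, 0.875, 0.90):  # hypothetical trial values
    print(p, normal_model_ok(n, p))
```

Values of p too close to 0 or 1 fail the check because one of the two expected counts drops below 10.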
This sample proportion is just one of a whole universe of sample proportions, and from Unit 5 we remember the following:
The set of all sample proportions is approximately normally distributed.
The mean μp̂ of the set of sample proportions equals p, the population proportion.
The standard deviation σp̂ of the set of sample proportions is approximately equal to √(p(1 − p)/n).
How do we find this standard deviation, since p is unknown? The reasonable procedure is to use the sample proportion p̂ in place of p.
When the standard deviation is estimated in this way (using the sample), we use the term standard error: SE(p̂) = √(p̂(1 − p̂)/n).
➥ Example 6.2
Solution:
The parameter is p, which represents the proportion of the population of young adults who would say that whoever asks for the date should pay for the first date. We check that np̂ = (550)(0.42) = 231 and n(1 − p̂) = (550)(0.58) = 319 are both ≥ 10.
We are given that the sample is an SRS, and 550 is clearly less than 10% of all young adults. Since p̂ = 0.42, the standard error of the set of sample proportions is SE(p̂) = √((0.42)(0.58)/550) ≈ 0.021.
99% of the sample proportions should be within 2.576 standard deviations of the population proportion. Equivalently, we are 99% certain that the population proportion is within 2.576 standard deviations of any sample proportion.
Thus, the 99% confidence interval estimate for the population proportion is 0.42 ± 2.576(0.021) = 0.42 ± 0.054. We say that the margin of error is ±0.054. We are 99% confident that the true proportion of young adults who would say that whoever asks for the date should pay for the first date is between 0.366 and 0.474.
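The arithmetic in this example can be reproduced with a short Python sketch. It assumes only the values quoted in the solution (p̂ = 0.42, n = 550, 99% confidence); the function name is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(p_hat, n, confidence):
    """One-sample z-interval for a proportion, using the standard error from p-hat."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    se = sqrt(p_hat * (1 - p_hat) / n)                       # standard error of p-hat
    margin = z_star * se                                     # margin of error
    return p_hat - margin, p_hat + margin, se, margin


low, high, se, margin = one_prop_z_interval(0.42, 550, 0.99)
print(round(se, 3), round(margin, 3))  # 0.021 0.054
print(round(low, 3), round(high, 3))   # 0.366 0.474
```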
The null hypothesis H0 is stated in the form of an equality statement about the population proportion (for example, H0: p = 0.37).
There is an alternative hypothesis, stated in the form of a strict inequality (for example, Ha: p < 0.37 or Ha: p > 0.37 or Ha: p ≠ 0.37).
The strength of the sample statistic p̂ can be gauged through its associated P-value, which is the probability of obtaining a sample statistic as extreme as (or more extreme than) the one obtained if the null hypothesis is assumed to be true. The smaller the P-value, the more significant the difference between the null hypothesis and the sample results.
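There are two types of possible errors: a Type I error, rejecting a true null hypothesis (its probability is α, the significance level), and a Type II error, failing to reject a false null hypothesis (its probability is β).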
There is a different value of β for each possible correct value for the population parameter p. For each β, 1 − β is called the "power" of the test against the associated correct value.
That is, given a true alternative, the power is the probability of rejecting the false null hypothesis. Increasing the sample size and increasing the significance level are both ways of increasing the power. Also note that a true parameter value that is further away from the hypothesized null value is more likely to be detected, thus offering a more powerful test.
A simple illustration of the difference between a Type I and a Type II error is as follows.
It should be emphasized that with regard to calculations, questions like “What is the power of this test?” and “What is the probability of a Type II error in this test?” cannot be answered without reference to a specific alternative hypothesis.
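To make the dependence on a specific alternative concrete, here is a minimal Python sketch of a power calculation for a one-sided test H0: p = p0 versus Ha: p < p0, using the normal approximation. The function name and all numerical inputs (p0 = 0.75, alternative 0.65, n = 125, α = 0.05) are illustrative assumptions, not values taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def power_lower_tail(p0, p_alt, n, alpha):
    """Power of the test H0: p = p0 vs. Ha: p < p0 when the true proportion is p_alt."""
    z = NormalDist()
    sd_null = sqrt(p0 * (1 - p0) / n)         # SD of p-hat if H0 is true
    cutoff = p0 + z.inv_cdf(alpha) * sd_null  # reject H0 when p-hat falls below this
    sd_alt = sqrt(p_alt * (1 - p_alt) / n)    # SD of p-hat under the alternative
    return z.cdf((cutoff - p_alt) / sd_alt)   # P(reject H0 | p = p_alt)


# Hypothetical inputs chosen only for illustration.
power = power_lower_tail(p0=0.75, p_alt=0.65, n=125, alpha=0.05)
beta = 1 - power  # probability of a Type II error against this alternative
print(round(power, 3), round(beta, 3))
```

Changing `p_alt` changes both β and the power, which is why neither can be computed from the null hypothesis alone.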
It is important to understand that because the P-value is a conditional probability, calculated based on the assumption that the null hypothesis, H0: p = p0, is true, we use the claimed proportion p0 both in checking the np0 ≥ 10 and n(1 − p0) ≥ 10 conditions and in calculating the standard deviation σp̂ = √(p0(1 − p0)/n).
Solution:
Hypotheses: H0: p = 0.75 and Ha: p < 0.75.
Procedure: One-sample z-test for a population proportion.
Checks: np0 = (125)(0.75) = 93.75 and n(1 − p0) = (125)(0.25) = 31.25 are both ≥ 10, it is given that we have an SRS, and we must assume that 125 is less than 10% of the total union membership.
Mechanics: Calculator software (such as 1-PropZTest on the TI-84 or Z-1-PROP on the Casio Prizm) gives z = −1.394 and P = 0.0816.
Conclusion in context with linkage to the P-value: There are two possible answers:
a. With a P-value this large, 0.0816 > 0.05, there is not sufficient evidence to reject H0; that is, there is not sufficient evidence at the 5% significance level that the true percentage of union members who support a strike is less than 75%.
b. With a P-value this small, 0.0816 < 0.10, there is sufficient evidence to reject H0; that is, there is sufficient evidence at the 10% significance level that the true percentage of union members who support a strike is less than 75%.
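The calculator output can be checked with a short Python sketch of the one-sample z-test. Note the assumption: the sample count is not stated in this solution, so the value 87 out of 125 (p̂ = 0.696) used below is back-calculated from the reported z = −1.394 and should be treated as inferred, not given. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test_lower(p_hat, p0, n):
    """z statistic and lower-tail P-value for H0: p = p0 vs. Ha: p < p0."""
    sd = sqrt(p0 * (1 - p0) / n)   # uses the claimed p0, not p-hat
    z = (p_hat - p0) / sd
    p_value = NormalDist().cdf(z)  # area to the left of z
    return z, p_value


# ASSUMPTION: 87 supporters out of 125, inferred from the reported z-score.
z, p_value = one_prop_z_test_lower(87 / 125, 0.75, 125)
print(round(z, 3), round(p_value, 4))  # -1.394 0.0816
```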
From Unit 5, we have the following information about the sampling distribution of p̂1 − p̂2:
The set of all differences of sample proportions is approximately normally distributed.
The mean of the set of differences of sample proportions equals p1 − p2, the difference of the population proportions.
The standard deviation σd of the set of differences of sample proportions is approximately equal to: σd = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)
Remember that we are using the normal approximation to the binomial, so n1p̂1, n1(1 − p̂1), n2p̂2, and n2(1 − p̂2) should all be at least 10. In making calculations and drawing conclusions from specific samples, it is important both that the samples be simple random samples and that they be taken independently of each other. Finally, the original populations should be large compared to the sample sizes, that is, check that n1 ≤ 10% of N1 and n2 ≤ 10% of N2, where N1 and N2 are the population sizes.
Solution:
Procedure: Two-sample z-interval for a difference between population proportions, p1 − p2.
Checks:
we are given independent SRSs; and the sample sizes are assumed to be less than 10% of the populations of city hospital nurses on the two shifts, respectively.
Mechanics: 2-PropZInt on the TI-84 or 2-Prop ZInterval on the Casio Prizm give (0.0391, 0.2009).
The observed difference is 0.84 − 0.72 = 0.12, and the critical z-scores are ±1.645. The confidence interval estimate is 0.12 ± 1.645(0.0492) = 0.12 ± 0.081.
Conclusion in context: We are 90% confident that the true proportion of satisfied nurses on 7:00 a.m. to 3:00 p.m. shifts is between 0.039 and 0.201 higher than the true proportion for nurses on 11:00 p.m. to 7:00 a.m. shifts.
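The interval above can be reproduced with a brief Python sketch. The sample sizes are not given in this solution, so the example call plugs in the reported standard error 0.0492 directly; the general standard-error function is included only to show the formula, and both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_se(p1_hat, n1, p2_hat, n2):
    """Standard error of p1-hat minus p2-hat for independent samples (unpooled)."""
    return sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)


def z_interval(estimate, se, confidence):
    """Interval of the form estimate +/- z* times the standard error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z_star * se
    return estimate - margin, estimate + margin


# Uses the standard error reported in the solution, since n1 and n2 are not stated here.
low, high = z_interval(0.84 - 0.72, 0.0492, 0.90)
print(round(low, 4), round(high, 4))  # 0.0391 0.2009
```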
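The null hypothesis for a difference between two proportions is H0: p1 − p2 = 0 (that is, p1 = p2), and so the normality condition becomes that n1p̂c, n1(1 − p̂c), n2p̂c, and n2(1 − p̂c) should all be at least 10, where p̂c is the combined (or pooled) proportion, p̂c = (x1 + x2)/(n1 + n2), with x1 and x2 the numbers of successes in the two samples.
The other important conditions to be checked are both that the samples be random samples, ideally simple random samples, and that they be taken independently of each other. The original populations should also be large compared to the sample sizes, that is, check that n1 ≤ 10% of N1 and n2 ≤ 10% of N2, where N1 and N2 are the population sizes.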
Two points need to be stressed:
For many problems, the null hypothesis states that the population proportions are equal or, equivalently, that their difference is 0: H0: p1 − p2 = 0.
The alternative hypothesis is then Ha: p1 − p2 < 0, Ha: p1 − p2 > 0, or Ha: p1 − p2 ≠ 0, where the first two possibilities lead to one-sided tests and the third possibility leads to a two-sided test.
Since the null hypothesis is that p1 = p2, we call this common value pc and use this pooled value in calculating σd: σd = √(pc(1 − pc)(1/n1 + 1/n2)).
In practice, if p̂1 = x1/n1 and p̂2 = x2/n2, we use p̂c = (x1 + x2)/(n1 + n2) as an estimate of pc in calculating σd.
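The pooled calculation can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the counts in the example call are made up to show the mechanics, and the P-value is computed for the one-sided alternative Ha: p1 > p2.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test_upper(x1, n1, x2, n2):
    """Pooled two-sample z-test for H0: p1 = p2 vs. Ha: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)                      # pooled estimate of the common proportion
    sd = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))   # sigma_d with the pooled value
    z = (p1_hat - p2_hat) / sd
    p_value = 1 - NormalDist().cdf(z)                # upper-tail area
    return p_c, z, p_value


# Hypothetical counts: 45 successes out of 100 versus 30 successes out of 100.
p_c, z, p_value = two_prop_z_test_upper(45, 100, 30, 100)
print(p_c, round(z, 2), round(p_value, 4))
```

For a two-sided alternative, the P-value would instead be twice the tail area beyond |z|.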
Solution:
Hypotheses: H0: p1 − p2 = 0 or H0: p1 = p2 and Ha: p1 − p2 > 0 or Ha: p1 > p2.
Procedure: Two-sample z-test for a difference of two population proportions.
Checks:
are all at least 10; the samples are random and independent by design; and it is reasonable to assume the sample sizes are less than 10% of the populations.
Mechanics: Calculator software (such as 2-PropZTest) gives z = 11.0 and P = 0.000.
Conclusion in context with linkage to the P-value: With a P-value this small, 0.000 < 0.05, there is sufficient evidence to reject H0; that is, there is convincing evidence that the true proportion of all First Nations children in Canada in child welfare care is greater than the true proportion of all non-Aboriginal children in Canada in child welfare care.