This chapter covers constructing and interpreting confidence interval estimates for the population mean and proportion.
It also includes determining the necessary sample size for these estimates.
Point and Interval Estimates
Point Estimate: A single number used to estimate a population parameter.
Confidence Interval: Provides additional information about the variability of the estimate.
We can estimate population parameters such as μ (population mean) or π (population proportion) using sample statistics such as x̄ (sample mean) or p (sample proportion).
Table 1: Point Estimates
Population Parameter    Sample Statistic (Point Estimate)
Mean, μ                 x̄
Proportion, π           p
Understanding Confidence Intervals
Confidence intervals address the uncertainty associated with point estimates.
Interval Estimate: Gives a range of values providing more information than a point estimate.
Such interval estimates are called confidence intervals.
Key Aspects of Confidence Intervals
An interval gives a range of values.
Takes into consideration variation in sample statistics from sample to sample.
Based on observations from one sample.
Provides information about closeness to unknown population parameters.
Expressed in terms of a level of confidence (e.g., 95% or 99%), but can never be 100% confident.
Confidence Interval Example: Cereal Fill
Population has μ=368 and σ=15.
Sample size is n=25.
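From Chapter 7, 95% of all sample means fall within $\mu \pm Z\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and Z = 1.96:
$368 \pm 1.96 \times \frac{15}{\sqrt{25}} = (362.12,\ 373.88)$
Equivalently, 95% of intervals of the form x̄ ± 1.96σ/√n will contain μ.
When μ is unknown, use x̄ to estimate μ.
If x̄ = 362.3, the interval is $362.3 \pm 1.96 \times \frac{15}{\sqrt{25}} = (356.42,\ 368.18)$
Since 356.42 ≤ 368 ≤ 368.18, this interval contains μ.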
Practical Considerations
In practice, only one sample of size n is taken.
In practice, μ is unknown, so it's not known if the interval contains μ.
95% confidence is based on using Z=1.96.
95% of intervals formed this way will contain μ.
Based on the one sample actually selected, you can be 95% confident that your interval contains μ (this is a 95% confidence interval).
General Formula for Confidence Intervals
The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
Point Estimate: The sample statistic estimating the population parameter.
Critical Value: A table value based on the sampling distribution and desired confidence level.
Standard Error: The standard deviation of the point estimate.
Confidence Level, (1−α)
If the confidence level is 95%, (1−α)=0.95, so α=0.05.
Relative frequency interpretation:
95% of all confidence intervals constructed will contain the true parameter.
A specific interval either contains or does not contain the true parameter.
There is no probability involved for a specific interval.
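The relative frequency interpretation can be checked by simulation. The sketch below is illustrative only: it reuses the cereal-fill values (μ = 368, σ = 15, n = 25) and assumes a normally distributed population; the function name `coverage_rate`, the number of trials, and the random seed are choices made for this example, not part of the chapter.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated intervals (sigma known) that contain mu."""
    rng = random.Random(seed)                           # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value Z_{alpha/2}
    margin = z * sigma / n ** 0.5                       # critical value * standard error
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from a normal population (an assumption).
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Each individual interval either contains mu or it does not.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=368, sigma=15, n=25)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95. No single interval has a 95% chance attached to it; the 95% describes the long-run behavior of the procedure.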
Confidence Interval for μ (σ Known)
Assumptions:
Population standard deviation σ is known.
Population is normally distributed.
If the population is not normal, use a large sample size (n > 30).
Confidence interval estimate:
$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
x̄ is the point estimate.
$Z_{\alpha/2}$ is the normal distribution critical value for a probability of α/2 in each tail.
$\sigma/\sqrt{n}$ is the standard error.
Common Levels of Confidence
Confidence Level    Confidence Coefficient (1 − α)    Zα/2 value
80.0%               0.800                             1.282
90.0%               0.900                             1.645
95.0%               0.950                             1.960
98.0%               0.980                             2.326
99.0%               0.990                             2.576
99.8%               0.998                             3.090
99.9%               0.999                             3.291
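Critical values like these can also be computed rather than looked up. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is introduced here for illustration. Computed values may differ in the last digit from values read off a printed table.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Leave alpha/2 in the upper tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  Z = {z_critical(level):.3f}")
```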
Example
A sample of 11 circuits from a normal population has a mean resistance of 2.22 ohms.
The population standard deviation is 0.35 ohms.
Determine a 95% confidence interval for the true mean resistance.
$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 2.22 \pm (1.96)\frac{0.35}{\sqrt{11}} = 2.22 \pm 0.2068$
Interpretation
We are 95% confident that the true mean resistance is between 2.0132 and 2.4268 ohms.
Although the true mean may or may not be in this interval, 95% of intervals formed this way will contain the true mean.
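The circuit calculation can be reproduced in a few lines. This is a sketch under the section's assumptions (σ known, normal population); the function name `z_interval` is hypothetical and chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sigma / sqrt(n)                        # critical value * standard error
    return x_bar - margin, x_bar + margin

# Values from the circuit example: n = 11, sample mean 2.22, sigma = 0.35.
lower, upper = z_interval(x_bar=2.22, sigma=0.35, n=11)
print(f"95% CI for mean resistance: {lower:.4f} to {upper:.4f} ohms")
```

Up to rounding, the result should agree with the interval 2.0132 to 2.4268 ohms given in the interpretation.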
Do You Ever Truly Know σ?
Probably not!
In real-world business situations, σ is usually unknown.
If σ is known, then μ is also known (since calculating σ requires knowing μ).
If μ is known, there's no need to estimate it.
Confidence Interval for μ (σ Unknown)
If the population standard deviation σ is unknown, substitute the sample standard deviation, S.
This introduces extra uncertainty since S varies from sample to sample.
Use the t-distribution instead of the normal distribution.
Assumptions:
Population standard deviation is unknown.
Population is normally distributed.
If the population is not normal, use a large sample (n > 30).
Use Student’s t Distribution
Confidence Interval Estimate:
$\bar{x} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}$
where $t_{\alpha/2}$ is the critical value of the t-distribution with n−1 degrees of freedom and an area of α/2 in each tail.
Student’s t Distribution
The t-distribution is a family of distributions.
The tα/2 value depends on degrees of freedom (d.f.).
Degrees of freedom represent the number of observations free to vary after the sample mean has been calculated.
d.f.=n−1
Degrees of Freedom (df)
Idea: Number of observations that are free to vary after sample mean has been calculated.
Example: Suppose the mean of 3 numbers is 8.0. Let $X_1 = 7$ and $X_2 = 8$. Then $X_3$ must be 9 (i.e., $X_3$ is not free to vary).
Here, n=3, so degrees of freedom =n–1=3–1=2. Two values can be any numbers, but the third is not free to vary for a given mean.
Example of t distribution confidence interval
A random sample of n = 25 has x̄ = 50 and S = 8.
Form a 95% confidence interval for μ.
d.f. = n − 1 = 24, so $t_{\alpha/2} = t_{0.025} = 2.064$
The confidence interval is: $\bar{x} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}} = 50 \pm (2.064)\frac{8}{\sqrt{25}} = 50 \pm 3.302$
That is, 46.698 ≤ μ ≤ 53.302
Interpreting this interval requires the assumption that the population you are sampling from is approximately normally distributed (especially since n is only 25). This condition can be checked by creating a:
Normal probability plot or
Boxplot
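The t-based interval from the example above can be sketched as follows. Python's standard library has no t-distribution quantile function, so this sketch assumes the critical value is looked up in a t table and passed in as `t_crit`; the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2} for n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)   # critical value * estimated standard error
    return x_bar - margin, x_bar + margin

# Values from the example: n = 25, sample mean 50, S = 8, t_{0.025} with 24 d.f. = 2.064.
lower, upper = t_interval(x_bar=50, s=8, n=25, t_crit=2.064)
print(f"95% CI for the mean: {lower:.3f} to {upper:.3f}")
```

Passing the critical value in explicitly keeps the sketch dependency-free; a statistics package could supply the t quantile instead.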
Confidence Intervals for the Population Proportion, π
An interval estimate for the population proportion (π) can be calculated by adding an allowance for uncertainty to the sample proportion (p).
Recall that the distribution of the sample proportion is approximately normal when the sample size is large, which requires np > 5 and n(1−p) > 5. The standard error of the proportion is:
$\sigma_p = \sqrt{\frac{p(1-p)}{n}}$
Confidence Interval Endpoints
Upper and lower confidence limits for the population proportion are calculated with the formula:
$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
Where:
Zα/2 is the standard normal value for the level of confidence desired
p is the sample proportion
n is the sample size
*Note: must have np > 5 and n(1-p) > 5
Example
A random sample of 100 people shows that 25 are left-handed.
Form a 95% confidence interval for the true proportion of left-handers.
$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = \frac{25}{100} \pm 1.96\sqrt{\frac{(0.25)(0.75)}{100}}$
$= 0.25 \pm 1.96(0.0433) = 0.25 \pm 0.0849$
So we are 95% confident that 0.25 ± 0.0849 contains the population proportion:
0.1651 ≤ π ≤ 0.3349
Interpretation
We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
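The left-hander calculation can be checked with a short sketch. The function name `proportion_interval` is hypothetical; the guard simply enforces the np > 5 and n(1−p) > 5 condition stated above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n
    # The normal approximation is only used when both counts exceed 5.
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("need np > 5 and n(1-p) > 5 for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sqrt(p * (1 - p) / n)                  # critical value * standard error
    return p - margin, p + margin

# Values from the example: 25 left-handers in a sample of 100.
lower, upper = proportion_interval(successes=25, n=100)
print(f"95% CI for the proportion: {lower:.4f} to {upper:.4f}")
```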
Determining Sample Size
Sampling Error
The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1−α).
The margin of error, also called the sampling error, is:
The amount of imprecision in the estimate of the population parameter
The amount added and subtracted to the point estimate to form the confidence interval.
For the Mean
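$e = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
Now solve for n (multiply both sides by √n, divide by e, then square):
$n = \frac{Z_{\alpha/2}^2\,\sigma^2}{e^2}$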
To determine the required sample size for the mean, you must know:
The desired level of confidence (1−α), which determines the critical value, Zα/2
The acceptable sampling error, e
The standard deviation, σ
Required Sample Size Example
If σ=45, what sample size is needed to estimate the mean within ±5 with 90% confidence?
$n = \frac{Z_{\alpha/2}^2\,\sigma^2}{e^2} = \frac{(1.645^2)(45^2)}{5^2} = 219.19$
So the required sample size is 220 (always round up).
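The same calculation as a small sketch, with rounding up handled by `math.ceil`. The function name `sample_size_for_mean` is hypothetical, and the critical value is passed in directly so the result matches the table value 1.645 used in the example.

```python
from math import ceil

def sample_size_for_mean(sigma, e, z):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most e."""
    n_exact = (z * sigma / e) ** 2   # n = Z^2 * sigma^2 / e^2
    return ceil(n_exact)             # always round up to a whole observation

# Values from the example: sigma = 45, e = 5, 90% confidence (Z = 1.645).
n = sample_size_for_mean(sigma=45, e=5, z=1.645)
print(f"Required sample size: {n}")
```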
If σ is unknown
If unknown, σ can be estimated when using the required sample size formula:
Use a value for σ that is expected to be at least as large as the true σ.
Select a pilot sample and estimate σ with the sample standard deviation, S
Determining Sample Size for the Proportion
$e = Z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}$
Solve for n
$n = \frac{Z_{\alpha/2}^2\,\pi(1-\pi)}{e^2}$
To determine the required sample size for the proportion, you must know:
The desired level of confidence (1−α), which determines the critical value, Zα/2
The acceptable sampling error, e
The true proportion of events of interest, π
π can be estimated with a pilot sample if necessary (or conservatively use 0.5 as an estimate of π)
Required Sample Size Example
How large a sample would be necessary to estimate the true proportion defective in a large population within ±3% (e = 0.03), with 95% confidence?
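No pilot estimate of π is given here, so the sketch below falls back on the conservative value π = 0.5 mentioned above and reads the target as a sampling error of 0.03. Both are assumptions made for illustration, as is the function name `sample_size_for_proportion`; a pilot estimate of π would give a smaller n.

```python
from math import ceil

def sample_size_for_proportion(pi, e, z):
    """Smallest n whose margin of error z * sqrt(pi * (1 - pi) / n) is at most e."""
    n_exact = z ** 2 * pi * (1 - pi) / e ** 2   # n = Z^2 * pi * (1 - pi) / e^2
    return ceil(n_exact)                        # always round up

# Assumed inputs: conservative pi = 0.5, e = 0.03, 95% confidence (Z = 1.96).
n = sample_size_for_proportion(pi=0.5, e=0.03, z=1.96)
print(f"Required sample size (conservative): {n}")
```

Using π = 0.5 maximizes π(1−π), so this n is an upper bound for the stated error and confidence level.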
A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate.
The level of confidence should always be reported.
The sample size should be reported.
An interpretation of the confidence interval estimate should also be provided.
Final Note
The important thing to remember is that the margin of error of a confidence interval is generally a function of three things: the degree of confidence required, the sample size, and the percentage being estimated.
Thus, sampling error will decrease as:
The sample size (or number of interviews) gets bigger;
The percentage estimated approaches 0% or 100%; or
The need to be certain about the result (the "confidence level") gets smaller.
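These three effects can be seen by varying one input at a time in the margin-of-error formula for a proportion. The numbers below (a baseline of p = 0.5, n = 400, 95% confidence) are hypothetical values chosen only to illustrate the direction of each change.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Margin of error for a proportion: Z_{alpha/2} * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

base = margin_of_error(p=0.5, n=400, confidence=0.95)            # baseline
bigger_sample = margin_of_error(p=0.5, n=1600, confidence=0.95)  # larger n
extreme_p = margin_of_error(p=0.1, n=400, confidence=0.95)       # p nearer 0%
lower_conf = margin_of_error(p=0.5, n=400, confidence=0.90)      # less confidence

for label, value in [("baseline", base), ("n = 1600", bigger_sample),
                     ("p = 0.1", extreme_p), ("90% confidence", lower_conf)]:
    print(f"{label:>15}: +/- {value:.4f}")
```

Each variation should print a smaller margin than the baseline. Quadrupling the sample size halves the margin, because n enters through its square root.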