Confidence Intervals Notes
Confidence Intervals
Introduction
- This chapter covers constructing and interpreting confidence interval estimates for the population mean and proportion.
- It also includes determining the necessary sample size for these estimates.
Point and Interval Estimates
- Point Estimate: A single number used to estimate a population parameter.
- Confidence Interval: Provides additional information about the variability of the estimate.
- We can estimate population parameters such as \mu (population mean) or \pi (population proportion) using sample statistics such as \overline{x} (sample mean) or p (sample proportion).
Table 1: Point Estimates
| Population Parameter | Sample Statistic (Point Estimate) |
|---|---|
| Mean, \mu | \overline{x} |
| Proportion, \pi | p |
Understanding Confidence Intervals
- Confidence intervals address the uncertainty associated with point estimates.
- Interval Estimate: Gives a range of values providing more information than a point estimate.
- Such interval estimates are called confidence intervals.
Key Aspects of Confidence Intervals
- An interval gives a range of values.
- Takes into consideration variation in sample statistics from sample to sample.
- Based on observations from one sample.
- Provides information about closeness to unknown population parameters.
- Expressed in terms of a level of confidence (e.g., 95% or 99%); the level can never be 100%.
Confidence Interval Example: Cereal Fill
- Population has \mu = 368 and \sigma = 15.
- Sample size is n = 25.
- From Chapter 7: \mu \pm Z \times \sigma_{\overline{x}}, where \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}
- 368 \pm 1.96 \times \frac{15}{\sqrt{25}} = (362.12, 373.88)
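- 95% of all sample means from samples of size n = 25 fall within this interval, so 95% of intervals of the form \overline{x} \pm 5.88 will contain \mu.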
- When \mu is unknown, use \overline{x} to estimate \mu.
- If \overline{x} = 362.3, the interval is 362.3 \pm 1.96 \times \frac{15}{\sqrt{25}} = (356.42, 368.18)
- Since 356.42 \le 368 \le 368.18, this interval contains \mu = 368 and so correctly estimates \mu.
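The arithmetic in the cereal-fill example can be reproduced with a short sketch. The helper name `z_interval` is hypothetical (it is not from the notes); the numbers are the ones given above.

```python
import math

def z_interval(center, z, sigma, n):
    """Return (lower, upper) for center ± z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return center - margin, center + margin

# Values from the cereal-fill example: sigma = 15, n = 25, Z = 1.96
mu = 368
around_mu = z_interval(mu, 1.96, 15, 25)       # range in which 95% of sample means fall
around_xbar = z_interval(362.3, 1.96, 15, 25)  # interval built from one observed sample mean

print(tuple(round(v, 2) for v in around_mu))
print(tuple(round(v, 2) for v in around_xbar))
print(around_xbar[0] <= mu <= around_xbar[1])  # does this interval cover mu?
```

Both intervals have the same half-width (1.96 × 15 / √25 = 5.88); only the center changes.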
Practical Considerations
- In practice, only one sample of size n is taken.
- In practice, \mu is unknown, so it's not known if the interval contains \mu.
- 95% confidence is based on using Z = 1.96.
- 95% of intervals formed this way will contain \mu.
- Based on the selected sample, one can be 95% confident that the interval contains \mu (a 95% confidence interval).
- The general formula for all confidence intervals is:
- Point Estimate ± (Critical Value)(Standard Error)
- Point Estimate: The sample statistic estimating the population parameter.
- Critical Value: A table value based on the sampling distribution and desired confidence level.
- Standard Error: The standard deviation of the point estimate.
Confidence Level, (1 - \alpha)
- If the confidence level is 95%, (1 - \alpha) = 0.95, so \alpha = 0.05.
- Relative frequency interpretation:
- 95% of all confidence intervals constructed will contain the true parameter.
- A specific interval either contains or does not contain the true parameter.
- There is no probability involved for a specific interval.
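The relative frequency interpretation can be checked by simulation. The sketch below assumes a normal population with the cereal-fill values (\mu = 368, \sigma = 15, n = 25); the function name `coverage`, the number of trials, and the seed are illustrative choices, not part of the notes.

```python
import math
import random

def coverage(mu=368, sigma=15, n=25, z=1.96, trials=10_000, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Each individual interval either contains mu or it does not
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should come out close to 0.95, although any single simulated interval is simply a hit or a miss.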
Confidence Interval for \mu (\sigma Known)
Assumptions:
- Population standard deviation \sigma is known.
- Population is normally distributed.
- If the population is not normal, use a large sample size (n > 30).
Confidence interval estimate:
- \overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
- \overline{x} is the point estimate.
- Z_{\alpha/2} is the normal distribution critical value for a probability of \alpha/2 in each tail.
- \frac{\sigma}{\sqrt{n}} is the standard error.
Common Levels of Confidence
| Confidence Level | Confidence Coefficient, 1 - \alpha | Z_{\alpha/2} value |
|---|---|---|
| 80.0% | 0.800 | 1.282 |
| 90.0% | 0.900 | 1.645 |
| 95.0% | 0.950 | 1.960 |
| 98.0% | 0.980 | 2.326 |
| 99.0% | 0.990 | 2.576 |
| 99.8% | 0.998 | 3.090 |
| 99.9% | 0.999 | 3.291 |
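Critical values like these can also be computed instead of being read from a table. A minimal sketch using `statistics.NormalDist` from the Python standard library (3.8+); the helper name `z_critical` is hypothetical. Printed tables are often rounded to two decimals, so small differences in the last digits are expected.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Z value leaving alpha/2 in each tail of the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  Z = {z_critical(level):.3f}")
```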
Example
- A sample of 11 circuits from a normal population has a mean resistance of 2.22 ohms.
- The population standard deviation is 0.35 ohms.
- Determine a 95% confidence interval for the true mean resistance.
- \overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.22 \pm (1.96) \frac{0.35}{\sqrt{11}} = 2.22 \pm 0.2068
Interpretation
- We are 95% confident that the true mean resistance is between 2.0132 and 2.4268 ohms.
- Although the true mean may or may not be in this interval, 95% of intervals formed this way will contain the true mean.
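The circuit example can be reproduced with a small function for the \sigma-known case. This is a sketch under the stated assumptions (normal population, known \sigma); the function name `mean_ci_sigma_known` is hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci_sigma_known(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value Z_{alpha/2}
    margin = z * sigma / math.sqrt(n)                   # critical value times standard error
    return xbar - margin, xbar + margin

# Circuit example: n = 11, sample mean 2.22 ohms, sigma = 0.35 ohms
lower, upper = mean_ci_sigma_known(xbar=2.22, sigma=0.35, n=11)
print(f"{lower:.4f} to {upper:.4f} ohms")
```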
Do You Ever Truly Know \sigma?
- Probably not!
- In real-world business situations, \sigma is usually unknown.
- If \sigma is known, then \mu is also known (since calculating \sigma requires knowing \mu).
- If \mu is known, there's no need to estimate it.
Confidence Interval for \mu (\sigma Unknown)
- If the population standard deviation \sigma is unknown, substitute the sample standard deviation, S.
- This introduces extra uncertainty since S varies from sample to sample.
- Use the t-distribution instead of the normal distribution.
Assumptions:
- Population standard deviation is unknown.
- Population is normally distributed.
- If the population is not normal, use a large sample (n > 30).
Use Student’s t Distribution
Confidence Interval Estimate:
- \overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}
- Where t_{\alpha/2} is the critical value of the t-distribution with n - 1 degrees of freedom and an area of \alpha/2 in each tail.
Student’s t Distribution
- The t-distribution is a family of distributions.
- The t_{\alpha/2} value depends on degrees of freedom (d.f.).
- Degrees of freedom represent the number of observations free to vary after the sample mean has been calculated.
Degrees of Freedom (df)
- Idea: Number of observations that are free to vary after sample mean has been calculated.
- Example: Suppose the mean of 3 numbers is 8.0. Let X1 = 7 and X2 = 8. Then X3 must be 9 (i.e., X3 is not free to vary).
- Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2. Two values can be any numbers, but the third is not free to vary for a given mean.
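The same idea in code: once the mean and n - 1 of the values are fixed, the last value is forced. The helper `last_value` is a hypothetical name used only for illustration.

```python
def last_value(mean, free_values):
    """Value forced on the final observation once the mean and the other n - 1 values are fixed."""
    n = len(free_values) + 1
    return n * mean - sum(free_values)

# Mean of 3 numbers is 8.0, with X1 = 7 and X2 = 8 chosen freely
x3 = last_value(8.0, [7, 8])
print(x3)
```

Choosing different free values (say 1 and 20) changes the forced value, but there is still only one value it can take.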
Example of t distribution confidence interval
- A random sample of n = 25 has \overline{x} = 50 and S = 8.
- Form a 95% confidence interval for \mu.
- d.f. = n - 1 = 24, so t_{\alpha/2} = t_{0.025} = 2.064
- The confidence interval is: \overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}} = 50 \pm (2.064) \frac{8}{\sqrt{25}} = 50 \pm 3.302
- The confidence interval is 46.698 \le \mu \le 53.302
- Interpreting this interval requires assuming that the population being sampled is approximately normally distributed (especially since n is only 25). This assumption can be checked by creating a:
- Normal probability plot or
- Boxplot
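The t-based interval from the example can be sketched in plain Python. The standard library has no t-distribution, so this sketch approximates the critical value numerically (Simpson's rule for the CDF, then bisection); all function names are hypothetical. Where SciPy is installed, `scipy.stats.t.ppf` returns the same quantile directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] plus the lower half (0.5)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """t value leaving alpha/2 in each tail, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci_sigma_unknown(xbar, s, n, confidence=0.95):
    """Confidence interval for the mean using S and n - 1 degrees of freedom."""
    margin = t_critical(confidence, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example from the notes: n = 25, sample mean 50, S = 8
t_value = t_critical(0.95, 24)
lower, upper = mean_ci_sigma_unknown(xbar=50, s=8, n=25)
print(f"t = {t_value:.3f}")
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

The sketch does not check the normality assumption; that still has to be assessed separately, for example with the plots listed above.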
Confidence Intervals for the Population Proportion, \pi
- An interval estimate for the population proportion (\pi) can be calculated by adding an allowance for uncertainty to the sample proportion (p).
- Recall that the distribution of the sample proportion is approximately normal if the sample size is large, which requires np > 5 and n(1-p) > 5. The standard error of the proportion, estimated from the sample, is:
- \sigma_{\overline{p}} = \sqrt{\frac{p(1 - p)}{n}}
Confidence Interval Endpoints
- Upper and lower confidence limits for the population proportion are calculated with the formula:
- p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}
- Where:
- Z_{\alpha/2} is the standard normal value for the level of confidence desired
- p is the sample proportion
- n is the sample size
- Note: np > 5 and n(1-p) > 5 must both hold.
Example
- A random sample of 100 people shows that 25 are left-handed.
- Form a 95% confidence interval for the true proportion of left-handers.
- p \pm Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} = \frac{25}{100} \pm 1.96\sqrt{\frac{(0.25)(0.75)}{100}}
- = 0.25 \pm 1.96(0.0433) = 0.25 \pm 0.0849
- So we are 95% confident that the interval 0.25 \pm 0.0849, i.e., 0.1651 \le \pi \le 0.3349, contains the population proportion.
Interpretation
- We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
- Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
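The left-hander example can be reproduced with a short function. This is a sketch of the normal-approximation interval described above; the function name `proportion_ci` is hypothetical, and the guard simply enforces the np > 5 and n(1-p) > 5 condition from the notes.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation needs np > 5 and n(1 - p) > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)  # critical value times standard error
    return p - margin, p + margin

# 25 left-handers in a random sample of 100 people
lower, upper = proportion_ci(25, 100)
print(f"{lower:.4f} to {upper:.4f}")
```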
Determining Sample Size
Sampling Error
- The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - \alpha).
- The margin of error, also called the sampling error, is:
- The amount of imprecision in the estimate of the population parameter
- The amount added and subtracted to the point estimate to form the confidence interval.
- For the Mean
- e = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
- Now solve for n
- n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2}
- To determine the required sample size for the mean, you must know:
- The desired level of confidence (1 - \alpha), which determines the critical value, Z_{\alpha/2}
- The acceptable sampling error, e
- The standard deviation, \sigma
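The step from the margin of error to the sample size formula is a short rearrangement. Starting from the margin of error as the critical value times the standard error \frac{\sigma}{\sqrt{n}} used earlier in these notes:

```latex
\begin{aligned}
e &= Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{Z_{\alpha/2}\,\sigma}{e} \\
n &= \frac{Z^2_{\alpha/2}\,\sigma^2}{e^2}
\end{aligned}
```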
Required Sample Size Example
- If \sigma = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?
- n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2} = \frac{(1.645^2)(45^2)}{5^2} = 219.19
- So the required sample size is 220 (always round up).
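The sample size calculation for the mean, including the round-up step, can be sketched as follows. The function name `sample_size_for_mean` is hypothetical; the critical value is computed rather than looked up, so intermediate digits may differ slightly from a table-based calculation.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence):
    """Smallest n with margin of error at most e when estimating a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / e) ** 2)  # always round up to a whole observation

# sigma = 45, margin of error 5, 90% confidence
n = sample_size_for_mean(sigma=45, e=5, confidence=0.90)
print(n)
```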
If \sigma is unknown
- If unknown, \sigma can be estimated when using the required sample size formula:
- Use a value for \sigma that is expected to be at least as large as the true \sigma.
- Select a pilot sample and estimate \sigma with the sample standard deviation, S
Determining Sample Size For the Proportion
- e = Z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}
- Solve for n
- n = \frac{Z^2_{\alpha/2}\,\pi(1-\pi)}{e^2}
- To determine the required sample size for the proportion, you must know:
- The desired level of confidence (1 - \alpha), which determines the critical value, Z_{\alpha/2}
- The acceptable sampling error, e
- The true proportion of events of interest, \pi
- \pi can be estimated with a pilot sample if necessary (or conservatively use 0.5 as an estimate of \pi)
Required Sample Size Example
- How large a sample would be necessary to estimate the true proportion defective in a large population within ± 3%, with 95% confidence?
- (Assume a pilot sample yields p = 0.12)
- Solution:
- For 95% confidence, use Z_{\alpha/2} = 1.96
- e = 0.03
- p = 0.12, so use this to estimate \pi
- n = \frac{Z^2_{\alpha/2}\,\pi(1 - \pi)}{e^2} = \frac{(1.96)^2(0.12)(0.88)}{(0.03)^2} = 450.74
- So: use n = 451
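The same calculation in code, with the conservative choice \pi = 0.5 shown for comparison. The function name `sample_size_for_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(pi, e, confidence):
    """Smallest n with margin of error at most e when estimating a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)  # always round up

# Pilot estimate p = 0.12, margin of error 0.03, 95% confidence
n_pilot = sample_size_for_proportion(0.12, 0.03, 0.95)
# Conservative choice when no pilot estimate is available
n_conservative = sample_size_for_proportion(0.5, 0.03, 0.95)
print(n_pilot, n_conservative)
```

Using 0.5 gives the largest possible value of \pi(1 - \pi), so it never understates the required sample size, at the cost of a larger sample.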
Ethical Issues
- A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate.
- The level of confidence should always be reported.
- The sample size should be reported.
- An interpretation of the confidence interval estimate should also be provided.
Final Note
- The important thing to remember is that the margin of error, and hence the width of the confidence interval, is generally a function of three things: the degree of confidence required, the sample size, and the percentage being estimated.
- Thus, sampling error will decrease as:
- The sample size (or number of interviews) gets bigger;
- The percentage estimated approaches 0% or 100%; or
- The need to be certain about the result (i.e., the "confidence level") gets smaller.
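These three effects can be seen by evaluating the margin of error for a proportion under different settings. The values below (n = 400, p = 0.5, 95% confidence as the baseline) are illustrative assumptions, not figures from the notes, and `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Margin of error for a proportion: critical value times standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

base = margin_of_error(p=0.5, n=400, confidence=0.95)
bigger_sample = margin_of_error(p=0.5, n=1600, confidence=0.95)     # larger n
extreme_share = margin_of_error(p=0.1, n=400, confidence=0.95)      # p nearer 0%
lower_confidence = margin_of_error(p=0.5, n=400, confidence=0.90)   # less certainty required

for label, value in [("baseline", base), ("n = 1600", bigger_sample),
                     ("p = 0.1", extreme_share), ("90% confidence", lower_confidence)]:
    print(f"{label:>15}: {value:.4f}")
```

Quadrupling the sample size halves the margin of error, since n enters through its square root.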