Confidence Intervals Notes

Confidence Intervals

Introduction
  • This chapter covers constructing and interpreting confidence interval estimates for the population mean and proportion.
  • It also includes determining the necessary sample size for these estimates.
Point and Interval Estimates
  • Point Estimate: A single number used to estimate a population parameter.
  • Confidence Interval: Provides additional information about the variability of the estimate.
  • We can estimate population parameters such as $\mu$ (population mean) or $\pi$ (population proportion) using sample statistics such as $\overline{x}$ (sample mean) or $p$ (sample proportion).
Table 1: Point Estimates
Population Parameter | Sample Statistic (Point Estimate)
Mean, $\mu$ | $\overline{x}$
Proportion, $\pi$ | $p$
Understanding Confidence Intervals
  • Confidence intervals address the uncertainty associated with point estimates.
  • Interval Estimate: Gives a range of values providing more information than a point estimate.
  • Such interval estimates are called confidence intervals.
Key Aspects of Confidence Intervals
  • An interval gives a range of values.
  • Takes into consideration variation in sample statistics from sample to sample.
  • Based on observations from one sample.
  • Provides information about closeness to unknown population parameters.
  • Expressed in terms of a level of confidence (e.g., 95% or 99%), but can never be 100% confident.
Confidence Interval Example: Cereal Fill
  • Population has $\mu = 368$ and $\sigma = 15$.
  • Sample size is $n = 25$.
  • From Chapter 7: $\mu \pm Z \times \sigma_{\overline{x}}$, where $\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$
    • $368 \pm 1.96 \times \frac{15}{\sqrt{25}} = (362.12, 373.88)$
      • 95% of the sample means from samples of size 25 fall within this interval.
  • When $\mu$ is unknown, use $\overline{x}$ to estimate $\mu$.
    • If $\overline{x} = 362.3$, the interval is $362.3 \pm 1.96 \times \frac{15}{\sqrt{25}} = (356.42, 368.18)$
      • Since $356.42 \le \mu \le 368.18$ (here $\mu = 368$), the interval based on this sample correctly estimates $\mu$.
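The arithmetic in the cereal-fill example can be checked with a short sketch. The helper name `z_interval` is made up for illustration; the numbers are the ones given above.

```python
import math

def z_interval(center, sigma, n, z=1.96):
    """Return (lower, upper) for center ± z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return center - margin, center + margin

# Interval around the known population mean (holds 95% of sample means).
around_mu = z_interval(368, 15, 25)
# Interval around one observed sample mean.
around_xbar = z_interval(362.3, 15, 25)

print(tuple(round(v, 2) for v in around_mu))    # (362.12, 373.88)
print(tuple(round(v, 2) for v in around_xbar))  # (356.42, 368.18)
print(around_xbar[0] <= 368 <= around_xbar[1])  # True: this interval covers mu
```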
Practical Considerations
  • In practice, only one sample of size n is taken.
  • In practice, $\mu$ is unknown, so it's not known if the interval contains $\mu$.
  • 95% confidence is based on using $Z = 1.96$.
  • 95% of intervals formed this way will contain $\mu$.
  • Based on the selected sample, one can be 95% confident that the interval contains $\mu$ (a 95% confidence interval).
General Formula for Confidence Intervals
  • The general formula for all confidence intervals is:
    • Point Estimate ± (Critical Value)(Standard Error)
      • Point Estimate: The sample statistic estimating the population parameter.
      • Critical Value: A table value based on the sampling distribution and desired confidence level.
      • Standard Error: The standard deviation of the point estimate.
Confidence Level, $(1 - \alpha)$
  • If the confidence level is 95%, $(1 - \alpha) = 0.95$, so $\alpha = 0.05$.
  • Relative frequency interpretation:
    • 95% of all confidence intervals constructed will contain the true parameter.
  • A specific interval either contains or does not contain the true parameter.
    • There is no probability involved for a specific interval.
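The relative frequency interpretation can be illustrated with a small simulation. This is only a sketch: it assumes a normal population with the cereal-fill values ($\mu = 368$, $\sigma = 15$, $n = 25$), and the function name `coverage`, the number of trials, and the seed are arbitrary choices made here.

```python
import math
import random

def coverage(mu=368.0, sigma=15.0, n=25, z=1.96, trials=2000, seed=0):
    """Share of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Each individual interval either contains mu or it does not.
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95, while any single interval in the loop is simply a hit or a miss.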
Confidence Interval for $\mu$ ($\sigma$ Known)
Assumptions:
  • Population standard deviation $\sigma$ is known.
  • Population is normally distributed.
  • If the population is not normal, use a large sample size (n > 30).
Confidence interval estimate:
  • $\overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
    • $\overline{x}$ is the point estimate.
    • $Z_{\alpha/2}$ is the normal distribution critical value for a probability of $\alpha/2$ in each tail.
    • $\frac{\sigma}{\sqrt{n}}$ is the standard error.
Common Levels of Confidence
Confidence Level | Confidence Coefficient $1 - \alpha$ | $Z_{\alpha/2}$ value
80.0% | 0.800 | 1.282
90.0% | 0.900 | 1.645
95.0% | 0.950 | 1.960
98.0% | 0.980 | 2.326
99.0% | 0.990 | 2.576
99.8% | 0.998 | 3.090
99.9% | 0.999 | 3.291
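Critical values like these can also be computed rather than looked up. A minimal sketch using `statistics.NormalDist` from the Python standard library (3.8+); the helper name `z_critical` is an illustrative choice. Printed tables often round differently, so the last digit may not match a given table exactly.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Standard normal value leaving alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {z_critical(level):.3f}")
```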
Example
  • A sample of 11 circuits from a normal population has a mean resistance of 2.22 ohms.
  • The population standard deviation is 0.35 ohms.
  • Determine a 95% confidence interval for the true mean resistance.
    • $\overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.22 \pm (1.96) \frac{0.35}{\sqrt{11}} = 2.22 \pm 0.2068$
Interpretation
  • We are 95% confident that the true mean resistance is between 2.0132 and 2.4268 ohms.
  • Although the true mean may or may not be in this interval, 95% of intervals formed this way will contain the true mean.
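The resistance example can be reproduced with a small function. This is a sketch under the stated assumptions (normal population, $\sigma$ known); `mean_ci_known_sigma` is a hypothetical name, and the critical value is computed with the standard library instead of being read from a table.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)  # critical value times standard error
    return xbar - margin, xbar + margin

low, high = mean_ci_known_sigma(2.22, 0.35, 11)
print(f"{low:.4f} to {high:.4f}")  # 2.0132 to 2.4268
```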
Do You Ever Truly Know $\sigma$?
  • Probably not!
  • In real-world business situations, $\sigma$ is usually unknown.
  • If $\sigma$ is known, then $\mu$ is also known (since calculating $\sigma$ requires knowing $\mu$).
  • If $\mu$ is known, there's no need to estimate it.
Confidence Interval for $\mu$ ($\sigma$ Unknown)
  • If the population standard deviation $\sigma$ is unknown, substitute the sample standard deviation, $S$.
  • This introduces extra uncertainty since $S$ varies from sample to sample.
  • Use the t-distribution instead of the normal distribution.
Assumptions:
  • Population standard deviation is unknown.
  • Population is normally distributed.
  • If the population is not normal, use a large sample (n > 30).
Use Student’s t Distribution
Confidence Interval Estimate:
  • $\overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}$
    • Where $t_{\alpha/2}$ is the critical value of the t-distribution with $n - 1$ degrees of freedom and an area of $\alpha/2$ in each tail.
Student’s t Distribution
  • The t-distribution is a family of distributions.
  • The $t_{\alpha/2}$ value depends on degrees of freedom (d.f.).
  • Degrees of freedom represent the number of observations free to vary after the sample mean has been calculated.
    • $d.f. = n - 1$
Degrees of Freedom (df)
  • Idea: Number of observations that are free to vary after sample mean has been calculated.
  • Example: Suppose the mean of 3 numbers is 8.0. Let $X_1 = 7$ and $X_2 = 8$. Then $X_3$ must be 9 (i.e., $X_3$ is not free to vary).
  • Here, $n = 3$, so degrees of freedom $= n - 1 = 3 - 1 = 2$. Two values can be any numbers, but the third is not free to vary for a given mean.
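The same idea in a few lines: once the mean and $n - 1$ of the values are fixed, the last value is determined. The function name `last_value` is illustrative only.

```python
def last_value(mean, free_values):
    """Value forced by a fixed mean once the other n - 1 values are chosen."""
    n = len(free_values) + 1
    return mean * n - sum(free_values)

print(last_value(8.0, [7, 8]))  # 9.0
```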
Example of t distribution confidence interval
  • A random sample of $n = 25$ has $\overline{x} = 50$ and $S = 8$.
  • Form a 95% confidence interval for $\mu$.
    • $d.f. = n - 1 = 24$, so $t_{\alpha/2} = t_{0.025} = 2.064$
    • The confidence interval is: $\overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}} = 50 \pm (2.064) \frac{8}{\sqrt{25}} = 50 \pm 3.302$
    • The confidence interval is $46.698 \le \mu \le 53.302$
  • Interpreting this interval requires the assumption that the population you are sampling from is approximately normally distributed (especially since $n$ is only 25). This condition can be checked by creating a:
    • Normal probability plot or
    • Boxplot
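The t-interval example above can be checked with a short sketch. The Python standard library has no t-distribution, so the critical value is passed in as a table value (2.064 for 24 degrees of freedom, as given in the example); `mean_ci_t` is a hypothetical helper name.

```python
import math

def mean_ci_t(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value for n - 1 degrees of freedom with
    alpha/2 in each tail; it is supplied by the caller.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_ci_t(50, 8, 25, t_crit=2.064)
print(f"{low:.3f} to {high:.3f}")  # 46.698 to 53.302
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 24)` is one way to obtain the critical value instead of reading it from a table.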
Confidence Intervals for the Population Proportion, $\pi$
  • An interval estimate for the population proportion ($\pi$) can be calculated by adding an allowance for uncertainty to the sample proportion ($p$).
  • Recall that the distribution of the sample proportion is approximately normal if the sample size is large, which requires $np > 5$ and $n(1 - p) > 5$. The standard error of the proportion, estimated from the sample, is:
    • $\sigma_p = \sqrt{\frac{p(1 - p)}{n}}$
Confidence Interval Endpoints
  • Upper and lower confidence limits for the population proportion are calculated with the formula:
    • $p \pm Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$
      • Where:
        • $Z_{\alpha/2}$ is the standard normal value for the level of confidence desired
        • $p$ is the sample proportion
        • $n$ is the sample size
      • Note: must have $np > 5$ and $n(1 - p) > 5$
Example
  • A random sample of 100 people shows that 25 are left-handed.
  • Form a 95% confidence interval for the true proportion of left-handers.
    • $p \pm Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} = \frac{25}{100} \pm 1.96\sqrt{\frac{(0.25)(0.75)}{100}}$
    • $= 0.25 \pm 1.96(0.0433) = 0.25 \pm 0.0849$
  • So: We are 95% confident that $0.25 \pm 0.0849$ contains the population proportion.
    • $0.1651 \le \pi \le 0.3349$
Interpretation
  • We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
  • Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
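A minimal sketch of the proportion interval, including the large-sample check from the notes. The function name `proportion_ci` and the choice to raise an error when the check fails are assumptions made here for illustration.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    # Large-sample condition from the notes: np > 5 and n(1 - p) > 5.
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation needs np > 5 and n(1 - p) > 5")
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

low, high = proportion_ci(25, 100)
print(f"{low:.4f} to {high:.4f}")  # 0.1651 to 0.3349
```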
Determining Sample Size
Sampling Error
  • The required sample size can be found to reach a desired margin of error ($e$) with a specified level of confidence $(1 - \alpha)$.
  • The margin of error, also called the sampling error, is:
    • The amount of imprecision in the estimate of the population parameter
    • The amount added and subtracted to the point estimate to form the confidence interval.
  • For the Mean
    • $e = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
    • Now solve for $n$:
    • $n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2}$
  • To determine the required sample size for the mean, you must know:
    • The desired level of confidence $(1 - \alpha)$, which determines the critical value, $Z_{\alpha/2}$
    • The acceptable sampling error, $e$
    • The standard deviation, $\sigma$
Required Sample Size Example
  • If $\sigma = 45$, what sample size is needed to estimate the mean within $\pm 5$ with 90% confidence?
    • $n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2} = \frac{(1.645^2)(45^2)}{5^2} = 219.19$
    • So the required sample size is 220 (always round up).
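The sample-size formula for the mean, with the round-up step made explicit. A sketch only; `sample_size_mean` is a hypothetical name and the critical value is passed in as given in the example.

```python
import math

def sample_size_mean(sigma, e, z):
    """Smallest n with z * sigma / sqrt(n) <= e (always rounds up)."""
    return math.ceil((z * sigma / e) ** 2)

print(sample_size_mean(sigma=45, e=5, z=1.645))  # 220
```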
If $\sigma$ is unknown
  • If unknown, $\sigma$ can be estimated when using the required sample size formula:
    • Use a value for $\sigma$ that is expected to be at least as large as the true $\sigma$.
    • Select a pilot sample and estimate $\sigma$ with the sample standard deviation, $S$.
Determining Sample Size For the Proportion
  • $e = Z_{\alpha/2}\sqrt{\frac{\pi(1 - \pi)}{n}}$
  • Solve for $n$:
    • $n = \frac{Z^2_{\alpha/2}\,\pi(1 - \pi)}{e^2}$
  • To determine the required sample size for the proportion, you must know:
    • The desired level of confidence $(1 - \alpha)$, which determines the critical value, $Z_{\alpha/2}$
    • The acceptable sampling error, $e$
    • The true proportion of events of interest, $\pi$
      • $\pi$ can be estimated with a pilot sample if necessary (or conservatively use 0.5 as an estimate of $\pi$)
Required Sample Size Example
  • How large a sample would be necessary to estimate the true proportion defective in a large population within $\pm 3\%$, with 95% confidence?
  • (Assume a pilot sample yields p = 0.12)
  • Solution:
    • For 95% confidence, use $Z_{\alpha/2} = 1.96$
    • $e = 0.03$
    • $p = 0.12$, so use this to estimate $\pi$
    • $n = \frac{Z^2_{\alpha/2}\,\pi(1 - \pi)}{e^2} = \frac{(1.96)^2(0.12)(0.88)}{(0.03)^2} = 450.75$
  • So: use n = 451
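The same calculation as a sketch, with the conservative choice $\pi = 0.5$ shown for comparison. The function name `sample_size_proportion` is illustrative.

```python
import math

def sample_size_proportion(pi, e, z):
    """Smallest n with z * sqrt(pi * (1 - pi) / n) <= e (always rounds up)."""
    return math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)

print(sample_size_proportion(0.12, 0.03, 1.96))  # 451, using the pilot estimate
print(sample_size_proportion(0.5, 0.03, 1.96))   # 1068, conservative pi = 0.5
```

Using 0.5 maximizes $\pi(1 - \pi)$, which is why it gives the largest (safest) sample size.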
Ethical Issues
  • A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate.
  • The level of confidence should always be reported.
  • The sample size should be reported.
  • An interpretation of the confidence interval estimate should also be provided.
Final Note
  • The important thing to remember is that the margin of error (the half-width of the confidence interval) is generally a function of three things: the degree of confidence required, the sample size, and the percentage being estimated.
  • Thus, sampling error will decrease as:
    • The sample size (or number of interviews) gets bigger;
    • The percentage estimated approaches 0% or 100% or
    • The need to be certain about the result (e.g. the ‘‘confidence level’’) gets smaller.
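These three effects can be seen directly from the margin-of-error formula for a proportion. A small sketch; the baseline values ($p = 0.5$, $n = 400$, 95% confidence) and the name `margin_of_error` are arbitrary choices made for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Margin of error for a proportion under the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

base = margin_of_error(0.5, 400, 0.95)
print(base > margin_of_error(0.5, 1600, 0.95))  # True: larger sample
print(base > margin_of_error(0.1, 400, 0.95))   # True: percentage nearer 0%
print(base > margin_of_error(0.5, 400, 0.90))   # True: lower confidence level
```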