Confidence Intervals Notes

Confidence Intervals

Introduction

  • This chapter covers constructing and interpreting confidence interval estimates for the population mean and proportion.
  • It also includes determining the necessary sample size for these estimates.

Point and Interval Estimates

  • Point Estimate: A single number used to estimate a population parameter.
  • Confidence Interval: Provides additional information about the variability of the estimate.
  • We can estimate population parameters such as \mu (population mean) or \pi (population proportion) using sample statistics such as \overline{x} (sample mean) or p (sample proportion).
Table 1: Point Estimates
Population Parameter    Sample Statistic (Point Estimate)
\mu                     \overline{x}
\pi                     p

Understanding Confidence Intervals

  • Confidence intervals address the uncertainty associated with point estimates.
  • Interval Estimate: Gives a range of values providing more information than a point estimate.
  • Such interval estimates are called confidence intervals.

Key Aspects of Confidence Intervals

  • An interval gives a range of values.
  • Takes into consideration variation in sample statistics from sample to sample.
  • Based on observations from one sample.
  • Provides information about closeness to unknown population parameters.
  • Expressed in terms of a level of confidence (e.g., 95% or 99%), but can never be 100% confident.

Confidence Interval Example: Cereal Fill

  • Population has \mu = 368 and \sigma = 15.
  • Sample size is n = 25.
  • From Chapter 7: \mu \pm Z \times \sigma_{\overline{x}}, where \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}
    • 368 \pm 1.96 \times \frac{15}{\sqrt{25}} = 368 \pm 5.88 = (362.12, 373.88)
      • 95% of all sample means for samples of size 25 fall in this range, so 95% of intervals of the form \overline{x} \pm 5.88 will contain \mu.
  • When \mu is unknown, use \overline{x} to estimate \mu.
    • If \overline{x} = 362.3, the interval is 362.3 \pm 1.96 \times \frac{15}{\sqrt{25}} = (356.42, 368.18)
      • Since 356.42 \le 368 \le 368.18, this interval contains the true \mu.
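
The two cereal-fill intervals can be reproduced with a short Python sketch. It uses only the standard library; the function name `z_interval` and its parameters are illustrative choices, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(center, sigma, n, confidence=0.95):
    """Return (lower, upper) for center ± Z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return center - margin, center + margin

# Range around the known population mean (mu = 368)
lo, hi = z_interval(368, sigma=15, n=25)
print(f"({lo:.2f}, {hi:.2f})")  # (362.12, 373.88)

# Interval around one observed sample mean (x-bar = 362.3)
lo, hi = z_interval(362.3, sigma=15, n=25)
print(f"({lo:.2f}, {hi:.2f})")  # (356.42, 368.18)
```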

Practical Considerations

  • In practice, only one sample of size n is taken.
  • In practice, \mu is unknown, so it's not known if the interval contains \mu.
  • 95% confidence is based on using Z = 1.96.
  • 95% of intervals formed this way will contain \mu.
  • Based on the selected sample, one can be 95% confident that the interval contains \mu (a 95% confidence interval).

General Formula for Confidence Intervals

  • The general formula for all confidence intervals is:
    • Point Estimate ± (Critical Value)(Standard Error)
      • Point Estimate: The sample statistic estimating the population parameter.
      • Critical Value: A table value based on the sampling distribution and desired confidence level.
      • Standard Error: The standard deviation of the point estimate.

Confidence Level, (1 - \alpha)

  • If the confidence level is 95%, (1 - \alpha) = 0.95, so \alpha = 0.05.
  • Relative frequency interpretation:
    • 95% of all confidence intervals constructed will contain the true parameter.
  • A specific interval either contains or does not contain the true parameter.
    • There is no probability involved for a specific interval.
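
The relative frequency interpretation can be checked by simulation. The sketch below is only an illustration under assumed settings: it borrows \mu = 368, \sigma = 15, and n = 25 from the cereal example, draws normally distributed samples, and uses a hypothetical function name `coverage`. It counts how often the interval \overline{x} \pm Z_{\alpha/2} \sigma/\sqrt{n} contains \mu.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu=368, sigma=15, n=25, confidence=0.95, trials=10_000, seed=1):
    """Fraction of simulated intervals x-bar ± Z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

print(coverage())  # close to 0.95; the exact value depends on the seed
```

Each simulated interval either contains \mu or does not; the 95% figure describes the long-run share of intervals that do.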

Confidence Interval for \mu (\sigma Known)

Assumptions:
  • Population standard deviation \sigma is known.
  • Population is normally distributed.
  • If the population is not normal, use a large sample size (n > 30).
Confidence interval estimate:
  • \overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
    • \overline{x} is the point estimate.
    • Z_{\alpha/2} is the normal distribution critical value for a probability of \alpha/2 in each tail.
    • \frac{\sigma}{\sqrt{n}} is the standard error.

Common Levels of Confidence

Confidence Level    Confidence Coefficient (1 - \alpha)    Z_{\alpha/2} value
80.0%               0.800                                  1.280
90.0%               0.900                                  1.645
95.0%               0.950                                  1.960
98.0%               0.980                                  2.330
99.0%               0.990                                  2.580
99.8%               0.998                                  3.080
99.9%               0.999                                  3.270
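
Critical values like these can also be computed from the inverse of the standard normal cumulative distribution. This is a minimal sketch using Python's standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Z value that leaves alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {z_critical(level):.3f}")
```

Computed values can differ slightly from printed-table values, which are read from a table rounded to two decimal places.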

Example

  • A sample of 11 circuits from a normal population has a mean resistance of 2.22 ohms.
  • The population standard deviation is 0.35 ohms.
  • Determine a 95% confidence interval for the true mean resistance.
    • \overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.22 \pm (1.96) \frac{0.35}{\sqrt{11}} = 2.22 \pm 0.2068
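
Written out step by step, the arithmetic for this example is roughly as follows (intermediate values are rounded to four decimals):

```latex
\begin{align*}
\frac{\sigma}{\sqrt{n}} &= \frac{0.35}{\sqrt{11}} \approx 0.1055 \\
Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &= 1.96 \times 0.1055 \approx 0.2068 \\
\overline{x} \pm 0.2068 &= (2.22 - 0.2068,\; 2.22 + 0.2068) = (2.0132,\; 2.4268)
\end{align*}
```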

Interpretation

  • We are 95% confident that the true mean resistance is between 2.0132 and 2.4268 ohms.
  • Although the true mean may or may not be in this interval, 95% of intervals formed this way will contain the true mean.

Do You Ever Truly Know \sigma?

  • Probably not!
  • In real-world business situations, \sigma is usually unknown.
  • If \sigma is known, then \mu is also known (since calculating \sigma requires knowing \mu).
  • If \mu is known, there's no need to estimate it.

Confidence Interval for \mu (\sigma Unknown)

  • If the population standard deviation \sigma is unknown, substitute the sample standard deviation, S.
  • This introduces extra uncertainty since S varies from sample to sample.
  • Use the t-distribution instead of the normal distribution.
Assumptions:
  • Population standard deviation is unknown.
  • Population is normally distributed.
  • If the population is not normal, use a large sample (n > 30).

Use Student’s t Distribution

Confidence Interval Estimate:
  • \overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}
    • Where t_{\alpha/2} is the critical value of the t-distribution with n - 1 degrees of freedom and an area of \alpha/2 in each tail.

Student’s t Distribution

  • The t-distribution is a family of distributions.
  • The t_{\alpha/2} value depends on degrees of freedom (d.f.).
  • Degrees of freedom represent the number of observations free to vary after the sample mean has been calculated.
    • d.f. = n - 1

Degrees of Freedom (df)

  • Idea: Number of observations that are free to vary after sample mean has been calculated.
  • Example: Suppose the mean of 3 numbers is 8.0. Let X1 = 7 and X2 = 8. Then X3 must be 9 (i.e., X3 is not free to vary).
  • Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2. Two values can be any numbers, but the third is not free to vary for a given mean.
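
The same idea can be shown in a few lines of Python. The function name `forced_last_value` is hypothetical; it simply solves for the one value that is no longer free once the mean is fixed.

```python
def forced_last_value(target_mean, free_values):
    """Given n - 1 freely chosen values, return the value that makes
    the mean of all n values equal to target_mean."""
    n = len(free_values) + 1
    return n * target_mean - sum(free_values)

# Mean of 3 numbers is 8.0, with X1 = 7 and X2 = 8 chosen freely
print(forced_last_value(8.0, [7, 8]))  # 9.0
```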

Example of t distribution confidence interval

  • A random sample of n = 25 has \overline{x} = 50 and S = 8.
  • Form a 95% confidence interval for \mu.
    • d.f. = n – 1 = 24, so t_{\alpha/2} = t_{0.025} = 2.064
    • The confidence interval is: \overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}} = 50 \pm (2.064) \frac{8}{\sqrt{25}} = 50 \pm 3.302
    • The confidence interval is 46.698 \le \mu \le 53.302
  • Interpreting this interval requires the assumption that the population being sampled is approximately normally distributed (especially since n is only 25). This condition can be checked by creating a:
    • Normal probability plot or
    • Boxplot
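
The t-based interval from this example can be sketched in Python as follows. Python's standard library has no t-distribution, so the critical value is passed in as a table lookup (2.064 for 24 d.f.); the function name `t_interval` is illustrative. If SciPy is available, `scipy.stats.t.ppf(0.975, 24)` should give the same critical value.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """x-bar ± t * S / sqrt(n); t_crit must match n - 1 degrees of freedom."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# t_crit = 2.064 is the table value for 24 d.f. with 0.025 in each tail
lower, upper = t_interval(x_bar=50, s=8, n=25, t_crit=2.064)
print(f"{lower:.3f} <= mu <= {upper:.3f}")  # 46.698 <= mu <= 53.302
```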

Confidence Intervals for the Population Proportion, \pi

  • An interval estimate for the population proportion (\pi) can be calculated by adding an allowance for uncertainty to the sample proportion (p).
  • Recall that the distribution of the sample proportion is approximately normal if the sample size is large, which requires np > 5 and n(1-p) > 5. The standard error of the proportion, \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}, is estimated from the sample as:
    • \sqrt{\frac{p(1 - p)}{n}}

Confidence Interval Endpoints

  • Upper and lower confidence limits for the population proportion are calculated with the formula:
    • p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}
      • Where:
        • Z_{\alpha/2} is the standard normal value for the level of confidence desired
        • p is the sample proportion
        • n is the sample size
          *Note: must have np > 5 and n(1-p) > 5

Example

  • A random sample of 100 people shows that 25 are left-handed.
  • Form a 95% confidence interval for the true proportion of left-handers.
    • p \pm Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} = \frac{25}{100} \pm 1.96\sqrt{\frac{(0.25)(0.75)}{100}}
    • = 0.25 \pm 1.96(0.0433) = 0.25 \pm 0.0849
  • So: We are 95% confident that the interval 0.25 \pm 0.0849 contains the population proportion.
    • 0.1651 \le \pi \le 0.3349

Interpretation

  • We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
  • Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
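
A minimal Python sketch of the proportion interval, including the np > 5 and n(1-p) > 5 check, might look like this. The function name `proportion_interval` and the choice to raise an error when the check fails are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """p ± Z * sqrt(p(1-p)/n), valid when np > 5 and n(1-p) > 5."""
    p = successes / n
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation needs np > 5 and n(1-p) > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# 25 left-handers in a sample of 100
lower, upper = proportion_interval(25, 100)
print(f"{lower:.4f} <= pi <= {upper:.4f}")  # 0.1651 <= pi <= 0.3349
```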

Determining Sample Size

Sampling Error
  • The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - \alpha).
  • The margin of error, also called the sampling error, is:
    • The amount of imprecision in the estimate of the population parameter
    • The amount added and subtracted to the point estimate to form the confidence interval.
  • For the Mean
    • e = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
    • Now solve for n
    • n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2}
  • To determine the required sample size for the mean, you must know:
    • The desired level of confidence (1 - \alpha), which determines the critical value, Z_{\alpha/2}
    • The acceptable sampling error, e
    • The standard deviation, \sigma
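
The step from the margin of error to the sample size formula for the mean is a short rearrangement, sketched here for reference:

```latex
\begin{align*}
e &= Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{Z_{\alpha/2}\,\sigma}{e} \\
n &= \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}}
\end{align*}
```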

Required Sample Size Example

  • If \sigma = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?
    • n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2} = \frac{(1.645^2)(45^2)}{5^2} = 219.19
    • so the required sample size is 220 (always round up).
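
As a sketch, the same calculation in Python (standard library only; the function name `sample_size_for_mean` is illustrative) rounds up with `ceil`:

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence):
    """Smallest n with Z * sigma / sqrt(n) <= e at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / e) ** 2)  # always round up

print(sample_size_for_mean(sigma=45, e=5, confidence=0.90))  # 220
```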

If \sigma is unknown

  • If unknown, \sigma can be estimated when using the required sample size formula:
    • Use a value for \sigma that is expected to be at least as large as the true \sigma.
    • Select a pilot sample and estimate \sigma with the sample standard deviation, S

Determining Sample Size for the Population Proportion

  • e = Z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}
  • Solve for n
    • n = \frac{Z^2_{\alpha/2}\,\pi(1-\pi)}{e^2}
  • To determine the required sample size for the proportion, you must know:
    • The desired level of confidence (1 - \alpha), which determines the critical value, Z_{\alpha/2}
    • The acceptable sampling error, e
    • The true proportion of events of interest, \pi
      • \pi can be estimated with a pilot sample if necessary (or conservatively use 0.5 as an estimate of \pi)

Required Sample Size Example

  • How large a sample would be necessary to estimate the true proportion defective in a large population within ± 3%, with 95% confidence?
  • (Assume a pilot sample yields p = 0.12)
  • Solution:
    • For 95% confidence, use Z_{\alpha/2} = 1.96
    • e = 0.03
    • p = 0.12, so use this to estimate \pi
    • n = \frac{Z^2_{\alpha/2}\,\pi(1 - \pi)}{e^2} = \frac{(1.96)^2(0.12)(0.88)}{(0.03)^2} = 450.75
  • So: use n = 451
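
The proportion version can be sketched the same way. The function name `sample_size_for_proportion` and the default `pi_guess=0.5` (the conservative choice mentioned above) are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(e, confidence, pi_guess=0.5):
    """n = Z^2 * pi(1 - pi) / e^2, rounded up; pi_guess = 0.5 is conservative."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * pi_guess * (1 - pi_guess) / e ** 2)

# Pilot sample gives p = 0.12
print(sample_size_for_proportion(e=0.03, confidence=0.95, pi_guess=0.12))  # 451
# No pilot sample: fall back on the conservative 0.5
print(sample_size_for_proportion(e=0.03, confidence=0.95))  # 1068
```

Using 0.5 maximizes \pi(1 - \pi), so it gives the largest sample size any true proportion could require.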

Ethical Issues

  • A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate.
  • The level of confidence should always be reported.
  • The sample size should be reported.
  • An interpretation of the confidence interval estimate should also be provided.

Final Note

  • The important thing to remember is that the margin of error of a confidence interval is generally a function of three things: the degree of confidence required, the sample size, and the percentage being estimated.
  • Thus, sampling error will decrease as:
    • The sample size (or number of interviews) gets bigger;
    • The percentage estimated approaches 0% or 100% or
    • The need to be certain about the result (e.g. the ‘‘confidence level’’) gets smaller.
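
These three effects can be seen numerically with a small sketch. The baseline values (p = 0.5, n = 400, 95% confidence) are arbitrary choices for illustration, and `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Z * sqrt(p(1-p)/n) for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

base = margin_of_error(p=0.5, n=400, confidence=0.95)
print(f"baseline:          {base:.4f}")
print(f"larger sample:     {margin_of_error(0.5, 1600, 0.95):.4f}")
print(f"p nearer 0 or 1:   {margin_of_error(0.1, 400, 0.95):.4f}")
print(f"lower confidence:  {margin_of_error(0.5, 400, 0.90):.4f}")
```

Each of the last three values should come out smaller than the baseline, matching the three bullets above.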