Confidence Intervals Notes

Confidence Intervals

Introduction

This chapter covers constructing and interpreting confidence interval estimates for the population mean and proportion.
It also includes determining the necessary sample size for these estimates.

Point and Interval Estimates

Point Estimate: A single number used to estimate a population parameter.
Confidence Interval: Provides additional information about the variability of the estimate.
We can estimate population parameters such as $\mu$ (population mean) or $\pi$ (population proportion) using sample statistics such as $\overline{x}$ (sample mean) or p (sample proportion).

Table 1: Point Estimates

Population Parameter	Sample Statistic (Point Estimate)
Mean, $\mu$	$\overline{x}$
Proportion, $\pi$	$p$

Understanding Confidence Intervals

Confidence intervals address the uncertainty associated with point estimates.
Interval Estimate: Gives a range of values providing more information than a point estimate.
Such interval estimates are called confidence intervals.

Key Aspects of Confidence Intervals

An interval gives a range of values.
Takes into consideration variation in sample statistics from sample to sample.
Based on observations from one sample.
Provides information about closeness to unknown population parameters.
Stated in terms of a level of confidence (e.g., 95% or 99%); we can never be 100% confident.

Confidence Interval Example: Cereal Fill

Population has $\mu = 368$ and $\sigma = 15$.
Sample size is $n = 25$.
From Chapter 7: $\mu \pm Z \times \sigma_{\overline{x}}$, where $\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$
- $368 \pm 1.96 \times \frac{15}{\sqrt{25}} = (362.12, 373.88)$
  - 95% of sample means fall in this range, so 95% of intervals $\overline{x} \pm 5.88$ formed this way will contain $\mu$.
When $\mu$ is unknown, use $\overline{x}$ to estimate $\mu$.
- If $\overline{x} = 362.3$, the interval is $362.3 \pm 1.96 \times \frac{15}{\sqrt{25}} = (356.42, 368.18)$
  - Since $356.42 \le 368 \le 368.18$, this interval contains the true mean $\mu = 368$.
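The two cereal-fill intervals above can be reproduced in a few lines. This is a minimal sketch: `z_interval` is a hypothetical helper name, and the critical value 1.96 is passed in directly, as in the notes.

```python
import math

# Cereal-fill population values from the example above
MU, SIGMA, N, Z = 368, 15, 25, 1.96


def z_interval(center, sigma, n, z):
    """Hypothetical helper: center +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return center - margin, center + margin


# Interval centred on the (normally unknown) population mean
lo_mu, hi_mu = z_interval(MU, SIGMA, N, Z)
print(round(lo_mu, 2), round(hi_mu, 2))  # 362.12 373.88

# Interval centred on the observed sample mean, and whether it captures mu
lo, hi = z_interval(362.3, SIGMA, N, Z)
print(round(lo, 2), round(hi, 2), lo <= MU <= hi)  # 356.42 368.18 True
```

The same margin, $1.96 \times 15/\sqrt{25} = 5.88$, is used for both intervals; only the centre changes.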

Practical Considerations

In practice, only one sample of size n is taken.
In practice, $\mu$ is unknown, so we do not know whether a particular interval contains $\mu$.
95% confidence corresponds to using $Z = 1.96$.
95% of intervals formed this way will contain $\mu$.
Based on the selected sample, we can be 95% confident that the interval contains $\mu$ (this is a 95% confidence interval).

General Formula for Confidence Intervals

The general formula for all confidence intervals is:
- Point Estimate ± (Critical Value)(Standard Error)
  - Point Estimate: The sample statistic estimating the population parameter.
  - Critical Value: A table value based on the sampling distribution and desired confidence level.
  - Standard Error: The standard deviation of the point estimate.

Confidence Level, $(1 - \alpha)$

If the confidence level is 95%, then $(1 - \alpha) = 0.95$, so $\alpha = 0.05$.
Relative frequency interpretation:
- In the long run, 95% of all the confidence intervals that could be constructed will contain the true parameter.
A specific interval either contains or does not contain the true parameter.
- There is no probability involved for a specific interval.
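The relative frequency interpretation can be checked by simulation: draw many samples, build an interval from each, and count how often the interval captures the true mean. This is a sketch that assumes a normal population with the cereal-fill values ($\mu = 368$, $\sigma = 15$, $n = 25$); `coverage_rate`, the number of repetitions and the seed are illustrative choices, not part of the notes.

```python
import math
import random


def coverage_rate(mu=368, sigma=15, n=25, z=1.96, reps=10_000, seed=1):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu.

    Assumes a normal population; defaults are the cereal-fill values.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"{rate:.3f} of intervals contained mu")
```

The printed proportion should be close to 0.95 but will rarely equal it exactly. Each individual simulated interval either contains $\mu$ or it does not; the 95% describes the procedure, not any one interval.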

Confidence Interval for $\mu$ ( $\sigma$ Known)

Assumptions:

Population standard deviation $\sigma$ is known.
Population is normally distributed.
If the population is not normal, use a large sample size (n > 30).

Confidence interval estimate:

$\overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
- $\overline{x}$ is the point estimate.
- $Z_{\alpha/2}$ is the normal distribution critical value for a probability of $\alpha/2$ in each tail.
- $\frac{\sigma}{\sqrt{n}}$ is the standard error.

Common Levels of Confidence

Confidence Level	Confidence Coefficient, $1 - \alpha$	$Z_{\alpha/2}$ value
80.0%	0.800	1.282
90.0%	0.900	1.645
95.0%	0.950	1.960
98.0%	0.980	2.326
99.0%	0.990	2.576
99.8%	0.998	3.090
99.9%	0.999	3.291
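Rather than reading $Z_{\alpha/2}$ from a table, it can be computed from the inverse of the standard normal CDF at $1 - \alpha/2$. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+); `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Z value leaving alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}\t{z_critical(level):.3f}")
```

To three decimals this prints 1.282, 1.645, 1.960, 2.326, 2.576, 3.090 and 3.291.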

Example

A sample of 11 circuits from a normal population has a mean resistance of 2.22 ohms.
The population standard deviation is 0.35 ohms.
Determine a 95% confidence interval for the true mean resistance.
- $\overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.22 \pm (1.96) \frac{0.35}{\sqrt{11}} = 2.22 \pm 0.2068$, so $2.0132 \le \mu \le 2.4268$
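The circuit calculation can be sketched as a small function for the $\sigma$-known interval. The helper name `z_interval` is hypothetical, and the critical value is computed with `statistics.NormalDist` rather than taken from the table (the difference from 1.96 does not affect the four-decimal result here).

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Hypothetical helper: xbar +/- Z_{alpha/2} * sigma / sqrt(n) (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


# Circuit example: n = 11, xbar = 2.22 ohms, sigma = 0.35 ohms
lower, upper = z_interval(2.22, 0.35, 11)
print(f"{lower:.4f} to {upper:.4f} ohms")  # 2.0132 to 2.4268 ohms
```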

Interpretation

We are 95% confident that the true mean resistance is between 2.0132 and 2.4268 ohms.
Although the true mean may or may not be in this interval, 95% of intervals formed this way will contain the true mean.

Do You Ever Truly Know $\sigma$ ?

Probably not!
In real-world business situations, $\sigma$ is usually unknown.
If $\sigma$ is known, then $\mu$ is also known (since calculating $\sigma$ requires knowing $\mu$ ).
If $\mu$ is known, there's no need to estimate it.

Confidence Interval for $\mu$ ( $\sigma$ Unknown)

If the population standard deviation $\sigma$ is unknown, substitute the sample standard deviation, S.
This introduces extra uncertainty since S varies from sample to sample.
Use the t-distribution instead of the normal distribution.

Assumptions:

Population standard deviation is unknown.
Population is normally distributed.
If the population is not normal, use a large sample (n > 30).

Use Student’s t Distribution

Confidence Interval Estimate:

$\overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}$
- Where $t_{\alpha/2}$ is the critical value of the t-distribution with $n - 1$ degrees of freedom and an area of $\alpha/2$ in each tail.

Student’s t Distribution

The t-distribution is a family of distributions.
The $t_{\alpha/2}$ value depends on degrees of freedom (d.f.).
Degrees of freedom represent the number of observations free to vary after the sample mean has been calculated.
- $d.f. = n - 1$

Degrees of Freedom (df)

Idea: Number of observations that are free to vary after sample mean has been calculated.
Example: Suppose the mean of 3 numbers is 8.0. Let $X_1 = 7$ and $X_2 = 8$. Then $X_3$ must be 9 (i.e., $X_3$ is not free to vary).
Here, $n = 3$, so the degrees of freedom are $n - 1 = 3 - 1 = 2$. Two values can be any numbers, but the third is not free to vary for a given mean.
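The degrees-of-freedom idea can be made concrete: once the mean and $n - 1$ of the values are fixed, the last value is forced to be $n \times \text{mean}$ minus the sum of the others. `last_value` is a hypothetical helper used only for illustration.

```python
def last_value(mean, free_values, n):
    """Hypothetical helper: the only n-th observation consistent with the given mean."""
    return mean * n - sum(free_values)


n = 3
x3 = last_value(8.0, [7, 8], n)  # X1 = 7 and X2 = 8 are free; X3 is determined
df = n - 1                       # number of values free to vary
print(x3, df)  # 9.0 2
```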

Example of t distribution confidence interval

A random sample of $n = 25$ has $\overline{x} = 50$ and $S = 8$.
Form a 95% confidence interval for $\mu$.
- $d.f. = n - 1 = 24$, so $t_{\alpha/2} = t_{0.025} = 2.064$
- The confidence interval is: $\overline{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}} = 50 \pm (2.064) \frac{8}{\sqrt{25}} = 50 \pm 3.302$
- The confidence interval is $46.698 \le \mu \le 53.302$
Interpreting this interval requires the assumption that the population you are sampling from is approximately normally distributed (especially since n is only 25). This assumption can be checked by creating a:
- Normal probability plot or
- Boxplot
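A sketch of the $\sigma$-unknown interval for this example follows. Python's standard library has no t-distribution quantile function, so the helper (the name `t_interval` is hypothetical) takes $t_{\alpha/2}$ from the t table as an argument: 2.064 for 24 d.f., as above. If SciPy is available, `scipy.stats.t.ppf(0.975, 24)` gives the same critical value to three decimals.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Hypothetical helper: xbar +/- t_crit * S / sqrt(n) (sigma unknown).

    t_crit must be the t-table value for n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Example values: n = 25, xbar = 50, S = 8, t_{0.025} with 24 d.f. = 2.064
lower, upper = t_interval(50, 8, 25, t_crit=2.064)
print(f"{lower:.3f} <= mu <= {upper:.3f}")  # 46.698 <= mu <= 53.302
```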

Confidence Intervals for the Population Proportion, $\pi$

An interval estimate for the population proportion ( $\pi$ ) can be calculated by adding an allowance for uncertainty to the sample proportion (p).
Recall that the distribution of the sample proportion is approximately normal when the sample size is large enough that np > 5 and n(1-p) > 5. The standard error of the proportion is estimated by:
- $\sigma_p \approx \sqrt{\frac{p(1 - p)}{n}}$

Confidence Interval Endpoints

Upper and lower confidence limits for the population proportion are calculated with the formula:
- $p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
  - Where:
    - $Z_{\alpha/2}$ is the standard normal value for the level of confidence desired
    - p is the sample proportion
    - n is the sample size
      *Note: must have np > 5 and n(1-p) > 5

Example

A random sample of 100 people shows that 25 are left-handed.
Form a 95% confidence interval for the true proportion of left-handers.
- $p \pm Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} = \frac{25}{100} \pm 1.96\sqrt{\frac{(0.25)(0.75)}{100}} = 0.25 \pm 1.96(0.0433) = 0.25 \pm 0.0849$
So: We are 95% confident that $0.25 \pm 0.0849$ contains the population proportion.
- $0.1651 \le \pi \le 0.3349$
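The proportion interval, including the np > 5 and n(1-p) > 5 check from the note above, can be sketched as follows. `proportion_interval` is a hypothetical helper name; raising an error when the condition fails is one possible design choice, not something the notes prescribe.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Hypothetical helper: p +/- Z_{alpha/2} * sqrt(p(1-p)/n)."""
    p = successes / n
    # Normal approximation condition from the notes
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation needs np > 5 and n(1-p) > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Left-handers example: 25 of 100 people
lower, upper = proportion_interval(25, 100)
print(f"{lower:.4f} <= pi <= {upper:.4f}")  # 0.1651 <= pi <= 0.3349
```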

Interpretation

We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.

Determining Sample Size

Sampling Error

The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence ( $1 - \alpha$ ).
The margin of error, also called sampling error, is:
- The amount of imprecision in the estimate of the population parameter
- The amount added and subtracted to the point estimate to form the confidence interval.
For the Mean
- $e = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
- Now solve for n
- $n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2}$
To determine the required sample size for the mean, you must know:
- The desired level of confidence ( $1 - \alpha$ ), which determines the critical value, $Z_{\alpha/2}$
- The acceptable sampling error, e
- The standard deviation, $\sigma$

Required Sample Size Example

If $\sigma = 45$, what sample size is needed to estimate the mean within $\pm 5$ with 90% confidence?
- $n = \frac{Z^2_{\alpha/2}\sigma^2}{e^2} = \frac{(1.645^2)(45^2)}{5^2} = 219.19$
- So the required sample size is 220 (always round up).
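The sample-size formula for the mean, with the "always round up" rule, can be sketched as below. `sample_size_mean` is a hypothetical helper name; `math.ceil` implements the rounding up.

```python
import math


def sample_size_mean(z, sigma, e):
    """Hypothetical helper: n = Z^2 sigma^2 / e^2, rounded up to a whole observation."""
    raw = (z ** 2) * (sigma ** 2) / (e ** 2)
    return raw, math.ceil(raw)


# sigma = 45, e = 5, 90% confidence (Z = 1.645)
raw, n = sample_size_mean(z=1.645, sigma=45, e=5)
print(f"{raw:.2f} -> n = {n}")  # 219.19 -> n = 220
```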

If $\sigma$ is unknown

If unknown, $\sigma$ can be estimated when using the required sample size formula:
- Use a value for $\sigma$ that is expected to be at least as large as the true $\sigma$ .
- Select a pilot sample and estimate $\sigma$ with the sample standard deviation, S

Determining Sample Size For the Proportion

$e = Z_{\alpha/2}\sqrt{\frac{\pi(1 - \pi)}{n}}$
Solve for n
- $n = \frac{Z^2_{\alpha/2}\,\pi(1 - \pi)}{e^2}$
To determine the required sample size for the proportion, you must know:
- The desired level of confidence ( $1 - \alpha$ ), which determines the critical value, $Z_{\alpha/2}$
- The acceptable sampling error, e
- The true proportion of events of interest, $\pi$
  - $\pi$ can be estimated with a pilot sample if necessary (or conservatively use 0.5 as an estimate of $\pi$ )

Required Sample Size Example

How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence?
(Assume a pilot sample yields p = 0.12)
Solution:
- For 95% confidence, use $Z_{\alpha/2} = 1.96$
- $e = 0.03$
- $p = 0.12$ , so use this to estimate $\pi$
- $n = \frac{Z^2_{\alpha/2}\,\pi(1 - \pi)}{e^2} = \frac{(1.96)^2(0.12)(0.88)}{(0.03)^2} = 450.75$
So: use n = 451
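The proportion sample-size formula can be sketched the same way. `sample_size_proportion` is a hypothetical helper name. The second call illustrates the conservative choice $\pi = 0.5$ mentioned above, which maximizes $\pi(1 - \pi)$ and therefore gives the largest required n for a given e and confidence level.

```python
import math


def sample_size_proportion(z, pi, e):
    """Hypothetical helper: n = Z^2 pi (1 - pi) / e^2, rounded up."""
    raw = (z ** 2) * pi * (1 - pi) / (e ** 2)
    return raw, math.ceil(raw)


# Pilot estimate p = 0.12, margin of error 0.03, 95% confidence
raw, n = sample_size_proportion(z=1.96, pi=0.12, e=0.03)
print(f"{raw:.2f} -> n = {n}")  # 450.75 -> n = 451

# Conservative choice pi = 0.5 when no pilot estimate is available
raw_c, n_c = sample_size_proportion(z=1.96, pi=0.5, e=0.03)
print(f"{raw_c:.2f} -> n = {n_c}")  # 1067.11 -> n = 1068
```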

Ethical Issues

A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate.
The level of confidence should always be reported.
The sample size should be reported.
An interpretation of the confidence interval estimate should also be provided.

Final Note

The important thing to remember is that the margin of error (the half-width of the confidence interval) is generally a function of three things: the level of confidence required, the sample size, and the percentage being estimated.
Thus, sampling error will decrease as:
- The sample size (or number of interviews) gets bigger;
- The percentage estimated approaches 0% or 100%; or
- The need to be certain about the result (i.e., the confidence level) gets smaller.
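The three effects listed above can be checked numerically for a proportion. This is an illustrative sketch: the baseline (p = 0.5, n = 100, 95% confidence) is an arbitrary assumption chosen for comparison, and `proportion_margin` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def proportion_margin(p, n, confidence):
    """Margin of error Z_{alpha/2} * sqrt(p(1-p)/n) for a sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


base = proportion_margin(0.5, 100, 0.95)
bigger_sample = proportion_margin(0.5, 400, 0.95)     # larger n
nearer_extreme = proportion_margin(0.1, 100, 0.95)    # percentage nearer 0%
lower_confidence = proportion_margin(0.5, 100, 0.90)  # less certainty required

for label, m in [("baseline", base), ("n = 400", bigger_sample),
                 ("p = 0.10", nearer_extreme), ("90% confidence", lower_confidence)]:
    print(f"{label:15s} e = {m:.4f}")
```

Each of the three variations produces a smaller margin of error than the baseline. Quadrupling the sample size only halves the margin, because n enters through a square root.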
