BUSN1010 - Estimation Notes

BUSN1010 Analytics in Business: Estimation

Chapter 8: Estimating Single Population Parameters

Chapter Goals

Distinguish between point estimate and confidence interval estimate.
Construct and interpret confidence intervals for a single population mean using z and t distributions.
Determine the required sample size to estimate a single population mean within a specified margin of error.
Form and interpret a confidence interval estimate for a single population proportion.
Determine the required sample size to estimate a single population proportion within a specified margin of error.

Confidence Intervals

Confidence Intervals for the Population Mean, μ
- When Population Standard Deviation σ is Known
- When Population Standard Deviation σ is Unknown
Determining the Required Sample Size
Confidence Intervals for the Population Proportion, p

Point and Interval Estimates

A point estimate is a single number from a sample used to estimate the corresponding population parameter.
A confidence interval provides additional information about variability when estimating values for a population parameter.

Point Estimates

Estimating a Population Parameter with a Sample Statistic (a Point Estimate):
- Mean: Sample Mean (\bar{x}) estimates Population Mean (μ).
- Proportion: Sample Proportion (p) estimates Population Proportion (π).

Estimation

Estimating the mean of a population (μ) from a sample.
Any function computed from a sample is a potential estimator.
Choosing an estimator: Unbiased, Minimum Variance, Consistent.

Unbiasedness

An unbiased estimator produces estimates centered around the true population value.
Unbiased Estimator: Estimates are centered around the true value.
Biased Estimator: Estimates are systematically off the true value.

Minimum Variance

Choosing between estimators: lower variance is better.
High Variance: Estimates are spread out.
Low Variance: Estimates are close together.
Sample Mean is the best estimator for Population Mean (unbiased and minimum variance).
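
The unbiasedness and minimum-variance criteria can be illustrated with a small simulation. The sketch below is illustrative only: the population values (mu = 50, sigma = 10), the sample size, the number of trials and the function name compare_estimators are assumptions, not part of the notes. It compares the sample mean with the sample median as estimators of μ for a normal population.

```python
import random
import statistics

def compare_estimators(mu=50.0, sigma=10.0, n=25, trials=2000, seed=1):
    """Draw many samples from a normal population (assumed values) and
    return the sample means and sample medians computed from each."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = compare_estimators()

# Both estimators should be centred near mu = 50 (no systematic bias) ...
print("average of sample means:  ", round(statistics.fmean(means), 2))
print("average of sample medians:", round(statistics.fmean(medians), 2))

# ... but the sample mean should vary less from sample to sample.
print("variance of sample means:  ", round(statistics.variance(means), 2))
print("variance of sample medians:", round(statistics.variance(medians), 2))
```

For a normal population both estimators are centred on μ, so the lower variance of the sample mean is what makes it the preferred choice.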

Confidence Intervals

Quantifying uncertainty associated with a point estimate.
Interval estimate provides more information than a point estimate.
Interval estimates are called confidence intervals.

Confidence Interval Estimate

An interval gives a range of values.
Takes into consideration variation in sample statistics from sample to sample.
Based on observation from 1 sample.
Gives information about closeness to unknown population parameters.
Stated in terms of level of confidence, but never 100% sure.

Estimation Process

Population with unknown mean (μ).
Random sample with mean (\bar{x} = 50).
95% confidence that μ is between 40 and 60.

General Formula

The general formula for all confidence intervals is:

\text{Point Estimate} ± (\text{Critical Value})(\text{Standard Error})

Confidence Level

Confidence Level: Confidence that the interval will contain the unknown population parameter.
Determines the critical value.
A percentage (less than 100%).

Confidence Level, (1-α)

If confidence level = 95%, then (1 - α) = 0.95
Relative frequency interpretation: 95% of all constructed confidence intervals will contain the true parameter in the long run.
A specific interval either will or will not contain the true parameter; no probability involved for a specific interval.
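
The relative frequency interpretation can be checked by simulation. The following is a minimal sketch under assumed values (a normal population with mu = 50 and sigma = 10, samples of n = 25, and a hypothetical function name coverage). It builds many 95% intervals with σ known and counts how often they contain the true mean.

```python
import math
import random
import statistics

def coverage(mu=50.0, sigma=10.0, n=25, trials=2000, z=1.96, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)   # same margin for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Each individual interval either contains mu or it does not.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

print("share of intervals containing mu:", coverage())
```

The printed share should be close to 0.95, although any single interval in the loop is simply a hit or a miss.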

Confidence Intervals: Types

Population Mean
- σ Known
- σ Unknown
Population Proportion

Confidence Interval for μ (σ Known)

Assumptions:
- Population standard deviation σ is known.
- Population is normally distributed (or use large sample if not normal).

Finding the Critical Value

For a 95% confidence interval, α = 0.05, so the critical values are -z_{0.025} = -1.96 and z_{0.025} = 1.96

Common Levels of Confidence

Commonly used confidence levels: 90%, 95%, and 99%.

Confidence Level	Confidence Coefficient, (1 - \alpha)	z value
80%	0.80	1.28
90%	0.90	1.645
95%	0.95	1.96
98%	0.98	2.33
99%	0.99	2.576
99.8%	0.998	3.09
99.9%	0.999	3.29
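
The tabulated z values can be checked with Python's standard library. This is a sketch, and the helper name z_critical is an assumption: for a confidence level 1 - α, the critical value is the standard normal quantile at 1 - α/2.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # The upper tail holds alpha/2, so look up the quantile at 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z = {z_critical(level):.3f}")
```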

Interval and Level of Confidence

100(1 - α)% of the intervals constructed contain μ; 100α% do not.

Margin of Error

Margin of Error (e): amount added and subtracted to the point estimate to form the confidence interval.
Example: Margin of error for estimating μ, σ known: e = z_{\alpha/2} * \frac{σ}{\sqrt{n}}

Factors Affecting Margin of Error

Data variation, σ: e increases as σ increases.
Sample size, n: e decreases as n increases.
Level of confidence, 1 - α: e increases if 1 - α increases.
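
Each of the three factors can be seen by changing one input of the margin of error formula at a time. The numbers below (σ = 10, n = 25 and so on) are made up for illustration, and margin_of_error is a hypothetical helper name.

```python
import math

def margin_of_error(z, sigma, n):
    """e = z * sigma / sqrt(n), the margin of error when sigma is known."""
    return z * sigma / math.sqrt(n)

base          = margin_of_error(z=1.96,  sigma=10, n=25)   # reference case
more_spread   = margin_of_error(z=1.96,  sigma=20, n=25)   # larger sigma
bigger_sample = margin_of_error(z=1.96,  sigma=10, n=100)  # larger n
higher_conf   = margin_of_error(z=2.576, sigma=10, n=25)   # 99% instead of 95%

print(f"base case:         e = {base:.3f}")
print(f"sigma doubled:     e = {more_spread:.3f}")    # wider
print(f"n quadrupled:      e = {bigger_sample:.3f}")  # narrower
print(f"confidence raised: e = {higher_conf:.3f}")    # wider
```

Because n enters through a square root, the sample size must be quadrupled to halve the margin of error.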

Example: Waiting Times

Standard deviation for waiting times is 1 minute (σ = 1).
Waiting times are normally distributed.
Collected waiting times for 20 customers; average waiting time is 3.5 minutes (\bar{x} = 3.5).
Determine a 95% confidence interval for the true mean waiting time.
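
One way to carry out this calculation is sketched below, applying the general formula with z = 1.96 for 95% confidence. The variable names are illustrative.

```python
import math

sigma = 1.0   # known population standard deviation (minutes)
n = 20        # number of customers sampled
xbar = 3.5    # sample mean waiting time (minutes)
z = 1.96      # critical value for 95% confidence

e = z * sigma / math.sqrt(n)          # margin of error
lower, upper = xbar - e, xbar + e     # point estimate -/+ margin of error

print(f"margin of error = {e:.4f}")              # 0.4383
print(f"95% CI: {lower:.4f} to {upper:.4f}")     # 3.0617 to 3.9383
```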

Interpretation

Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
Incorrect interpretation: there is 95% probability that this interval contains the true population mean.

Problem: σ Known

8-3, 8-5, and 8-6 (b and c only)
8-3. Construct a 95% confidence interval estimate for the population mean given \bar{x} = 300, σ = 55, n = 250.
8-5. Determine the 90% confidence interval estimate for the population mean of a normal distribution given n = 100, \bar{x} = 1,200 and σ = 121.
8-6. Determine the margin of error for a confidence interval estimate for the population mean of a normal distribution given the following information:
- b. confidence level = 0.99, n = 25, σ = 3.47
- c. confidence level = 0.98, standard error = 2.356
Answers:
- 8-3: 293.18 to 306.82
- 8-5: 1180.10 to 1219.90
- 8-6
  - b: ± 1.7871
  - c: ± 5.4895

Confidence Interval for μ (σ is unknown)

If the population standard deviation σ is unknown, substitute the sample standard deviation, s.
This introduces extra uncertainty, so use the t distribution instead of the normal distribution.

Assumptions:
- Population standard deviation is unknown.
- Population is normally distributed (or use large sample if not normal).
Use Student’s t Distribution.
Confidence Interval Estimate: \bar{x} ± t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}

Student's t Distribution

The t is a family of distributions.
The t value depends on degrees of freedom (d.f.).
Degrees of freedom are the number of observations that are free to vary after the sample mean has been calculated: d.f. = n - 1

Degrees of freedom (df)

Idea: Number of observations that are free to vary after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
- Let x_1 = 7
- Let x_2 = 8
- What is x_3? If the mean of these three values is 8.0, then x_3 must be 9 (i.e. x_3 is not free to vary)
- Here, n = 3, so degrees of freedom = n - 1 = 3 - 1 = 2 (2 values can be any numbers, but the third is not free to vary for a given mean)

Student’s t Distribution

t distributions are bell-shaped and symmetrical but have fatter tails than the normal distribution.
Note: t approaches z as n increases.

Student’s t Table

The body of the table contains t values NOT probabilities.

t-Distribution Table Example

Example:
- n = 10
- \bar{x} = 8.088
- s = 4.64
Confidence level = 95%
d.f. = n-1 = 9
t = 2.2622
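
The example stops at the table lookup. A natural next step, assumed here rather than taken from the notes, is to plug the t value into the interval \bar{x} ± t \frac{s}{\sqrt{n}}. The function name t_interval is hypothetical.

```python
import math

def t_interval(xbar, s, n, t):
    """Confidence interval for the mean when sigma is unknown,
    given a t value already read from the table."""
    e = t * s / math.sqrt(n)   # margin of error using the sample std dev
    return xbar - e, xbar + e

# Values from the example; t = 2.2622 is the table value for d.f. = 9, 95%.
lower, upper = t_interval(xbar=8.088, s=4.64, n=10, t=2.2622)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```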

Comparison of t and z values

Note: t approaches z as n increases
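
The convergence of t to z can be tabulated numerically. Python's standard library has no t quantile function, so the sketch below takes its own route, which is an assumption of this example and not the table method used in the notes: it integrates the t density with Simpson's rule and inverts the result by bisection. All helper names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3   # 0.5 is the area to the left of zero

def t_critical(confidence, df):
    """Two-sided critical value t_{alpha/2, df}, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (5, 9, 24, 30, 100):
    print(f"d.f. = {df:3d}   t = {t_critical(0.95, df):.4f}   z = {z:.4f}")
```

The t value is always above z for a given confidence level, and the gap shrinks as the degrees of freedom grow.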

Example: VCE Students Income

A random sample of n = 25 VCE students shows an average income of $50 with a sample standard deviation of $8.
Form a 95% confidence interval for μ, the average income for VCE students.
d.f. = n - 1 = 24, so t_{\alpha/2, n-1} = t_{0.025, 24} = 2.0639
The confidence interval is 50 ± 2.0639 \frac{8}{\sqrt{25}} = 50 ± 3.302, i.e. 46.698 to 53.302

Approximation for large samples

Since t approaches z as the sample size increases, the z value is sometimes used in place of the t value as an approximation when n ≥ 30.

Problem: σ Unknown

8-1 and 8-16
8-1. Assuming the population of interest is approximately normally distributed, construct a 95% confidence interval estimate for the population mean given the following values: \bar{x} = 18.4, s = 4.2, n = 13.
8-16. Bolton, Inc., an Internet service provider (ISP), has experienced rapid growth in the past five years. As a part of its marketing strategy, the company promises fast connections and dependable service. To achieve its objectives, the company constantly evaluates the capacity of its servers. One component of its evaluation is an analysis of the average amount of time a customer is connected and actively using the Internet daily. A random sample of customer records shows the following daily usage times, in minutes:
- a. Using the sample data, compute the best point estimate of the population mean for daily usage times for Bolton’s customers.
- b. The managers of Bolton’s marketing department would like to develop a confidence interval estimate for the population mean daily customer usage time. Because the population standard deviation of daily customer usage time is unknown and the sample size is small, what assumption must the marketing managers make concerning the population of daily customer usage times?
- c. Construct and interpret a confidence interval for the mean daily usage time for Bolton’s customers.
- d. Assume that before the sample was taken, Bolton’s marketing staff believed that mean daily usage for its customers was . Does their assumption concerning mean daily usage seem reasonable based on the confidence interval developed in part c?

Determining Sample Size

The required sample size can be found to reach a desired margin of error (e) and level of confidence (1 - \alpha).
Required sample size to estimate μ, σ known (from solving e = z_{\alpha/2} \frac{σ}{\sqrt{n}} for n): n = \frac{z_{\alpha/2}^2 σ^2}{e^2}

Required Sample Size Example

If σ = 45, what sample size is needed to estimate the population mean, with 90% confidence of being correct within ± 5?
n = \frac{z_{\alpha/2}^2 σ^2}{e^2} = \frac{(1.645)^2 (45)^2}{5^2} = 219.19
(Always round up)
So the required sample size is n = 220
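
The same calculation can be wrapped in a small helper. This is a sketch and the function name required_n_mean is an assumption. It solves the margin of error formula for n and rounds up, since rounding down would give a margin slightly larger than requested.

```python
import math

def required_n_mean(z, sigma, e):
    """Smallest n such that z * sigma / sqrt(n) <= e (sigma known)."""
    return math.ceil((z * sigma / e) ** 2)

n = required_n_mean(z=1.645, sigma=45, e=5)
print(n)  # 220
```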

If σ is unknown

If σ is unknown, it can be estimated for use in the required sample size formula:
- Use a value for σ that is expected to be at least as large as the true σ
- Select a pilot sample and estimate σ with the sample standard deviation, s
- Use the range R to estimate the standard deviation using σ ≈ R/6 (or σ ≈ R/4 for a more conservative estimate, which produces a larger sample size)

Sample Size Problems

8-27. What sample size is needed to estimate a population mean within of the true mean value using a confidence level of 95% if the true population variance is known to be 122,500?

Confidence Intervals for the Population Proportion, π

An interval estimate for the population proportion (π) can be calculated by adding an allowance for uncertainty to the sample proportion (p)

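Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation σ_p = \sqrt{\frac{\pi(1-\pi)}{n}}. Because π is unknown, it is estimated with the sample proportion p.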

Confidence interval endpoints

Upper and lower confidence limits for the population proportion are calculated with the formula

p ± z \sqrt{\frac{p(1-p)}{n}}

where
- z is the standard normal value for the level of confidence desired
- p is the sample proportion
- n is the sample size

Interpretation

We are 95% confident that the true percentage of left-handers in the population is between 16.5% and 33.5%.
Although this range may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
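
The quoted interval can be reproduced from the proportion formula p ± z \sqrt{\frac{p(1-p)}{n}}. The sketch below takes n = 100 from the text. The sample proportion p = 0.25 is inferred as the midpoint of the stated interval and is an assumption, as is the helper name proportion_interval.

```python
import math

def proportion_interval(p, n, z):
    """Confidence interval for a population proportion (large sample)."""
    se = math.sqrt(p * (1 - p) / n)   # estimated standard error of p
    e = z * se                        # margin of error
    return p - e, p + e

# p = 0.25 is inferred from the interval midpoint; n = 100; 95% -> z = 1.96
lower, upper = proportion_interval(p=0.25, n=100, z=1.96)
print(f"{lower:.3f} to {upper:.3f}")  # 0.165 to 0.335
```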

Changing the sample size

Increases in the sample size reduce the width of the confidence interval.

Problem: Population Proportion

8-50. A decision maker is interested in estimating a population proportion. A sample of size n=150 yields 115 successes. Based on these sample data, construct a 90% confidence interval estimate for the true population proportion.
8-51. At issue is the proportion of people in a particular county who do not have health care insurance coverage. A simple random sample of 240 people was asked if they have insurance coverage, and 66 replied that they did not have coverage. Based on these sample data, determine the 95% confidence interval estimate for the population proportion.

Finding the Required Sample Size for proportion problems

Define the margin of error:

E = z_{\alpha/2} \sqrt{\frac{\pi(1-\pi)}{n}}

Solve for n:

n = \frac{z_{\alpha/2}^2 \pi(1-\pi)}{E^2}

π can be estimated with a pilot sample, if necessary (or conservatively use π = .50)

What sample size…?

How large a sample would be necessary to estimate the true proportion defective in a large population within 3%, with 95% confidence?

Solution:
- For 95% confidence, use Z = 1.96
- E = .03
- p not given, so use p = 0.5 to estimate \pi
n = \frac{(1.96)^2 (0.5)(1 - 0.5)}{(0.03)^2} = 1067.11, so round up and use n = 1068

What sample size…?

How large a sample would be necessary to estimate the true proportion defective in a large population within 3%, with 95% confidence, assuming a pilot sample yields p = .12

Solution:
- For 95% confidence, use Z = 1.96
- E = .03
- p = .12 , so use this to estimate \pi
n = \frac{(1.96)^2 (0.12)(1 - 0.12)}{(0.03)^2} = 450.75, so round up and use n = 451
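
Both solutions follow from the same formula and can be checked with a short helper. This is a sketch: the function name required_n_proportion and the parameter name pi_guess are assumptions. The default of 0.5 is the conservative choice, because π(1 - π) is largest at π = 0.5.

```python
import math

def required_n_proportion(z, e, pi_guess=0.5):
    """Smallest n giving margin of error e for a proportion.
    pi_guess is a planning value for the unknown proportion."""
    return math.ceil(z ** 2 * pi_guess * (1 - pi_guess) / e ** 2)

print(required_n_proportion(z=1.96, e=0.03))                 # 1068
print(required_n_proportion(z=1.96, e=0.03, pi_guess=0.12))  # 451
```

A pilot estimate far from 0.5 can cut the required sample size substantially, as the second call shows.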

Problem: Sample Size

8-49. A pilot sample of 75 items was taken, and the number of items with the attribute of interest was found to be 15. How many more items must be sampled to construct a 99% confidence interval estimate for π with a 0.025 margin of error?
8-52. A computer software distributor is planning to survey customers to determine the proportion who will renew their software subscription for the coming year. The company wants to estimate the population proportion with 90% confidence and a margin of error equal to ±0.04. What sample size is required?
8-53. A random sample of size 150 taken from a population yields a proportion equal to 0.35.
- a. Determine if the sample size is large enough so that the sampling distribution can be approximated by a normal distribution.
- b. Construct a 90% confidence interval for the population proportion.
- c. Interpret the confidence interval calculated in part b.
- d. Produce the margin of error associated with this confidence interval.

Ethical Issues

A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate
The level of confidence should always be reported
The sample size should be reported
An interpretation of the confidence interval estimate should also be provided

Chapter Summary

Illustrated estimation process
Discussed point estimates
Introduced interval estimates
Discussed confidence interval estimation for the mean (σ known)
Addressed determining sample size (mean and proportion)
Discussed confidence interval estimation for the mean (σ unknown)
Discussed confidence interval estimation for the proportion

Key Terms

Confidence Interval
Confidence Level
Point Estimate
Sampling Error