BUSN1010 - Estimation Notes
BUSN1010 Analytics in Business: Estimation
Chapter 8: Estimating Single Population Parameters
Chapter Goals
- Distinguish between point estimate and confidence interval estimate.
- Construct and interpret confidence intervals for a single population mean using z and t distributions.
- Determine the required sample size to estimate a single population mean within a specified margin of error.
- Form and interpret a confidence interval estimate for a single population proportion.
- Determine the required sample size to estimate a single population proportion within a specified margin of error.
Confidence Intervals
- Confidence Intervals for the Population Mean, μ
- When Population Standard Deviation σ is Known
- When Population Standard Deviation σ is Unknown
- Determining the Required Sample Size
- Confidence Intervals for the Population Proportion, π
Point and Interval Estimates
- A point estimate is a single number from a sample used to estimate the corresponding population parameter.
- A confidence interval provides additional information about variability when estimating values for a population parameter.
Point Estimates
- Estimating a Population Parameter with a Sample Statistic (a Point Estimate):
- Mean: Sample Mean (\bar{x}) estimates Population Mean (μ).
- Proportion: Sample Proportion (p) estimates Population Proportion (π).
Estimation
- Estimating the mean of a population (μ) from a sample.
- Any function computed from a sample is a potential estimator.
- Choosing an estimator: Unbiased, Minimum Variance, Consistent.
Unbiasedness
- An unbiased estimator produces estimates centered around the true population value.
- Unbiased Estimator: Estimates are centered around the true value.
- Biased Estimator: Estimates are systematically off the true value.
Minimum Variance
- Choosing between estimators: lower variance is better.
- High Variance: Estimates are spread out.
- Low Variance: Estimates are close together.
- Sample Mean is the best estimator for Population Mean (unbiased and minimum variance).
Confidence Intervals
- Quantifying uncertainty associated with a point estimate.
- Interval estimate provides more information than a point estimate.
- Interval estimates are called confidence intervals.
Confidence Interval Estimate
- An interval gives a range of values.
- Takes into consideration variation in sample statistics from sample to sample.
- Based on observations from one sample.
- Gives information about closeness to unknown population parameters.
- Stated in terms of level of confidence, but never 100% sure.
Estimation Process
- Population with unknown mean (μ).
- Random sample with mean (\bar{x} = 50).
- 95% confidence that μ is between 40 and 60.
General Formula
The general formula for all confidence intervals is:
\text{Point Estimate} ± (\text{Critical Value})(\text{Standard Error})
Confidence Level
- Confidence Level: Confidence that the interval will contain the unknown population parameter.
- Determines the critical value.
- A percentage (less than 100%).
Confidence Level, (1-α)
- If confidence level = 95%, then (1 - α) = 0.95
- Relative frequency interpretation: 95% of all constructed confidence intervals will contain the true parameter in the long run.
- A specific interval either will or will not contain the true parameter; no probability involved for a specific interval.
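The relative frequency interpretation can be checked by simulation. The sketch below is illustrative only: the population values (μ = 50, σ = 10), the sample size, and the number of trials are assumptions chosen for the demonstration, not values from these notes.

```python
import random
from statistics import NormalDist

def coverage_rate(mu=50.0, sigma=10.0, n=25, level=0.95, trials=10_000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # critical value z_{alpha/2}
    half_width = z * sigma / n ** 0.5               # margin of error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Each individual interval either contains mu or it does not.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should be close to 0.95. That figure describes the long-run behaviour of the procedure, not any single interval.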
Confidence Intervals: Types
- Population Mean
- σ Known
- σ Unknown
- Population Proportion
Confidence Interval for μ (σ Known)
- Assumptions:
- Population standard deviation σ is known.
- Population is normally distributed (or use large sample if not normal).
Finding the Critical Value
- For a 95% confidence interval, α/2 = 0.025 lies in each tail, so the critical values are -z_{0.025} = -1.96 and z_{0.025} = 1.96
Common Levels of Confidence
- Commonly used confidence levels: 90%, 95%, and 99%.
| Confidence Level | Confidence Coefficient, (1 - \alpha) | z value |
|---|---|---|
| 80% | 0.80 | 1.28 |
| 90% | 0.90 | 1.645 |
| 95% | 0.95 | 1.96 |
| 98% | 0.98 | 2.33 |
| 99% | 0.99 | 2.576 |
| 99.8% | 0.998 | 3.09 |
| 99.9% | 0.999 | 3.29 |
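The z values can also be computed from the standard normal distribution rather than read from a table. This is a minimal sketch using Python's standard library; the helper name `z_critical` is an assumption introduced here for illustration.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value z_{alpha/2} for a given confidence level (1 - alpha)."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.3f}  z = {z_critical(level):.3f}")
```

The printed values can be compared with the z column of the table. Printed tables often round or truncate the last digit.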
Interval and Level of Confidence
- 100(1-α)% of intervals constructed contain μ; 100α% do not.
Margin of Error
- Margin of Error (e): amount added and subtracted to the point estimate to form the confidence interval.
- Example: Margin of error for estimating μ, σ known: e = z_{\alpha/2} * \frac{σ}{\sqrt{n}}
Factors Affecting Margin of Error
- Data variation, σ: e increases as σ increases.
- Sample size, n: e decreases as n increases.
- Level of confidence, 1 - α: e increases if 1 - α increases.
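The three effects can be seen by changing one input at a time in e = z_{\alpha/2} σ/√n. The numbers below (σ = 10, n = 25 as a baseline) are illustrative assumptions, not data from these notes.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Margin of error for estimating a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / n ** 0.5

base = margin_of_error(sigma=10, n=25)                       # baseline
more_spread = margin_of_error(sigma=20, n=25)                # larger sigma
bigger_sample = margin_of_error(sigma=10, n=100)             # larger n
more_confident = margin_of_error(sigma=10, n=25, level=0.99) # higher confidence

print(f"baseline        e = {base:.3f}")
print(f"sigma doubled   e = {more_spread:.3f}")
print(f"n quadrupled    e = {bigger_sample:.3f}")
print(f"99% confidence  e = {more_confident:.3f}")
```

Because e depends on 1/√n, quadrupling the sample size only halves the margin of error.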
Example: Waiting Times
- Standard deviation for waiting times is 1 minute (σ = 1).
- Waiting times are normally distributed.
- Collected waiting times for 20 customers; average waiting time is 3.5 minutes (\bar{x} = 3.5).
- Determine a 95% confidence interval for the true mean waiting time.
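A sketch of the calculation for this example, using \bar{x} ± z_{\alpha/2} σ/√n with z = 1.96 from the table of common confidence levels. The variable names are chosen here for illustration.

```python
from math import sqrt

x_bar = 3.5    # sample mean waiting time (minutes)
sigma = 1.0    # known population standard deviation (minutes)
n = 20         # number of customers sampled
z = 1.96       # critical value for 95% confidence

e = z * sigma / sqrt(n)            # margin of error
lower, upper = x_bar - e, x_bar + e

print(f"margin of error: {e:.4f}")
print(f"95% CI for the mean waiting time: {lower:.2f} to {upper:.2f} minutes")
```

The margin of error is about 0.44 minutes, so the interval runs from roughly 3.06 to 3.94 minutes.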
Interpretation
- Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
- Incorrect interpretation: there is a 95% probability that this interval contains the true population mean.
Problem: σ Known
- 8-3, 8-5, and 8-6 (b and c only)
- 8.3: Construct a 95% confidence interval estimate for the population mean given \bar{x} = 300, σ = 55, n = 250.
- 8.5: Determine the 90% confidence interval estimate for the population mean of a normal distribution given n = 100, \bar{x} = 1,200 and σ = 121.
- 8.6: Determine the margin of error for a confidence interval estimate for the population mean of a normal distribution given the following information:
- b. confidence level=0.99, n=25, σ =3.47
- c. confidence level=0.98, standard error=2.356
- Answers:
- 8.3: 293.18 to 306.82
- 8.5: 1180.10 to 1219.90
- 8.6
- b: ± 1.7871
- c: ± 5.4895
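The listed answers can be reproduced with a few lines of arithmetic. Two assumptions are made in this sketch. For 8.5 it takes \bar{x} = 1,200 and σ = 121, the reading that matches the listed interval. For 8.6b it uses z = 2.575 for 99% confidence, which matches the listed margin (the more precise 2.576 changes only the fourth decimal).

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z):
    """Confidence interval for a mean when sigma is known."""
    e = z * sigma / sqrt(n)
    return x_bar - e, x_bar + e

ci_8_3 = z_interval(x_bar=300, sigma=55, n=250, z=1.96)
ci_8_5 = z_interval(x_bar=1200, sigma=121, n=100, z=1.645)
e_8_6b = 2.575 * 3.47 / sqrt(25)   # z * sigma / sqrt(n)
e_8_6c = 2.33 * 2.356              # z * standard error (98% confidence)

print(f"8.3: {ci_8_3[0]:.2f} to {ci_8_3[1]:.2f}")
print(f"8.5: {ci_8_5[0]:.2f} to {ci_8_5[1]:.2f}")
print(f"8.6b: +/- {e_8_6b:.4f}")
print(f"8.6c: +/- {e_8_6c:.4f}")
```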
Confidence Interval for μ (σ is unknown)
- If the population standard deviation σ is unknown, substitute the sample standard deviation, s.
- This introduces extra uncertainty, so use the t distribution instead of the normal distribution.
- Assumptions:
- Population standard deviation is unknown.
- Population is normally distributed (or use large sample if not normal).
- Use Student’s t Distribution.
- Confidence Interval Estimate: \bar{x} ± t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}
Student's t Distribution
- The t distribution is a family of distributions.
- The t value depends on degrees of freedom (d.f.).
- Degrees of freedom are the number of observations that are free to vary after the sample mean has been calculated: d.f. = n - 1
Degrees of freedom (df)
- Idea: Number of observations that are free to vary after sample mean has been calculated
- Example: Suppose the mean of 3 numbers is 8.0
- Let x_1 = 7
- Let x_2 = 8
- What is x_3? If the mean of these three values is 8.0, then x_3 must be 9 (i.e., x_3 is not free to vary)
- Here, n = 3, so degrees of freedom = n-1 = 3 – 1 = 2 (2 values can be any numbers, but the third is not free to vary for a given mean)
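The same reasoning in code: once the mean and n - 1 values are fixed, the last value is determined. This is a small sketch of the example above, with variable names chosen here for illustration.

```python
n = 3
target_mean = 8.0
free_values = [7, 8]            # x_1 and x_2 can be chosen freely

# The sum of all n values must equal n * mean, so x_3 is forced.
x3 = n * target_mean - sum(free_values)
degrees_of_freedom = n - 1

print(f"x_3 = {x3}, degrees of freedom = {degrees_of_freedom}")
```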
Student’s t Distribution
- t distributions are bell-shaped and symmetrical but have fatter tails than the normal distribution.
- Note: t approaches z as n increases.
Student’s t Table
- The body of the table contains t values NOT probabilities.
t-Distribution Table Example
- Example:
- n = 10
- \bar{x} = 8.088
- s = 4.64
- Confidence level = 95%
- d.f. = n-1 = 9
- t = 2.2622
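Carrying this example through to the interval \bar{x} ± t s/√n gives the following. This is a sketch that takes t = 2.2622 from the t table as stated above; the variable names are illustrative.

```python
from math import sqrt

n = 10
x_bar = 8.088
s = 4.64
t = 2.2622          # t_{0.025, 9} from the t table

e = t * s / sqrt(n)                 # margin of error using s in place of sigma
lower, upper = x_bar - e, x_bar + e

print(f"margin of error: {e:.4f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The interval is roughly 4.77 to 11.41, noticeably wide because the sample is small and s is large relative to the mean.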
Comparison of t and z values
- Note: t approaches z as n increases
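A short comparison for 95% confidence illustrates the convergence. The t values are hard-coded from a standard t table (upper-tail area 0.025) and the chosen degrees of freedom are an assumption for illustration. Only z is computed, using the standard library.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)     # z_{0.025}, about 1.96

# t_{0.025, df} values copied from a standard t table (not computed here).
t_table = {9: 2.2622, 24: 2.0639, 29: 2.0452, 60: 2.0003, 120: 1.9799}

gaps = {df: t - z for df, t in t_table.items()}
for df, t in t_table.items():
    print(f"d.f. = {df:>3}  t = {t:.4f}  t - z = {gaps[df]:.4f}")
```

The gap between t and z shrinks steadily as the degrees of freedom grow, which is why z is sometimes used as an approximation for large samples.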
Example: VCE Students Income
- A random sample of n = 25 VCE students shows an average income of $50 with a sample standard deviation of $8.
- Form a 95% confidence interval for μ, the average income for VCE students.
- d.f. = n – 1 = 24, so t_{\alpha/2, n-1} = t_{0.025, 24} = 2.0639
- The confidence interval is \bar{x} ± t_{0.025, 24} \frac{s}{\sqrt{n}} = 50 ± (2.0639)\frac{8}{\sqrt{25}} = 50 ± 3.302, i.e. 46.698 ≤ μ ≤ 53.302
Approximation for large samples
- Since t approaches z as the sample size increases, an approximation is sometimes used when n ≥ 30
Problem: σ Unknown
- 8-1 and 8-16
- 8-1. Assuming the population of interest is approximately normally distributed, construct a 95% confidence interval estimate for the population mean given the following values: \bar{x} = 18.4, s = 4.2, n = 13
- 8-16. Bolton, Inc., an Internet service provider (ISP), has experienced rapid growth in the past five years. As a part of its marketing strategy, the company promises fast connections and dependable service. To achieve its objectives, the company constantly evaluates the capacity of its servers. One component of its evaluation is an analysis of the average amount of time a customer is connected and actively using the Internet daily. A random sample of customer records shows the following daily usage times, in minutes:
- a. Using the sample data, compute the best point estimate of the population mean for daily usage times for Bolton’s customers.
- b. The managers of Bolton’s marketing department would like to develop a confidence interval estimate for the population mean daily customer usage time. Because the population standard deviation of daily customer usage time is unknown and the sample size is small, what assumption must the marketing managers make concerning the population of daily customer usage times?
- c. Construct and interpret a confidence interval for the mean daily usage time for Bolton’s customers.
- d. Assume that before the sample was taken, Bolton’s marketing staff believed that mean daily usage for its customers was . Does their assumption concerning mean daily usage seem reasonable based on the confidence interval developed in part c?
Determining Sample Size
- The required sample size can be found to reach a desired margin of error (e) and level of confidence (1 - \alpha)
- Required sample size to estimate μ, σ known (obtained by solving e = z_{\alpha/2} \frac{σ}{\sqrt{n}} for n): n = \frac{z_{\alpha/2}^2 σ^2}{e^2}
Required Sample Size Example
- If σ = 45, what sample size is needed to estimate the population mean within ± 5 with 90% confidence?
- n = \frac{z_{\alpha/2}^2 σ^2}{e^2} = \frac{(1.645)^2 (45)^2}{5^2} = 219.19 (always round up)
- So the required sample size is n = 220
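The same calculation as a small function, using n = z²σ²/e² and rounding up. This is a sketch; the function name is an assumption, and z = 1.645 is the tabulated value for 90% confidence.

```python
from math import ceil

def sample_size_for_mean(sigma, e, z):
    """Smallest n giving a margin of error of at most e when sigma is known."""
    return ceil((z * sigma / e) ** 2)   # always round up, never to nearest

n_required = sample_size_for_mean(sigma=45, e=5, z=1.645)
print(f"required sample size: {n_required}")
```

Rounding up matters: rounding 219.19 down to 219 would leave the margin of error slightly larger than the ± 5 target.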
If σ is unknown
- If unknown, σ can be estimated when using the required sample size formula
- Use a value for σ that is expected to be at least as large as the true σ
- Select a pilot sample and estimate σ with the sample standard deviation, s
- Use the range R to estimate the standard deviation using σ = R/6 (or R/4 for a more conservative estimate, producing a larger sample size)
Sample Size Problems
- 8-27. What sample size is needed to estimate a population mean within of the true mean value using a confidence level of 95% if the true population variance is known to be 122,500
Confidence Intervals for the Population Proportion, π
- An interval estimate for the population proportion (π) can be calculated by adding an allowance for uncertainty to the sample proportion (p)
- Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation σ_p = \sqrt{\frac{\pi(1-\pi)}{n}}, which is estimated from sample data by \sqrt{\frac{p(1-p)}{n}}
Confidence interval endpoints
Upper and lower confidence limits for the population proportion are calculated with the formula
p ± z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}
where
- z is the standard normal value for the level of confidence desired
- p is the sample proportion
- n is the sample size
Interpretation
- We are 95% confident that the true percentage of left-handers in the population is between 16.5% and 33.5%.
- Although this range may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
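The stated interval is consistent with a sample proportion of p = 0.25 from n = 100, using p ± z √(p(1-p)/n). The value p = 0.25 is inferred here from the midpoint of the reported interval, since the original example data are not shown in these notes.

```python
from math import sqrt

p = 0.25     # sample proportion of left-handers (inferred, see note above)
n = 100      # sample size
z = 1.96     # critical value for 95% confidence

standard_error = sqrt(p * (1 - p) / n)
e = z * standard_error
lower, upper = p - e, p + e

print(f"95% CI for the proportion: {lower:.3f} to {upper:.3f}")
```

This reproduces limits of about 16.5% and 33.5%.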
Changing the sample size
- Increases in the sample size reduce the width of the confidence interval.
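A quick illustration of how the interval width shrinks with n. The sample proportion p = 0.25 and the sample sizes are assumptions chosen for the demonstration.

```python
from math import sqrt

def interval_width(p, n, z=1.96):
    """Full width (upper limit minus lower limit) of a proportion interval."""
    return 2 * z * sqrt(p * (1 - p) / n)

widths = {n: interval_width(0.25, n) for n in (100, 400, 1600)}
for n, w in widths.items():
    print(f"n = {n:>4}  width = {w:.4f}")
```

Each fourfold increase in n halves the width, so narrowing an interval substantially can require a much larger sample.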
Problem: Population Proportion
- 8-50. A decision maker is interested in estimating a population proportion. A sample of size n=150 yields 115 successes. Based on these sample data, construct a 90% confidence interval estimate for the true population proportion.
- 8-51. At issue is the proportion of people in a particular county who do not have health care insurance coverage. A simple random sample of 240 people was asked if they have insurance coverage, and 66 replied that they did not have coverage. Based on these sample data, determine the 95% confidence interval estimate for the population proportion.
Finding the Required Sample Size for proportion problems
Define the margin of error:
E = z_{\alpha/2} \sqrt{\frac{\pi(1-\pi)}{n}}
Solve for n:
n = \frac{z_{\alpha/2}^2 \pi(1-\pi)}{E^2}
π can be estimated with a pilot sample, if necessary (or conservatively use π = .50)
What sample size…?
- How large a sample would be necessary to estimate the true proportion defective in a large population within 3%, with 95% confidence?
- Solution:
- For 95% confidence, use Z = 1.96
- E = .03
- p not given, so use p = 0.5 to estimate \pi
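- n = \frac{z_{\alpha/2}^2 \pi(1-\pi)}{E^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.03)^2} = 1067.11; rounding up, use n = 1068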
What sample size…?
- How large a sample would be necessary to estimate the true proportion defective in a large population within 3%, with 95% confidence, assuming a pilot sample yields p = .12
- Solution:
- For 95% confidence, use Z = 1.96
- E = .03
- p = .12 , so use this to estimate \pi
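- n = \frac{z_{\alpha/2}^2 \pi(1-\pi)}{E^2} = \frac{(1.96)^2 (0.12)(0.88)}{(0.03)^2} = 450.75; rounding up, use n = 451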
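Both sample-size results can be reproduced with one function based on n = z²π(1-π)/E². This is a sketch; the function name and the parameter name `pi_guess` are assumptions introduced here.

```python
from math import ceil

def sample_size_for_proportion(e, z, pi_guess=0.5):
    """Smallest n for margin of error e; pi_guess = 0.5 is the conservative default."""
    return ceil(z ** 2 * pi_guess * (1 - pi_guess) / e ** 2)

n_conservative = sample_size_for_proportion(e=0.03, z=1.96)               # no pilot data
n_with_pilot = sample_size_for_proportion(e=0.03, z=1.96, pi_guess=0.12)  # pilot p = .12

print(f"no pilot sample (pi = 0.5): n = {n_conservative}")
print(f"pilot sample (p = 0.12):    n = {n_with_pilot}")
```

The product π(1-π) is largest at π = 0.5, so that choice gives the largest required sample. A pilot estimate far from 0.5 can reduce the required sample size considerably.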
Problem: Sample Size
- 8-49. A pilot sample of 75 items was taken, and the number of items with the attribute of interest was found to be 15. How many more items must be sampled to construct a 99% confidence interval estimate for π with a 0.025 margin of error?
- 8-52. A computer software distributor is planning to survey customers to determine the proportion who will renew their software subscription for the coming year. The company wants to estimate the population proportion with 90% confidence and a margin of error equal to ±0.04. What sample size is required?
- 8-53. A random sample of size 150 taken from a population yields a proportion equal to 0.35.
- a. Determine if the sample size is large enough so that the sampling distribution can be approximated by a normal distribution.
- b. Construct a 90% confidence interval for the population proportion.
- c. Interpret the confidence interval calculated in part b.
- d. Produce the margin of error associated with this confidence interval.
Ethical Issues
- A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate
- The level of confidence should always be reported
- The sample size should be reported
- An interpretation of the confidence interval estimate should also be provided
Chapter Summary
- Illustrated estimation process
- Discussed point estimates
- Introduced interval estimates
- Discussed confidence interval estimation for the mean (σ known)
- Addressed determining sample size (mean and proportion)
- Discussed confidence interval estimation for the mean (σ unknown)
- Discussed confidence interval estimation for the proportion
Key Terms
- Confidence Interval
- Confidence Level
- Point Estimate
- Sampling Error