Prob & Stats Sections 9.2 & 9.3
Estimating the Population Mean
Section 9.2 Overview
Goals for Today:
Review confidence intervals (CIs) for the population proportion discussed in the previous class.
Extend the concept to create confidence intervals for the population mean.
Introduce a new distribution, Student's t distribution, that makes this possible.
Note that Section 9.3 will be covered as well, primarily as extra practice and review material for homework.
Key Concepts
Estimating the Mean (µ)
Confidence Interval Construction:
Confidence intervals can be constructed using the general formula:
\text{CI} = \text{point estimate} \pm \text{margin of error}
For the population mean:
The point estimate for the mean µ is the sample mean, \bar{x}, calculated from the sample data.
Road Block: Population Standard Deviation (σ)
Challenges in Calculation:
The ideal form for an (approximately 95%) CI for the mean uses σ:
\text{CI} = \bar{x} \pm 2\left(\frac{\sigma}{\sqrt{n}}\right)
The difficulty is that σ is typically unknown: if we are estimating µ because we do not know it, we are unlikely to know σ either.
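If σ were known, the ideal interval could be computed directly. A minimal Python sketch, assuming σ is known; the helper name `z_interval` and the usage numbers are hypothetical. It uses the standard library's `statistics.NormalDist` to get the exact z critical value rather than the rounded multiplier 2:

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, conf: float = 0.95) -> tuple[float, float]:
    """CI for the mean when the population SD sigma is KNOWN (hypothetical helper)."""
    alpha = 1 - conf
    # z_{alpha/2}: the value with area alpha/2 to its right under the standard normal curve
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical numbers: xbar = 100, sigma = 15, n = 36 gives roughly 100 +/- 4.9
print(z_interval(100, 15, 36))
```

For 95% confidence, z_{0.025} ≈ 1.96, which is why the formula above rounds the multiplier to 2.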
Using Sample Standard Deviation (s) to Estimate σ
Estimating Population Variability:
The sample standard deviation (s) serves as a reasonable estimate for the population standard deviation (σ).
Limitations:
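The sample standard deviation varies from sample to sample, is not normally distributed, and can be a poor estimate of σ, especially with small samples.
As a result, simply substituting s for σ in the z-based formula gives intervals whose actual confidence level falls below the stated level, especially for small n.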
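A short simulation shows what this shortfall looks like. The Python sketch below (standard library only; hypothetical helper name; a standard normal population is assumed so that the true µ = 0 is known) plugs s into the z-based 95% formula and records how often the interval actually contains µ. For small n the coverage should come out clearly below 95%, moving closer to 95% as n grows.

```python
import random
from math import sqrt


def z_with_s_coverage(n: int, reps: int = 10000, seed: int = 1) -> float:
    """Share of 'xbar +/- 1.96 * s / sqrt(n)' intervals that contain mu.

    Samples come from a standard normal population, so the true mu is 0.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample SD
        margin = 1.96 * s / sqrt(n)  # z value used even though sigma is unknown
        if xbar - margin <= 0.0 <= xbar + margin:
            hits += 1
    return hits / reps


for n in (5, 10, 30):
    print(f"n = {n:2d}: estimated coverage {z_with_s_coverage(n):.3f}")
```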
Introduction to Student’s t Distribution
Substitution for z Distribution:
Because σ is rarely known, the z-based interval usually cannot be used; instead, we turn to Student's t distribution.
Calculation of t Score:
The t score is calculated from the sample mean (\bar{x}) using:
t = \frac{\bar{x} - \mu}{s/\sqrt{n}}
When the population is normally distributed, this t score follows a t distribution with n - 1 degrees of freedom, where n is the sample size.
Interpretation of t Score
t Score Significance:
The t score measures how many estimated standard errors (s/√n) the sample mean lies from the population mean.
Example:
A t score of -1.63 indicates that the sample mean is 1.63 standard errors below the population mean.
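The t score is a one-line computation. A minimal Python sketch; the helper name `t_score` and the numbers are hypothetical, chosen so the result matches the t score of -1.63 in the example above:

```python
from math import sqrt


def t_score(xbar: float, mu: float, s: float, n: int) -> float:
    """Number of estimated standard errors (s / sqrt(n)) that xbar lies from mu."""
    standard_error = s / sqrt(n)
    return (xbar - mu) / standard_error


# Hypothetical numbers: mu = 50, s = 4, n = 16 gives a standard error of 1,
# so a sample mean of 48.37 lies 1.63 standard errors below mu (df = 15).
t = t_score(48.37, 50, 4, 16)
print(round(t, 2))  # -1.63
```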
Characteristics of the t Distribution
Distinct Features:
Depends on the degrees of freedom, df = n - 1, so each sample size has its own t curve.
Centered at 0 and symmetric around 0.
The total area under the curve is 1, with half of the area on each side of 0.
As t moves toward ±∞, the curve approaches 0 but never reaches it.
The tails are thicker (contain more area) than the tails of the standard normal distribution, reflecting the extra variability from using the sample standard deviation s in place of σ.
As the sample size increases, the t distribution approaches the standard normal distribution, because by the Law of Large Numbers s gets closer to σ.
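These properties can be checked numerically. A short Python sketch (standard library only; `t_pdf` is a hypothetical helper written from the standard t density formula) compares t densities with the standard normal density at the center and in the tail:

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Normalizing constant Gamma((df + 1) / 2) / (sqrt(df * pi) * Gamma(df / 2)),
    # computed via log-gamma so large df does not overflow.
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


normal = NormalDist()  # standard normal: mean 0, SD 1
for df in (2, 5, 30):
    print(f"df = {df:2d}: f(0) = {t_pdf(0, df):.4f}, f(3) = {t_pdf(3, df):.4f}")
print(f"normal : f(0) = {normal.pdf(0):.4f}, f(3) = {normal.pdf(3):.4f}")
```

The t curves should show a lower peak at 0 and more height at t = 3 than the normal curve, with the gap shrinking as df grows.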
Finding t Critical Values
Critical Values Determination:
Finding t critical values works like finding z critical values, except that the degrees of freedom, n - 1, must also be known. As with z, t_α denotes the t value with area α to its right.
Using t Tables
Finding Critical Values Using Tables:
Steps to locate the critical t value:
Read the degrees of freedom (n-1) from the left side of the table.
Locate the column for the desired tail probability (one-tailed or two-tailed) at the top, or for the confidence level at the bottom.
The intersection of the row and column indicates the critical t value.
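Software replaces the table lookup with a numerical inverse of the t distribution. A minimal, standard-library-only Python sketch, assuming 0 < α < 0.5: it integrates the t density with Simpson's rule to get the CDF, then uses bisection to find t_α. The helper names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.t.ppf` does this job.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the area from 0 to x, integrated with Simpson's rule."""
    if x < 0:
        return 1 - t_cdf(-x, df, steps)  # symmetry about 0
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha: float, df: float) -> float:
    """t_alpha: the t value with area alpha to its right (0 < alpha < 0.5)."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must be between 0 and 0.5")
    target = 1 - alpha
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t_alpha
        hi *= 2
    for _ in range(60):  # bisection: halve the bracket each step
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 95% confidence with n = 18: alpha/2 = 0.025, df = 17 (compare with a t table)
print(round(t_critical(0.025, 17), 3))
```

For a (1 - α)100% interval, call it with α/2 in place of α, as in the usage line.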
Using StatCrunch for t Critical Values
Software Approach:
Navigate to: Stat > Calculators > T.
Enter the degrees of freedom, n - 1.
To find t_α, set up the probability as P(X > __) = α, enter α, and compute; for a (1 - α)100% confidence interval, use α/2 in place of α.
Constructing Confidence Intervals for the Mean
Formulation of Confidence Interval:
The (1-α)100% confidence interval for the population mean (µ) can be expressed with the following formula:
\text{CI} = \bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \text{ to } \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}
Here t_{α/2} must be the critical value from the t distribution with n - 1 degrees of freedom.
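The interval formula translates directly into code. A minimal Python sketch; `t_interval` is a hypothetical helper, and the critical value is passed in (from a t table or StatCrunch) rather than computed, to keep the example short. The usage numbers are hypothetical: with n = 16 (df = 15), a t table gives t_{0.025} = 2.131.

```python
from math import sqrt


def t_interval(xbar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """(1 - alpha)100% t interval for mu.

    t_star must be t_{alpha/2} with n - 1 degrees of freedom,
    e.g. read from a t table or from StatCrunch's T calculator.
    """
    margin = t_star * s / sqrt(n)  # margin of error E
    return xbar - margin, xbar + margin


# Hypothetical summary data: xbar = 50, s = 4, n = 16, 95% confidence.
# The margin is 2.131 * 4 / 4 = 2.131, so roughly 47.869 to 52.131.
low, high = t_interval(50, 4, 16, 2.131)
print(f"{low:.3f} to {high:.3f}")
```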
Assumptions for t Interval
Validity Conditions:
Sample data must arise from a randomized experiment or simple random sample (SRS).
The sample size should be no more than 5% of the population size (n ≤ 0.05N), so that the observations are approximately independent.
The data must come from a normally distributed population, or the sample size must be large (generally n ≥ 30).
CIs for the Mean in StatCrunch
Data Input Procedure:
For raw data, enter the values in a column (a column label is optional).
Navigate to: Stat > T Stats > One Sample, then choose the option that matches the input:
With Data (raw data): Select the column containing the data.
With Summary (summary data): Enter the sample mean, sample standard deviation, and sample size.
Choose the confidence interval radio button and enter the desired confidence level.
Click Compute to obtain the results.
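The two input routes agree because the t interval depends on the data only through \bar{x}, s, and n. A minimal Python sketch of that reduction, with a hypothetical helper and hypothetical data, using the standard library's `statistics` module (whose `stdev` uses the n - 1 divisor of the sample standard deviation):

```python
from statistics import mean, stdev


def summarize(values: list[float]) -> tuple[float, float, int]:
    """Reduce a raw data column to (sample mean, sample SD, sample size)."""
    # statistics.stdev divides by n - 1, i.e. it is the sample standard deviation s
    return mean(values), stdev(values), len(values)


# Hypothetical raw data column
volumes = [12.1, 11.8, 12.0, 11.9, 12.2]
xbar, s, n = summarize(volumes)
print(f"xbar = {xbar:.3f}, s = {s:.4f}, n = {n}")
```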
Example Calculation
Illustrative Case:
A sample of 18 cans has a mean volume of 11.9 ounces and a standard deviation of 0.02 ounces.
Required: Calculate a 95% confidence interval for the population mean volume.
Two methods: By hand calculation and using StatCrunch.
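Worked by hand, the calculation runs as follows (a sketch of the arithmetic, using the t table value t_{0.025} = 2.110 for 17 degrees of freedom; StatCrunch should agree up to rounding):

```latex
\begin{aligned}
df &= n - 1 = 18 - 1 = 17, \qquad t_{0.025} = 2.110 \\
\frac{s}{\sqrt{n}} &= \frac{0.02}{\sqrt{18}} \approx 0.00471 \\
E &= t_{0.025}\,\frac{s}{\sqrt{n}} \approx 2.110 \times 0.00471 \approx 0.0099 \\
\bar{x} \pm E &= 11.9 \pm 0.0099, \text{ i.e. } 11.890 \text{ to } 11.910 \text{ ounces}
\end{aligned}
```

So we are 95% confident that the population mean volume is between about 11.890 and 11.910 ounces.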
Addressing Violated Assumptions
Robustness of t Interval:
The t interval is robust to minor departures from normality. However, outliers or strongly non-normal data can distort the results.
If the sample size is less than 30 and the data show clear non-normality or outliers, the t interval should not be used.
Alternatives include resampling (bootstrapping) and nonparametric methods, which do not rely on the normality assumption; these methods are not part of the current course curriculum.
When the assumptions are not met, state explicitly that a CI based on the t distribution is not appropriate.
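Resampling is outside this course, so the following is for curiosity only. A minimal Python sketch of a percentile bootstrap interval for the mean, one common form of the resampling idea mentioned above; the helper name, the seed, the 5,000 resamples, and the data are all hypothetical choices:

```python
import random


def bootstrap_percentile_ci(values: list[float], conf: float = 0.95,
                            resamples: int = 5000, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean (a sketch, not a course method)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample n values WITH replacement and record each resample's mean
    boot_means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(resamples))
    alpha = 1 - conf
    # Keep the middle conf fraction of the bootstrap means
    lower = boot_means[int(alpha / 2 * resamples)]
    upper = boot_means[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper


# Hypothetical small, right-skewed sample with one large value
data = [2.1, 3.4, 1.9, 5.6, 2.8, 3.3, 12.0, 2.5]
print(bootstrap_percentile_ci(data))
```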
Estimating Required Sample Size
Formula for Sample Size Estimation:
To estimate the population mean µ with (1 - α)100% confidence and a specified margin of error E, the required sample size is:
n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2
Important note: Always round up when computing sample size.
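Why z rather than t: the t critical value depends on the degrees of freedom, n - 1, which cannot be known until n itself is determined, so the z value is used by convention. Because σ is usually unknown, an estimate such as a sample standard deviation from earlier data is used in its place.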
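The sample-size formula, including the round-up rule, can be sketched in Python. The helper name and the planning values (σ ≈ 15, E = 5) are hypothetical; `statistics.NormalDist().inv_cdf` supplies z_{α/2}:

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma: float, margin: float, conf: float = 0.95) -> int:
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (always rounded UP)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    return ceil((z * sigma / margin) ** 2)


# Hypothetical planning values: sigma estimated as 15, desired margin of error 5
print(sample_size_for_mean(15, 5))  # 35
```

Using the table value 1.96 instead of the exact z gives the same n in this case, since (1.96 × 15 / 5)^2 ≈ 34.57 also rounds up to 35.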
Sample Size Estimation Using StatCrunch
Detailed Procedure:
Access the command: Stat > Z Stats > One Sample > Power/Sample Size.
Switch to the "Confidence Interval Width" tab.
Enter the interval width, which is twice the desired margin of error (2E).
Leave the sample size field empty, then click Compute to obtain the required sample size.
Another Example on Sample Size
Specific Case:
Interest lies in estimating the mean miles per gallon (MPG) of a car model.
Assumptions: MPG is normally distributed, and the standard deviation is approximately 2.92 MPG (a sample standard deviation used as the estimate of σ).
Objective: Determine how many cars need to be observed to estimate mean MPG within 0.5 MPG at a 95% confidence level.
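By hand, with z_{0.025} = 1.96 and 2.92 used in place of σ:

```latex
n = \left(\frac{z_{0.025}\,\sigma}{E}\right)^2
  = \left(\frac{1.96 \times 2.92}{0.5}\right)^2
  = (11.4464)^2
  \approx 131.02
  \quad\Rightarrow\quad n = 132
```

So about 132 cars need to be observed; the result is rounded up, not to the nearest integer.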
Section 9.3: Putting Everything Together
Overview of Confidence Intervals:
Choosing a method based on the sample information:
For proportions, the requirement for the z interval is n\hat{p}(1 - \hat{p}) ≥ 10.
For means, check whether the sample size is at least 30:
If yes, compute a t interval.
If no, check the data for approximate normality and for outliers (for example, with a normal probability plot and a boxplot); if the data are approximately normal with no outliers, compute a t interval.
If these conditions fail, consider nonparametric methods or bootstrap analyses.
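The decision rules can be encoded in a small function. A hypothetical Python sketch: it assumes the random-sampling and n ≤ 0.05N conditions have already been checked, and it takes the normality/outlier judgment (from a plot) as a yes/no input rather than computing it:

```python
from typing import Optional


def choose_ci_method(parameter: str, n: int, p_hat: Optional[float] = None,
                     normal_no_outliers: Optional[bool] = None) -> str:
    """Hypothetical encoding of the Section 9.3 decision rules.

    Assumes the random-sampling and n <= 0.05N conditions were already checked.
    """
    if parameter == "proportion":
        if p_hat is None:
            raise ValueError("p_hat is required for a proportion")
        if n * p_hat * (1 - p_hat) >= 10:  # large-sample requirement
            return "z interval for p"
        return "conditions not met"
    if parameter == "mean":
        # Large sample, or a small sample that looks normal with no outliers
        if n >= 30 or normal_no_outliers:
            return "t interval for mu"
        return "conditions not met: consider nonparametric or bootstrap methods"
    raise ValueError("parameter must be 'proportion' or 'mean'")


print(choose_ci_method("mean", 18, normal_no_outliers=True))
print(choose_ci_method("proportion", 500, p_hat=0.4))
```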
Together, these steps form a systematic workflow: identify the parameter, check the conditions for the corresponding interval, and fall back on alternative methods when the conditions fail.
Summary
These notes cover estimating a population mean: constructing and interpreting confidence intervals with the t distribution, using the z distribution to plan sample sizes, and checking the assumptions each method requires.
Choosing a method based on the sample information (Section 9.3) gives a foundation in the inferential methods used in research and data analysis across many fields.