Prob & Stats Sections 9.2 & 9.3

Estimating the Population Mean

Section 9.2 Overview

  • Goals for Today:

    • Review confidence intervals (CIs) for the population proportion discussed in the previous class.

    • Extend the concept to create confidence intervals for the population mean.

    • Introduce a new distribution, the Student's t distribution, which makes this possible.

    • Note that Section 9.3 will be covered as well, primarily as extra practice and review material for homework.

Key Concepts

Estimating the Mean (µ)
  • Confidence Interval Construction:

    • Confidence intervals can be constructed using the general formula:

    • $\text{CI} = \text{point estimate} \pm \text{margin of error}$

    • For the population mean:

    • The point estimate for µ is the sample mean ($\bar{x}$), calculated from sample data.

Road Block: Population Standard Deviation (σ)
  • Challenges in Calculation:

    • The ideal form for a CI for the mean is:

    • $\text{CI} = \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$

    • The difficulty is that the population standard deviation (σ) is typically unknown whenever the mean (µ) has to be estimated.

Using Sample Standard Deviation (s) to Estimate σ
  • Estimating Population Variability:

    • The sample standard deviation (s) serves as a reasonable estimate for the population standard deviation (σ).

    • Limitations:

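    • When s replaces σ, the standardized statistic $(\bar{x} - \mu)/(s/\sqrt{n})$ no longer follows a standard normal distribution, and the mismatch is worst for small sample sizes.

    • Using z critical values with s therefore produces intervals that are too narrow, so the true confidence level falls below the stated one.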
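
A quick simulation makes this limitation concrete. The sketch below is not from the lecture: it assumes a normal population with made-up parameters (µ = 0, σ = 1), a fixed random seed, and a nominal 95% level, and it counts how often the interval x̄ ± 1.96·s/√n actually captures µ.

```python
import math
import random

def z_interval_coverage(n, trials=10000, mu=0.0, sigma=1.0, seed=1):
    """Fraction of intervals xbar +/- 1.96 * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = 1.96  # z critical value for a nominal 95% interval
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # sample standard deviation with the n - 1 divisor
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = z * s / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

coverage_small = z_interval_coverage(n=5)
coverage_large = z_interval_coverage(n=50)
print(f"n = 5:  coverage = {coverage_small:.3f}")
print(f"n = 50: coverage = {coverage_large:.3f}")
```

For n = 5 the simulated coverage should land well below 0.95 (theory puts it near 0.88), and it moves toward 0.95 as n grows.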

Introduction to Student’s t Distribution
  • Substitution for z Distribution:

    • Because σ is rarely known, the conditions for using the z distribution are often not met, so we turn to the Student's t distribution.

    • Calculation of t Score:

    • The t score is calculated from the sample mean ($\bar{x}$) using:

      • $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

    • This t score follows a t distribution with n - 1 degrees of freedom where n is the sample size.

Interpretation of t Score
  • t Score Significance:

    • The t score represents how many sample standard errors the sample mean is away from the population mean.

    • Example:

    • A t score of -1.63 indicates that the sample mean is 1.63 standard errors below the population mean.
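
The t score calculation and its reading can be sketched in a few lines. The function name `t_score` and the input values are hypothetical, chosen only so that the result reproduces the -1.63 in the example.

```python
import math

def t_score(xbar, mu, s, n):
    """Number of estimated standard errors between xbar and mu."""
    standard_error = s / math.sqrt(n)
    return (xbar - mu) / standard_error

# Hypothetical values: the standard error is 4 / sqrt(16) = 1
t = t_score(xbar=48.37, mu=50.0, s=4.0, n=16)
print(round(t, 2))  # -1.63
```

The negative sign says the sample mean sits below the population mean; the magnitude says by how many standard errors.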

Characteristics of the t Distribution

  • Distinct Features:

    • Varies based on degrees of freedom (dependent on sample size n).

    • Centered at 0 and symmetric around 0.

    • Area under the curve equals 1, with equal halves on either side of 0.

    • As t approaches ±∞, the curve approaches but does not equal 0.

    • Tails of the t distribution exhibit slightly more area than normal distribution tails due to the extra variability from using sample standard deviation (s).

    • As the sample size increases, s becomes a more reliable estimate of σ (per the Law of Large Numbers), so the t distribution approaches the standard normal distribution.
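
The heavier tails and the convergence to the normal curve can be checked numerically. This is a minimal sketch that evaluates the standard t density formula directly; the helper names `t_pdf` and `normal_pdf` and the evaluation point x = 3 are choices made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Tail height at x = 3 shrinks toward the normal value as df grows
for df in (2, 10, 30, 1000):
    print(f"df = {df:4d}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"standard normal density at 3 = {normal_pdf(3.0):.5f}")
```

Each t density at x = 3 is larger than the normal density there, and the gap narrows as the degrees of freedom increase.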

Finding t Critical Values

  • Critical Values Determination:

    • As with z critical values, t critical values mark off a chosen tail area, but they also depend on the degrees of freedom (n - 1), and therefore on the sample size.

Using t Tables
  • Finding Critical Values Using Tables:

    • Steps to locate the critical t value:

    • Read the degrees of freedom (n-1) from the left side of the table.

    • Locate the column for desired probability (one-tailed/two-tailed) at the top or confidence level at the bottom.

    • The intersection of the row and column indicates the critical t value.

Using StatCrunch for t Critical Values
  • Software Approach:

    • Navigate to: Stat > Calculators > T.

    • Enter degrees of freedom (n-1) and set up appropriate parameters.

    • To find t_{α}, use the format P(X > __) = α.
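
The same lookup, solving P(T > t) = α for t, can be approximated without a table or StatCrunch. The sketch below is one possible approach, not the method used by either tool: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names and the search bracket of 200 are assumptions.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + x * x / df))

def t_upper_tail(t, df, steps=4000):
    """P(T > t) for t >= 0, via Simpson's rule on the density from 0 to t."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # by symmetry, half the area lies above 0

def t_critical(alpha, df, upper=200.0):
    """Solve P(T > t) = alpha by bisection; assumes 0 < alpha < 0.5
    and that the answer is below `upper`."""
    lo, hi = 0.0, upper
    while hi - lo > 1e-7:
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

# 95% confidence with n = 18: alpha/2 = 0.025 and df = 17
print(f"{t_critical(0.025, 17):.3f}")  # a t table lists 2.110
```

Passing α/2 rather than α gives the critical value needed for a two-sided confidence interval.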

Constructing Confidence Intervals for the Mean

  • Formulation of Confidence Interval:

    • The (1-α)100% confidence interval for the population mean (µ) can be expressed with the following formula:

    • $\text{CI} = \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}} \text{ to } \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}}$

    • Note the importance of using the critical t value associated with n - 1 degrees of freedom.
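
As a minimal sketch, the interval formula translates directly into code once the critical value is known. The function name `t_interval` and the summary statistics are hypothetical; 2.131 is the table value of t for an upper-tail area of 0.025 with 15 degrees of freedom.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: n = 16, so df = 15 and t_crit = 2.131 for 95% confidence
lower, upper = t_interval(xbar=50.0, s=4.0, n=16, t_crit=2.131)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (47.869, 52.131)
```

The critical value is passed in rather than computed so that it can come from a table or from software.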

Assumptions for t Interval
  • Validity Conditions:

    • Sample data must arise from a randomized experiment or simple random sample (SRS).

    • The sample size should be no larger than 5% of the original population.

    • Data must come from a normally distributed population or a sufficiently large sample size (generally n ≥ 30).

CIs for the Mean in StatCrunch
  • Data Input Procedure:

    • For raw data, enter the values in a single column; labeling the column is optional.

    • Navigate to: Stat > T Stats > One Sample:

    • With raw data: Select the relevant column.

    • With summary data: Input sample mean, sample standard deviation, and sample size:

      • Choose the confidence interval radio button and indicate the desired confidence level.

      • Click compute to obtain results.

Example Calculation
  • Illustrative Case:

    • Sample of 18 cans with an average volume of 11.9 ounces and a standard deviation of 0.02 ounces.

    • Required: Calculate a 95% confidence interval for the population mean volume.

    • Two methods: By hand calculation and using StatCrunch.
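
Worked by hand, the calculation runs roughly as follows. It assumes the can volumes are approximately normal (n = 18 is below 30) and uses the table value 2.110 for an upper-tail area of 0.025 with 17 degrees of freedom; the intermediate figures are rounded.

```latex
\begin{align*}
\mathrm{df} &= n - 1 = 17, \qquad t_{0.025} = 2.110 \\
\frac{s}{\sqrt{n}} &= \frac{0.02}{\sqrt{18}} \approx 0.00471 \\
E &= t_{0.025} \cdot \frac{s}{\sqrt{n}} \approx 2.110 \times 0.00471 \approx 0.0099 \\
\text{CI} &= 11.9 \pm 0.0099 \approx (11.890,\ 11.910)
\end{align*}
```

With 95% confidence, the mean volume lies between about 11.890 and 11.910 ounces.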

Addressing Violated Assumptions
  • Robustness of t Interval:

    • The t interval is somewhat robust to minor deviations from normality. However, significant outliers or extreme non-normal conditions can distort results.

    • If the sample size (n) is less than 30 and the data show clear non-normality or outliers, the t interval should not be used.

    • Alternatives include resampling (bootstrapping) and nonparametric tests, which sidestep distributional assumptions, though these methods are not part of the current course curriculum.

    • When the assumptions are not met, say so explicitly: a CI built from the t distribution is not valid in that case.
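
For orientation only, since resampling is outside the course, here is a rough sketch of one common variant, the percentile bootstrap. The function name, the number of resamples, the seed, and the data are all made up for illustration.

```python
import random
import statistics

def bootstrap_mean_ci(data, confidence=0.95, resamples=5000, seed=1):
    """Percentile bootstrap interval for the mean of data."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[round(tail * resamples)]
    upper = means[round((1 - tail) * resamples) - 1]
    return lower, upper

# Made-up, right-skewed sample with one large value
sample = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.9, 3.3, 4.8, 9.6]
low, high = bootstrap_mean_ci(sample)
print(f"bootstrap 95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval comes from the middle 95% of the resampled means, so no normality assumption about the population enters the calculation.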

Estimating Required Sample Size
  • Formula for Sample Size Estimation:

    • To estimate population mean (µ) with a desired (1-α)100% confidence level and a specified margin of error (E):

    • $n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2$

    • Important note: Always round up when computing sample size.

    • Justification for using z here: the t critical value depends on the degrees of freedom (n - 1), and n is the unknown being solved for, so the z critical value is used instead.
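
The sample size formula, including the round-up rule, can be sketched as below. The function name `required_sample_size` and the planning values are hypothetical; the z critical value comes from Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    return math.ceil((z * sigma / margin) ** 2)  # always round up

# Hypothetical planning values: sigma = 10, margin of error E = 2
print(required_sample_size(sigma=10.0, margin=2.0))  # 97
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than E.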

Sample Size Estimation Using StatCrunch
  • Detailed Procedure:

    • Access the command: Stat > Z Stats > One Sample > Power/Sample Size.

    • Switch to the "Confidence Interval Width" tab.

    • Input the total width, which is double the desired margin of error (E).

    • Ensure the sample size cell is cleared and then compute to derive the necessary sample size.

Another Example on Sample Size
  • Specific Case:

    • Interest lies in estimating the mean miles per gallon (MPG) of a car model.

    • Assumptions: MPG is normally distributed, and a sample standard deviation of approximately 2.92 is used in place of σ.

    • Objective: Determine how many cars need to be observed to estimate mean MPG within 0.5 MPG at a 95% confidence level.
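
Worked through with the sample size formula, using 2.92 in place of σ, E = 0.5, and the rounded critical value z = 1.96 for 95% confidence:

```latex
\begin{align*}
n &= \left( \frac{z_{\alpha/2} \cdot s}{E} \right)^2
   = \left( \frac{1.96 \times 2.92}{0.5} \right)^2 \\
  &= (11.4464)^2 \approx 131.02 \\
  &\Rightarrow n = 132 \text{ cars (rounding up)}
\end{align*}
```

Rounding down to 131 would leave the margin of error slightly above 0.5 MPG, which is why the result is rounded up.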

Section 9.3: Putting Everything Together

  • Overview of Confidence Intervals:

    • Decision making based on sample information:

    • For proportions, the requirement is $n\hat{p}(1 - \hat{p}) \ge 10$ before the normal-based interval is used.

    • For means, evaluate whether the sample size (n) is at least 30:

      • If yes, proceed with computing a t-interval.

      • If no, check for normality and absence of outliers; if the data are approximately normal with no outliers, proceed with computing a t-interval.

      • If assumptions fail, non-parametric methods or bootstrap analyses should be considered.
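
The decision flow can be written out as a small sketch. It reads the flow as follows, consistent with the n ≥ 30 condition in the t interval assumptions: large samples go straight to a t-interval, and small samples need approximately normal data with no outliers. The function names and return strings are hypothetical, and the sampling conditions (random sample, 5% rule) are assumed to have been checked separately.

```python
def choose_mean_method(n, approx_normal, has_outliers):
    """Pick an interval method for a population mean."""
    if n >= 30:
        return "t-interval"
    # Small sample: the t-interval needs near-normal data without outliers
    if approx_normal and not has_outliers:
        return "t-interval"
    return "nonparametric or bootstrap"

def proportion_condition_met(n, p_hat):
    """Check n * p_hat * (1 - p_hat) >= 10 for a proportion interval."""
    return n * p_hat * (1 - p_hat) >= 10

print(choose_mean_method(n=18, approx_normal=True, has_outliers=False))
print(choose_mean_method(n=12, approx_normal=False, has_outliers=True))
print(proportion_condition_met(n=200, p_hat=0.5))
```

Whether data are "approximately normal" is a judgment usually made from a normal probability plot and a boxplot, so those inputs are left as booleans here.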


Summary

  • These notes cover estimating a population mean: constructing and interpreting confidence intervals with the t distribution, using z for sample size planning, and checking the assumptions behind each method.

  • Making decisions from sample statistics in this way gives students a foundation in the inferential methods used for research and data analysis in many fields.