Stats

Foundations of Confidence Intervals for Population Mean
  • Recap of Previous Concepts:
    * Last class concluded with finding confidence intervals and determining appropriate sample sizes ($n$) for proportions.
    * The current focus is on constructing confidence intervals for the population mean, denoted by the Greek letter $\mu$.

  • The $t$-Distribution:
    * In practice, and especially with small samples, the population standard deviation ($\sigma$) is typically unknown and must be estimated by the sample standard deviation ($s$).
    * In these scenarios, Student's $t$-distribution is used instead of the standard normal ($z$) distribution.
    * Properties of the $t$-distribution:
        * Lookups may use the area to the left of a $t$-score. For example, if the area in the right tail is $0.10$ (denoted $t_{0.10}$), the area to the left is $0.90$.
        * Lookups require both the area and the degrees of freedom ($df$).

  • Assumptions and Requirements:
    * The data must be obtained through a simple random sample (SRS) or a randomized experiment.
    * In addition, the population must be normally distributed, or the sample size must be large. A sample is generally considered small if $n < 30$, so the normality requirement matters most for small samples.
    * Technically, the sample size should be less than $5\%$ of the population size ($N$), i.e. $n < 0.05N$, though this is often difficult to verify if the population size is unknown.

  • Degrees of Freedom ($df$):
    * Degrees of freedom are defined as one less than the sample size ($n$).
    * Formula: $df = n - 1$.

Construction of Confidence Intervals for $\mu$
  • Standard Formula for the Confidence Interval:
    * The interval is defined by lower and upper bounds centered on the sample mean ($\bar{x}$).
    * Lower Bound: $\bar{x} - E$
    * Upper Bound: $\bar{x} + E$

  • Margin of Error ($E$):
    * When $\sigma$ is unknown, the margin of error for estimating the population mean is:
    * $E = t_{\alpha/2} \times \frac{s}{\sqrt{n}}$
    * Where:
        * $\bar{x}$ is the sample mean (the center of the interval).
        * $t_{\alpha/2}$ is the critical value for the given confidence level and degrees of freedom.
        * $s$ is the sample standard deviation ($s_x$).
        * $n$ is the sample size.
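The interval formula and the margin of error above can be sketched in a few lines of Python. This is a minimal illustration, not calculator or textbook code: the function names are hypothetical, and the critical value `t_crit` is assumed to come from a $t$ table (the sketch does not compute it).

```python
import math

def t_margin_of_error(t_crit, s, n):
    """E = t_{alpha/2} * s / sqrt(n); t_crit is supplied from a t table."""
    return t_crit * s / math.sqrt(n)

def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) = (x_bar - E, x_bar + E)."""
    margin = t_margin_of_error(t_crit, s, n)
    return x_bar - margin, x_bar + margin

# Made-up numbers chosen so the arithmetic is easy to follow:
# E = 2 * 4 / sqrt(4) = 4, so the interval is 10 ± 4.
print(t_confidence_interval(x_bar=10, s=4, n=4, t_crit=2))  # (6.0, 14.0)
```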

Practical Example: 2014 Toyota Camry MPG
  • Scenario: Reported miles per gallon (MPG) for $n = 16$ owners of 2014 Toyota Camry automobiles. A $95\%$ confidence interval for the mean MPG of all such cars is needed.

  • Step 1: Identify Parameters:
    * Sample Size ($n$): $16$.
    * Degrees of Freedom ($df$): $16 - 1 = 15$.
    * Confidence Level: $95\% = 0.95$.
    * Alpha ($\alpha$): $1 - 0.95 = 0.05$.
    * Alpha over two ($\alpha/2$): $\frac{0.05}{2} = 0.025$.
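Step 1 is pure bookkeeping, so it translates directly into code. A small sketch (the function name is hypothetical) that derives $df$, $\alpha$, and $\alpha/2$ from $n$ and the confidence level; the printed values are rounded because `1 - 0.95` is not exactly `0.05` in floating point.

```python
def interval_parameters(n, confidence):
    """Return (df, alpha, alpha/2) for a t-interval."""
    df = n - 1                  # degrees of freedom
    alpha = 1 - confidence      # total tail area
    return df, alpha, alpha / 2

df, alpha, half_alpha = interval_parameters(16, 0.95)
print(df, round(alpha, 4), round(half_alpha, 4))  # 15 0.05 0.025
```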

  • Step 2: Look Up the Critical Value ($t_{\alpha/2}$):
    * In the $t$-distribution table, find the column where the "Area in Right Tail" is $0.025$ and the row where $df = 15$.
    * Critical Value: $t_{0.025} = 2.131$.
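A table lookup is keyed by two things, $df$ and the right-tail area, which a dictionary mirrors well. This sketch is an assumption-laden stand-in for a printed $t$ table: it holds only the two entries quoted in these notes ($t_{0.025}$ at $df = 15$ and $t_{0.005}$ at $df = 18$), and `t_critical` is a hypothetical helper name.

```python
# Only the two table entries quoted in these notes; a real t table has many more.
T_TABLE = {
    (15, 0.025): 2.131,
    (18, 0.005): 2.878,
}

def t_critical(df, right_tail_area):
    """Look up t_{right_tail_area} for the given degrees of freedom."""
    # Rounding guards against floating-point noise such as (1 - 0.95) / 2.
    key = (df, round(right_tail_area, 4))
    if key not in T_TABLE:
        raise KeyError(f"no table entry for df={df}, right-tail area={right_tail_area}")
    return T_TABLE[key]

right_tail = (1 - 0.95) / 2
print(f"area to the left: {1 - right_tail:.3f}")  # area to the left: 0.975
print(t_critical(15, right_tail))                  # 2.131
```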

  • Step 3: Calculate Sample Statistics (using a calculator's one-variable stats):
    * Sample Mean ($\bar{x}$): $28.1$.
    * Sample Standard Deviation ($s_x$): $2.38$.
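The raw Camry readings are not listed in these notes, so the sketch below uses a short hypothetical list purely to show which standard deviation to use. Python's `statistics.stdev` divides by $n - 1$, matching the calculator's $s_x$; `statistics.pstdev` divides by $n$ (the calculator's $\sigma_x$), which is not the one the $t$-interval needs.

```python
import statistics

# Hypothetical readings for illustration only (NOT the 16 Camry values from class).
mpg = [26, 28, 29, 31]

x_bar = statistics.mean(mpg)      # sample mean
s_x = statistics.stdev(mpg)       # sample SD, divides by n - 1 (use this one)
sigma_x = statistics.pstdev(mpg)  # population SD, divides by n

print(x_bar, round(s_x, 4), round(sigma_x, 4))  # 28.5 2.0817 1.8028
```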

  • Step 4: Calculate the Margin of Error ($E$):
    * $E = 2.131 \times \frac{2.38}{\sqrt{16}}$
    * $E = 2.131 \times \frac{2.38}{4}$
    * $E \approx 1.2679$.

  • Step 5: Determine the Interval:
    * Lower Bound: $28.1 - 1.2679 = 26.8321$
    * Upper Bound: $28.1 + 1.2679 = 29.3679$
    * Interval: $(26.83, 29.37)$.
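Steps 4 and 5 can be checked with a short script using the summary statistics and table value from the notes. Rounding is applied only when printing, which keeps intermediate rounding from creeping into the bounds.

```python
import math

t_crit = 2.131  # t_{0.025} for df = 15, from the table
x_bar = 28.1
s_x = 2.38
n = 16

standard_error = s_x / math.sqrt(n)        # 2.38 / 4 = 0.595
margin_of_error = t_crit * standard_error  # about 1.2679
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(f"E = {margin_of_error:.4f}")              # E = 1.2679
print(f"({lower:.4f}, {upper:.4f})")             # (26.8321, 29.3679)
print(f"rounded: ({lower:.2f}, {upper:.2f})")    # rounded: (26.83, 29.37)
```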

  • Interpretation: We are $95\%$ confident that the true population mean MPG of the 2014 Toyota Camry is between approximately $26.83$ and $29.37$ mpg. This means that if we took 100 such samples and built an interval from each, approximately 95 of those intervals would contain the true population mean.
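The "95 out of 100 intervals" interpretation can be checked by simulation. This sketch assumes a hypothetical normal population with $\mu = 28.1$ and $\sigma = 2.38$ (values borrowed from the example only for scale), draws many samples of size 16, builds a $t$-interval from each with $t_{0.025} = 2.131$, and reports the fraction of intervals that contain $\mu$.

```python
import math
import random

def t_interval(sample, t_crit):
    """Build x_bar ± t_crit * s / sqrt(n) from one sample."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))  # sample SD
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def coverage_rate(mu, sigma, n, t_crit, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = t_interval(sample, t_crit)
        if lower <= mu <= upper:
            hits += 1
    return hits / trials

# Hypothetical population: normal with mu = 28.1 and sigma = 2.38.
rate = coverage_rate(mu=28.1, sigma=2.38, n=16, t_crit=2.131, trials=10_000, seed=1)
print(f"{rate:.3f}")  # close to 0.950; the exact value depends on the seed
```

Each individual interval either contains $\mu$ or it does not; the $95\%$ describes the long-run success rate of the procedure.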

Determining Sample Size ($n$) for a Mean
  • Background: When the margin of error (EE) and confidence level are predetermined, you can solve for the necessary sample size.

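  • Derivation:
    * Start with the margin-of-error formula based on the $z$-distribution: $E = z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$. (The $z$-value is used because $n$, and therefore $df$, is not yet known.)
    * Solve for $\sqrt{n}$: $\sqrt{n} = \frac{z_{\alpha/2} \times \sigma}{E}$.
    * Square both sides: $n = \left( \frac{z_{\alpha/2} \times \sigma}{E} \right)^2$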

  • Example: Finding the Required Sample Size for Camry MPG:
    * Goal: Estimate mean MPG within a $0.5$ mpg margin of error ($E$) with $95\%$ confidence.
    * Given $s = 2.38$ (used to approximate $\sigma$).
    * $z_{\alpha/2}$ for $95\%$: $1.96$ (from the formula sheet, or on the calculator as invNorm(0.975); invNorm(0.025) returns $-1.96$, so take its absolute value).
    * Calculation: $n = \left( \frac{1.96 \times 2.38}{0.5} \right)^2$
    * $n = \left( \frac{4.6648}{0.5} \right)^2$
    * $n = (9.3296)^2 = 87.0414$.
    * Rounding Rule: Always round up to the next whole number to avoid underestimating the required sample size.
    * Final Answer: $n = 88$.
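The sample-size formula and its round-up rule fit in one function. This is a sketch with a hypothetical function name; by default it computes $z_{\alpha/2}$ with Python's `statistics.NormalDist().inv_cdf` (the analogue of invNorm with the area to the left), or it accepts the rounded table value $1.96$. `math.ceil` implements "always round up".

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence, z=None):
    """n = (z_{alpha/2} * sigma / E)^2, rounded UP to a whole number."""
    if z is None:
        # Area to the left of z_{alpha/2} is 1 - alpha/2 (0.975 for 95%).
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z * sigma / margin) ** 2
    return math.ceil(n)

print(required_sample_size(2.38, 0.5, 0.95, z=1.96))  # 88 (87.04 before rounding up)
print(required_sample_size(2.38, 0.5, 0.95))          # 88
```

Rounding $87.04$ to the nearest integer would give $87$, which would fall just short of the requested precision; rounding up avoids that.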

Working Backwards: Finding Mean and Error from Bounds
  • Scenario: You are given a confidence interval with a lower bound of $8$ and an upper bound of $22$.

  • Find the Sample Mean ($\bar{x}$):
    * The point estimate is the average of the bounds.
    * $\bar{x} = \frac{22 + 8}{2} = \frac{30}{2} = 15$.

  • Find the Margin of Error ($E$):
    * The margin of error is the distance from the mean to either bound.
    * $E = 22 - 15 = 7$
    * Alternatively: $E = \frac{\text{Upper} - \text{Lower}}{2} = \frac{22 - 8}{2} = 7$.
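Working backwards is just the midpoint and half-width of the interval. A minimal sketch (the function name is hypothetical):

```python
def mean_and_margin_from_bounds(lower, upper):
    """Recover the point estimate and margin of error from interval bounds."""
    x_bar = (lower + upper) / 2   # midpoint
    margin = (upper - lower) / 2  # half the width
    return x_bar, margin

print(mean_and_margin_from_bounds(8, 22))  # (15.0, 7.0)
```

Applied to the Camry interval $(26.83, 29.37)$, it recovers $\bar{x} \approx 28.1$ and $E \approx 1.27$.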

Verifying Normality
  • The procedure requires the population to be normally distributed, especially for small samples.

  • Visual Checks:
    * Box Plot: Used to check for symmetry and outliers. A box plot may look slightly left-skewed, but as long as there are no significant outliers, the $t$-interval procedure is generally robust enough to proceed.
    * Histogram: Used to check the general shape of the distribution.
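The notes do not say how "significant outliers" are identified. A common convention for box plots is the $1.5 \times \text{IQR}$ fence rule, sketched below as an assumption rather than the course's official method. Note that `statistics.quantiles` uses its default "exclusive" quartile method, which can differ slightly from the quartiles a TI-84 reports; the data lists are hypothetical.

```python
import statistics

def iqr_outliers(data):
    """Flag values beyond 1.5 * IQR from the quartiles (box-plot fence rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

print(iqr_outliers([1, 2, 3, 4, 5, 100]))  # [100]
print(iqr_outliers([1, 2, 3, 4, 5]))       # []
```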

Using Technology (TI-84 Instructions)
  • Navigating to the Interval Tool:
    * Press STAT, then move over to TESTS.
    * Select 8: TInterval. (Note: Do not use T-Test or Z-Test; those are for hypothesis testing in Chapter 10.)

  • Two Input Options:
    1. Data: Use this if the raw data is already entered into a list (e.g., L1). The calculator will compute $\bar{x}$ and $s_x$ automatically.
    2. Stats: Use this if you only have summary statistics ($\bar{x}$, $s_x$, $n$).

  • Input Fields for TInterval (Stats mode):
    * $\bar{x}$: Enter the sample mean.
    * $s_x$: Enter the sample standard deviation.
    * $n$: Enter the sample size. (Note: The calculator asks for $n$, not the degrees of freedom.)
    * C-Level: Enter the confidence level as a decimal (e.g., $0.95$ for $95\%$).
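The two TInterval input modes differ only in where $\bar{x}$, $s_x$, and $n$ come from: Data mode derives them from a list, then does the same computation as Stats mode. The sketch below mirrors that idea with hypothetical function names. Unlike the calculator, which finds the critical value itself from C-Level and $n$, this sketch assumes you pass `t_crit` in from a table; the value $3.182$ used here is the standard table entry for $df = 3$ with $0.025$ in the right tail, and the list is hypothetical.

```python
import math
import statistics

def t_interval_stats(x_bar, s_x, n, t_crit):
    """Stats mode: summary statistics in. n is the sample size, not df."""
    margin = t_crit * s_x / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def t_interval_data(values, t_crit):
    """Data mode: compute x_bar, s_x and n from the raw list, then reuse Stats mode."""
    return t_interval_stats(
        statistics.mean(values), statistics.stdev(values), len(values), t_crit
    )

values = [26, 28, 29, 31]  # hypothetical list, standing in for L1
t_crit = 3.182             # t_{0.025} for df = 3 (table value)
print(t_interval_data(values, t_crit))
```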

  • Note on Accuracy: Manual calculations using table values (which are rounded) may differ slightly from calculator results (e.g., a manual result of $29.37$ vs. a calculator result of $29.36$). Online systems like Pearson are strict and may mark an answer wrong over these small rounding discrepancies.

Additional Example: Employee Ages
  • Data: $n = 19$ employees, mean age ($\bar{x}$) $= 22.4$, standard deviation ($s$) $= 3.8$, confidence level $= 99\%$.

  • Manual Setup:
    * $df = 18$.
    * $\alpha = 0.01 \Rightarrow \alpha/2 = 0.005$.
    * Look up $t_{0.005}$ for $df = 18$: $2.878$.
    * Lower Bound: $22.4 - 2.878 \times \frac{3.8}{\sqrt{19}}$.
    * Upper Bound: $22.4 + 2.878 \times \frac{3.8}{\sqrt{19}}$.

  • Calculator Result:
    * Input: $\bar{x} = 22.4$, $s_x = 3.8$, $n = 19$, C-Level $= 0.99$.
    * Output: $(19.891, 24.909)$.
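The manual setup for the employee-ages example can be finished in Python to compare against the calculator output. This sketch uses the table value $t_{0.005} = 2.878$ from the notes; with these inputs the table-based bounds round to the same three decimals the calculator shows.

```python
import math

x_bar, s, n = 22.4, 3.8, 19
t_crit = 2.878  # t_{0.005} for df = 18, from the table

margin = t_crit * s / math.sqrt(n)  # about 2.509
lower, upper = x_bar - margin, x_bar + margin
print(f"({lower:.3f}, {upper:.3f})")  # (19.891, 24.909)
```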

Questions & Discussion
  • Student Question: Why did we get different results for the interval bounds?
    * Response: The discrepancy (e.g., $29.331$ vs. $29.36$) is due to rounding: manual calculations with table values often introduce rounding error. Check whether Pearson or Brightspace assignments specify using table values or technology.

  • Student Question: Are we finding $n$ for the ages problem?
    * Response: No, the sample size ($n = 19$) was already provided. We are constructing the confidence interval for the population mean age.