Stats

Recap of Previous Concepts:
* Last class concluded with finding confidence intervals for proportions and determining the appropriate sample size ($n$).
* The current focus is on constructing confidence intervals for the population mean, denoted by the Greek letter $\mu$.
The T-Distribution:
* When the population standard deviation ($\sigma$) is unknown, which is the typical situation with small samples, Student's $t$-distribution is used instead of the standard normal ($z$) distribution.
* Reading $t$ critical values:
  * The subscript gives the area in the right tail: $t_{0.10}$ has an area of $0.10$ to its right, so the area to its left is $0.90$. A table or tool indexed by cumulative (left) area needs $0.90$; a table indexed by right-tail area needs $0.10$.
  * Lookups require both the tail area and the degrees of freedom ($df$).
Assumptions and Requirements:
* The data must come from a simple random sample (SRS) or a randomized experiment.
* The population must be normally distributed, or the sample must be large. A sample is generally considered small if $n < 30$, and small samples require an approximately normal population.
* Technically, the sample size should be less than $5\%$ of the population size ($N$), though this is often difficult to verify when $N$ is unknown.
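The requirements above can be summarized as a checklist. The helper below is a hypothetical sketch (`t_interval_conditions_ok` and its parameters are illustrative names, not from any library); whether the sample is random and whether the population looks normal are judgments you supply, typically from how the data were collected and from plots.

```python
def t_interval_conditions_ok(n, random_sample, population_normal, N=None):
    """Hypothetical checklist for the t-interval requirements.

    random_sample:     True if the data came from an SRS or a randomized experiment
    population_normal: True if the population looks approximately normal (from plots)
    N:                 population size if known; None means the 5% rule is not checked
    """
    if not random_sample:
        return False
    # Small samples (n < 30) need an approximately normal population.
    if n < 30 and not population_normal:
        return False
    # The sample should be less than 5% of the population, when N is known.
    if N is not None and n >= 0.05 * N:
        return False
    return True


print(t_interval_conditions_ok(16, random_sample=True, population_normal=True))
```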
Degrees of Freedom ($df$):
* Degrees of freedom are one less than the sample size ($n$).
* Formula: $df = n - 1$.

Standard Formula for the Confidence Interval:
* The interval is centered on the sample mean ($\bar{x}$), with bounds:
  * Lower Bound: $\bar{x} - E$
  * Upper Bound: $\bar{x} + E$
Margin of Error ($E$):
* When $\sigma$ is unknown, the margin of error for estimating the population mean is:
  * $E = t_{\alpha/2} \times \frac{s}{\sqrt{n}}$
* Where:
  * $t_{\alpha/2}$ is the critical value for the given confidence level with $df = n - 1$.
  * $s$ is the sample standard deviation ($s_x$ on the calculator).
  * $n$ is the sample size.
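The two formulas can be written as small Python functions. This is a sketch: the critical value `t_crit` is passed in from a $t$-table lookup (right-tail area $\alpha/2$, $df = n - 1$) because the Python standard library has no $t$-distribution quantile function; the function names are illustrative.

```python
import math


def margin_of_error(t_crit, s, n):
    """E = t_{alpha/2} * s / sqrt(n)."""
    return t_crit * s / math.sqrt(n)


def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) = (xbar - E, xbar + E).

    t_crit must be looked up for df = n - 1 and right-tail area alpha/2.
    """
    E = margin_of_error(t_crit, s, n)
    return xbar - E, xbar + E


# Camry example values: x-bar = 28.1, s = 2.38, n = 16, t_{0.025} = 2.131
print(t_interval(28.1, 2.38, 16, 2.131))
```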

Scenario: Reported miles per gallon (MPG) for $n = 16$ owners of 2014 Toyota Camry automobiles. A $95\%$ confidence interval for the mean MPG of all such cars is needed.
Step 1: Identify Parameters:
* Sample Size ($n$): $16$.
* Degrees of Freedom ($df$): $16 - 1 = 15$.
* Confidence Level: $95\% = 0.95$.
* Alpha ($\alpha$): $1 - 0.95 = 0.05$.
* Alpha over two ($\alpha/2$): $\frac{0.05}{2} = 0.025$.
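Step 1 is plain arithmetic; a short sketch of the same calculations (variable names are illustrative):

```python
n = 16
confidence = 0.95

df = n - 1                 # degrees of freedom
alpha = 1 - confidence     # total area in both tails
half_alpha = alpha / 2     # right-tail area used for t_{alpha/2}

print(df, round(alpha, 3), round(half_alpha, 4))  # 15 0.05 0.025
```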
Step 2: Look Up the Critical Value ($t_{\alpha/2}$):
* In the $t$-distribution table, find the column where the "Area in Right Tail" is $0.025$ and the row where $df = 15$.
* Critical Value: $t_{0.025} = 2.131$.
Step 3: Calculate Sample Statistics (using the calculator's one-variable stats):
* Sample Mean ($\bar{x}$): $28.1$.
* Sample Standard Deviation ($s_x$): $2.38$.
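The raw Camry data are not listed here, so the sketch below uses a small hypothetical data set purely to show which standard deviation to use: $s_x$ divides by $n - 1$ (Python's `statistics.stdev`), while the population formula divides by $n$ (`statistics.pstdev`), which is not the value the $t$-interval needs.

```python
import statistics

# Hypothetical MPG readings for illustration only (not the class data set).
mpg = [27, 29, 31, 26, 28]

xbar = statistics.mean(mpg)        # sample mean, x-bar
s_x = statistics.stdev(mpg)        # sample SD: divides by n - 1 (use this one)
sigma_x = statistics.pstdev(mpg)   # population-formula SD: divides by n (not used)

print(xbar, round(s_x, 4))  # 28.2 1.9235
```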
Step 4: Calculate the Margin of Error ($E$):
* $E = 2.131 \times \frac{2.38}{\sqrt{16}} = 2.131 \times \frac{2.38}{4} = 2.131 \times 0.595$
* $E \approx 1.2679$.
Step 5: Determine the Interval:
* Lower Bound: $28.1 - 1.2679 = 26.8321$
* Upper Bound: $28.1 + 1.2679 = 29.3679$
* Interval: $(26.83, 29.37)$.
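Steps 4 and 5 can be checked with a few lines, using the summary statistics and table value from the steps above:

```python
import math

xbar, s, n = 28.1, 2.38, 16
t_crit = 2.131  # table value: right-tail area 0.025, df = 15

E = t_crit * s / math.sqrt(n)   # 2.131 * 2.38 / 4
lower, upper = xbar - E, xbar + E

print(round(E, 4))                       # 1.2679
print(round(lower, 2), round(upper, 2))  # 26.83 29.37
```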
Interpretation: We are $95\%$ confident that the true population mean MPG of 2014 Toyota Camrys is between approximately $26.83$ and $29.37$ mpg. This means that if we took many samples of size $16$ and built a $95\%$ interval from each one, about $95\%$ of those intervals (roughly $95$ out of every $100$) would contain the true population mean.
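The "many samples" interpretation can be illustrated by simulation. This sketch assumes a hypothetical normal population with $\mu = 28$ and $\sigma = 2.4$ (invented for illustration, not the real Camry population), draws repeated samples of size $16$, builds a $95\%$ $t$-interval from each, and counts how often the interval captures $\mu$.

```python
import math
import random

# Hypothetical population (assumed for illustration only).
MU, SIGMA = 28.0, 2.4
N_SAMPLE = 16
T_CRIT = 2.131   # t_{0.025} for df = 15, from the table
REPS = 2000


def covers(rng):
    """Draw one sample, build its 95% t-interval, report whether it contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N_SAMPLE)]
    xbar = sum(sample) / N_SAMPLE
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (N_SAMPLE - 1))
    E = T_CRIT * s / math.sqrt(N_SAMPLE)
    return xbar - E <= MU <= xbar + E


rng = random.Random(1)   # fixed seed so the run is repeatable
hits = sum(covers(rng) for _ in range(REPS))
coverage = hits / REPS
print(coverage)  # close to 0.95; the exact value depends on the seed
```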

Background: When the margin of error ( $E$ ) and confidence level are predetermined, you can solve for the necessary sample size.
Derivation:
* Because $n$ (and therefore $df$) is not known yet, start from the margin-of-error formula based on the $z$-distribution: $E = z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$.
* Solving for $n$ gives: $n = \left( \frac{z_{\alpha/2} \times \sigma}{E} \right)^2$
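The algebra behind the sample-size formula, step by step (multiply both sides by $\sqrt{n}$, divide by $E$, then square):

```latex
\begin{aligned}
E &= z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2} \times \sigma}{E} \\
n &= \left( \frac{z_{\alpha/2} \times \sigma}{E} \right)^2
\end{aligned}
```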
Example: Finding the Required Sample Size for Camry MPG:
* Goal: Estimate the mean MPG within a margin of error of $E = 0.5$ mpg with $95\%$ confidence.
* Given $s = 2.38$ (used to approximate $\sigma$).
* $z_{\alpha/2}$ for $95\%$: $1.96$ (from invNorm(0.975), the absolute value of invNorm(0.025), or the formula sheet).
* Calculation:
  * $n = \left( \frac{1.96 \times 2.38}{0.5} \right)^2$
  * $n = \left( \frac{4.6648}{0.5} \right)^2$
  * $n = (9.3296)^2 = 87.0414$.
* Rounding Rule: Always round up to the next whole number, even when the decimal part is small; rounding down would give a margin of error slightly larger than $E$.
* Final Answer: $n = 88$.
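A sketch of the sample-size calculation in Python. Here `statistics.NormalDist().inv_cdf` plays the role of invNorm (it takes the left-tail area, so $0.975$ for $95\%$); `required_sample_size` is an illustrative name. The unrounded $z \approx 1.95996$ gives $n \approx 87.04$, which still rounds up to $88$.

```python
import math
from statistics import NormalDist


def required_sample_size(E, sigma, confidence):
    """n = (z_{alpha/2} * sigma / E)^2, always rounded UP to a whole number."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # like invNorm(0.975) for 95%
    return math.ceil((z * sigma / E) ** 2)


# Camry example: E = 0.5 mpg, s = 2.38 approximates sigma, 95% confidence
print(required_sample_size(0.5, 2.38, 0.95))  # 88
```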

Scenario: You are given a confidence interval with a lower bound of $8$ and an upper bound of $22$ .
Find the Sample Mean ($\bar{x}$):
* The point estimate is the midpoint of the bounds.
* $\bar{x} = \frac{22 + 8}{2} = \frac{30}{2} = 15$.
Find the Margin of Error ($E$):
* The error is the distance from the mean to either bound.
* $E = 22 - 15 = 7$
* Alternatively: $E = \frac{\text{Upper} - \text{Lower}}{2} = \frac{22 - 8}{2} = 7$.
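Recovering the point estimate and margin of error from a given interval is just the midpoint and half-width; a minimal sketch with an illustrative function name:

```python
def mean_and_error_from_bounds(lower, upper):
    """Recover x-bar (midpoint) and E (half-width) from a confidence interval."""
    xbar = (lower + upper) / 2
    E = (upper - lower) / 2
    return xbar, E


print(mean_and_error_from_bounds(8, 22))  # (15.0, 7.0)
```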

For small samples, the $t$-interval procedure requires the population to be approximately normally distributed, so the sample data should be checked visually.
Visual Checks:
* Box Plot: Used to check for symmetry and outliers. A box plot may appear slightly skewed (for example, left-skewed), but as long as it shows no outliers, the $t$-interval procedure is generally robust enough to proceed.
* Histogram: Used to check the general shape of the distribution.
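Box plots usually flag outliers with the $1.5 \times \text{IQR}$ fence rule. The sketch below applies that rule to a hypothetical sample. Note that `statistics.quantiles` uses its default "exclusive" quartile method, and calculators may compute quartiles slightly differently, so borderline points can be classified differently.

```python
import statistics


def iqr_outliers(data):
    """Return values outside the 1.5 * IQR fences (the usual box-plot outlier rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical sample for illustration only.
sample = [25, 26, 27, 27, 28, 28, 29, 29, 30, 45]
print(iqr_outliers(sample))  # [45]
```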

Navigating to the Interval Tool:
* Press STAT, then move over to TESTS.
* Select 8: TInterval. (Do not use T-Test or Z-Test; those are for hypothesis testing in Chapter 10.)
Two Input Options:
1. Data: Use this if the raw data is already entered in a list (e.g., L1). The calculator computes $\bar{x}$ and $s_x$ automatically.
2. Stats: Use this if you only have the summary statistics ($\bar{x}$, $s_x$, $n$).
Input Fields for TInterval (Stats mode):
* $\bar{x}$: Enter the sample mean.
* $s_x$: Enter the sample standard deviation.
* $n$: Enter the sample size. (The calculator asks for $n$, not the degrees of freedom.)
* C-Level: Enter the confidence level as a decimal (e.g., $0.95$ for $95\%$).
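The difference between Data and Stats mode is only where $\bar{x}$ and $s_x$ come from. This hypothetical sketch mimics Data mode by computing them from a raw list before building the interval; unlike the calculator, it needs the critical value supplied from the table. The data and the table value $t_{0.025} = 2.776$ for $df = 4$ are for illustration only.

```python
import math
import statistics


def tinterval_data(data, t_crit):
    """Hypothetical analogue of TInterval 'Data' mode.

    Computes x-bar, s_x and n from the raw list, then x-bar +/- E.
    t_crit comes from a t-table for df = n - 1.
    """
    n = len(data)
    xbar = statistics.mean(data)
    s_x = statistics.stdev(data)          # sample SD (n - 1 in the denominator)
    E = t_crit * s_x / math.sqrt(n)
    return xbar - E, xbar + E


# Hypothetical data, n = 5 so df = 4; t_{0.025} for df = 4 is 2.776 (95% confidence)
lo, hi = tinterval_data([27, 29, 31, 26, 28], 2.776)
print(round(lo, 2), round(hi, 2))  # 25.81 30.59
```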
Note on Accuracy: Manual calculations that use rounded table values and rounded summary statistics may differ slightly from calculator results (e.g., a manual upper bound of $29.37$ vs. a calculator result of $29.36$). Online homework systems such as Pearson may mark an answer wrong because of these small rounding differences.

Scenario: A sample of $n = 19$ employees has a mean age of $\bar{x} = 22.4$ and a standard deviation of $s = 3.8$. Construct a $99\%$ confidence interval for the population mean age.
Manual Setup:
* $df = 19 - 1 = 18$.
* $\alpha = 0.01 \Rightarrow \alpha/2 = 0.005$.
* Look up $t_{0.005}$ for $df = 18$: $2.878$.
* Margin of Error: $E = 2.878 \times \frac{3.8}{\sqrt{19}} \approx 2.509$.
* Lower Bound: $22.4 - 2.509 = 19.891$
* Upper Bound: $22.4 + 2.509 = 24.909$
Calculator Result:
* Input: $\bar{x} = 22.4$, $s_x = 3.8$, $n = 19$, C-Level $= 0.99$.
* Output: $(19.891, 24.909)$.
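The manual computation with the table value $t_{0.005} = 2.878$ can be checked in a few lines; in this example it agrees with the calculator output to three decimals.

```python
import math

xbar, s, n = 22.4, 3.8, 19
t_crit = 2.878   # t_{0.005} for df = 18, from the table

E = t_crit * s / math.sqrt(n)
lower, upper = xbar - E, xbar + E

print(round(E, 3))                       # 2.509
print(round(lower, 3), round(upper, 3))  # 19.891 24.909
```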

Student Question: Why did we get different results for the interval bounds?
* Response: The discrepancy (e.g., $29.37$ vs. $29.36$) is due to rounding. Manual calculations with table values often introduce rounding error. Check whether Pearson or Brightspace assignments specify using table values or technology.
Student Question: Are we finding $n$ for the ages problem?
* Response: No, the sample size ($n = 19$) was already given. We are constructing the confidence interval for the population mean age.