Confidence Intervals and Sample Sizes

The handout has three columns:
- Section 7.2
- Section 7.3 (today's topic)
- Section 7.1
Sections 7.2 and 7.3 concern means and numerical data, estimating the average or mean.
The primary difference between sections 7.2 and 7.3 will be the availability of the population standard deviation.

We didn't complete Section 7.2, specifically how to calculate the minimum sample size.
The formula is based on the margin of error:
$\text{Margin of Error} = z \cdot \frac{\text{Standard Deviation}}{\sqrt{n}}$
We solve for $n$ in this equation.
The formula includes a z-score, determined by the desired confidence level.
For 90% confidence:
- Consider the sampling distribution of the sample mean.
- 90% lies in the middle, bounded by lower and upper limits.
- In terms of x, these limits are the lower and upper bounds; in terms of z, they are the negative and positive z-scores on the standard normal distribution.

Use the inverse normal function to find z-scores.
For 90% confidence:
- 90% inside, remainder outside.
- $100 - 90 = 10$
- 10% is outside, split in half.
- $10 / 2 = 5$
- 5% on each tail.
Negative z-score calculation:
- $Inverse \space Norm(left \space tail, mean, standard \space deviation)$
- $Inverse \space Norm(0.05, 0, 1)$
Result: z-score = -1.645; by symmetry, the positive z-score is 1.645.
A table provides pre-calculated z-scores for common confidence levels.
For 90% confidence, the z-score of 1.645 is readily available on the table.
Using the table is generally easier than calculating inverse norms.
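The inverse-norm steps above can be sketched in Python as a rough cross-check of the calculator or table value. This is only an illustration: the helper name `z_critical` is made up for these notes, and it relies on `NormalDist.inv_cdf` from Python's standard `statistics` module in place of the calculator's inverse norm function.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Positive z-score for a two-sided confidence level given as a decimal (e.g. 0.90)."""
    tail = (1 - confidence) / 2  # area in each tail, e.g. 0.05 for 90%
    # inv_cdf(tail) returns the negative z-score for the left tail;
    # negate it to get the positive z-score used in the formulas.
    return -NormalDist(mu=0, sigma=1).inv_cdf(tail)

z90 = z_critical(0.90)
print(round(z90, 3))  # 1.645
```

Calling `z_critical(0.95)` or `z_critical(0.99)` the same way should agree with the other common table entries.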

Minimum sample size formula:
$n = \left(\frac{z \cdot \text{Standard Deviation}}{\text{Margin of Error}}\right)^2$
The aim is to determine how much data must be collected; round $n$ up to the next whole number.
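The sample-size formula comes from solving the margin-of-error equation for $n$. As a worked derivation, writing $E$ for the margin of error and $\sigma$ for the standard deviation (shorthand introduced here, not taken from the handout):
$E = z \cdot \frac{\sigma}{\sqrt{n}}$
$\sqrt{n} = \frac{z \cdot \sigma}{E}$
$n = \left(\frac{z \cdot \sigma}{E}\right)^2$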
Example: Estimating the amount of money people spend at a store.
Margin of error:
- Indicates desired accuracy.
- Keyword: "within".
- Example: "Accurate to within $1" implies margin of error is 1.
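A minimal Python sketch of the sample-size calculation follows. The function name `min_sample_size` and the standard deviation of 12 dollars are assumptions made up for illustration; only the 90% z-score of 1.645 and the "within $1" margin come from the notes.

```python
import math

def min_sample_size(z, sigma, margin):
    """Smallest whole n for which z * sigma / sqrt(n) is at most the margin of error."""
    n = (z * sigma / margin) ** 2
    # Always round up: rounding down would give a margin slightly larger than requested.
    return math.ceil(n)

# Hypothetical inputs: 90% confidence (z = 1.645), assumed standard
# deviation of 12 dollars, accurate to within 1 dollar.
print(min_sample_size(1.645, 12, 1))  # 390
```

Because the margin of error sits in the denominator and the result is squared, halving the margin roughly quadruples the required sample size.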

Example: a sample of 68 corridors has a mean of 61.2 decibels, with a population standard deviation of 9.7.
Construct a 95% confidence interval.
This is a confidence interval for the mean.
Differentiating z-intervals and t-intervals:
- Z-interval (7.2 question): population standard deviation is known.
- T-interval (7.3 question): will be discussed later.
Since this problem provides population standard deviation, use a z-interval.

On the calculator, go to Stat, then Tests, and choose option 7 (Z-interval).
Input:
- Summary Stats
- $Population \space Standard \space Deviation = 9.7$
- $Mean = 61.2$
- $Sample \space Size = 68$
- $Confidence \space Level = 95$
Enter the values into the calculator. The results are:
- Lower bound: 58.9
- Upper bound: 63.5
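The same interval can be reproduced from the formula as a check on the calculator output. The sketch below is not the calculator's own routine; `z_interval` is a name chosen here, and the z-score comes from Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Confidence interval for the mean when the population standard deviation is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # positive z-score
    margin = z * sigma / sqrt(n)                        # margin of error
    return mean - margin, mean + margin

# Corridor example: n = 68, mean = 61.2 decibels, sigma = 9.7, 95% confidence.
lower, upper = z_interval(61.2, 9.7, 68, 0.95)
print(round(lower, 1), round(upper, 1))  # 58.9 63.5
```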

When the problem gives raw data instead of summary statistics, enter the data into a list (Stat, Edit, option 1).
Go to Stat, Tests, option 7 (Z-interval).
Select "Data" instead of "Stats."
Input population standard deviation (e.g., 2.1).
Specify the list where data is entered (e.g., L2).
Set frequency to 1.
Choose confidence level (e.g., 99%).
Select Calculate to get the interval.
Example interpretation: At the chosen confidence level, the mean time that first-time married couples stay together is between roughly 5.4 and 7.4 years.
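The raw-data version of the z-interval can be sketched the same way: the sample mean and sample size are computed from the list instead of being typed in. The data values below are invented purely for illustration and are not the marriage data from class, so the output will not match the 5.4 to 7.4 interval; `z_interval_from_data` is likewise a made-up name.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval_from_data(data, sigma, confidence):
    """Z-interval for the mean from a list of raw values and a known population sigma."""
    n = len(data)        # sample size comes from the list
    xbar = fmean(data)   # sample mean comes from the list
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical data (years), with sigma = 2.1 and 99% confidence as in the steps above.
years = [5.0, 7.5, 6.0, 8.0, 4.5, 6.5, 7.0, 5.5]
lower, upper = z_interval_from_data(years, 2.1, 0.99)
print(round(lower, 2), round(upper, 2))
```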

Section 7.3 focus: constructing confidence intervals.
No minimum sample size calculations.
Two methods:
- Formula
- Calculator function
The calculator procedure is similar to 7.2 but uses a different function (the t-interval).
- Rationale: the population standard deviation is not available.
- Instead, use the sample standard deviation.
- This creates more uncertainty because the sample standard deviation is only an estimate of the population value.

The t-interval formula incorporates this added uncertainty.
Formula: $\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$
- Instead of the population standard deviation, use the sample standard deviation.
- Instead of the z-score, use the t-score.
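As a sketch of the t-interval formula, the function below takes the t-score as an input, for instance one read from a t-table. The sample mean of 50 and sample standard deviation of 8 are hypothetical numbers chosen for illustration; 2.093 is the table t-score for 19 degrees of freedom at 95% confidence, which corresponds to a sample of 20.

```python
from math import sqrt

def t_interval(xbar, s, n, t_score):
    """Confidence interval for the mean using the sample standard deviation and a t-score."""
    margin = t_score * s / sqrt(n)  # same shape as the z formula, with s and t swapped in
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 50, sample standard deviation 8, n = 20.
# t_score = 2.093 is the table value for 19 degrees of freedom at 95% confidence.
lower, upper = t_interval(50.0, 8.0, 20, 2.093)
print(round(lower, 2), round(upper, 2))
```

With the same inputs, a z-score of 1.96 would give a narrower interval; the larger t-score is how the added uncertainty shows up.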

The t-table provides t-scores.
Example: 95% confidence interval, sample size of 20.
Degrees of freedom = n - 1 (loss of one degree of freedom due to the guess involved).
- $20 - 1 = 19$
Table usage:
- Locate the 95% column under the confidence level heading.
- Find the intersection with 19 degrees of freedom.
- T-score: Approximately 2.093.

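Calculator function: inverse t.
- Input the left tail, 0.025, and the degrees of freedom, 19.
- The result is the negative t-score, approximately -2.093; use its positive counterpart, 2.093.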
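To show what an inverse t function does, here is a rough standard-library sketch: it integrates the t density numerically and searches for the value whose left-tail area matches the input. It is not how the calculator computes it, the names `t_pdf`, `t_cdf`, and `inv_t` are made up for these notes, and in practice a library routine such as `scipy.stats.t.ppf` would normally be used.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Left-tail area P(T <= x), via symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area

def inv_t(left_tail, df, lo=-100.0, hi=100.0):
    """Value x with P(T <= x) = left_tail, found by bisection on [lo, hi]."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < left_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t = inv_t(0.025, 19)
print(round(t, 3))  # -2.093
```

The sketch is meant for modest degrees of freedom like the 19 used here; very large values would overflow `math.gamma`.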

Using the calculator's t-interval function is encouraged.
The t-distribution is bell-shaped, symmetric, and centered at zero.
The calculator is more accurate than the table because it uses the exact t-score, whereas the table gives rounded values.
The t-interval function is fast and accurate, whereas the formula is slower.