Confidence Intervals and Hypothesis Testing for a Mean

Confidence Intervals for a Mean

General form: statistic \pm (critical value * standard error).
In this case: Sample Mean \pm (t-star * standard error).

Standard Error of the Sample Mean

Formula:
\text{Standard Error} = \frac{\text{Population Standard Deviation}}{\sqrt{\text{Sample Size}}}
Problem: Population standard deviation is usually unknown.
Solution: Replace population standard deviation with sample standard deviation (s).
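
As a minimal sketch of this substitution, the snippet below estimates the standard error from a sample using s in place of the unknown population standard deviation. The function name standard_error and the measurements are hypothetical, made up for illustration only.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)


# Hypothetical measurements, invented purely for illustration
lengths = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error(lengths)
print(round(se, 3))
```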

The Problem with Using Sample Standard Deviation

Using the sample standard deviation introduces an extra level of uncertainty.
This is because we are now relying on two estimates: the sample mean and the sample standard deviation.
Because of this extra uncertainty, we cannot use the z distribution; we use the t distribution instead.

The t Distribution

Similar to the z distribution (symmetrical, bell-shaped).
Has heavier/thicker tails (more area in the tails, less in the middle).
More spread out than the z distribution.
Characterized by degrees of freedom.

Degrees of Freedom

Degrees of freedom are related to sample size.
Formula: degrees of freedom = n - 1 (where n is the sample size).
As sample size increases, the t distribution approaches the z distribution.
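
To illustrate how t-star shrinks toward z-star as the degrees of freedom grow, here is a rough sketch that uses only the Python standard library. The helper names (t_pdf, t_cdf, t_star) are invented for this example, and the numerical integration plus bisection is just one simple way to get critical values; it is not how StatKey or a t-table produces them.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_star(confidence, df):
    """Critical value leaving (1 - confidence) / 2 in each tail, by bisection.

    Assumes the answer lies between 0 and 50, which holds for the
    confidence levels and degrees of freedom used here.
    """
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.95)  # 90% confidence, two-sided
critical = {df: t_star(0.90, df) for df in (4, 9, 29, 49, 999)}
for df, value in critical.items():
    print(f"df = {df:>3}: t-star = {value:.3f}")
print(f"z-star       = {z_star:.3f}")
```

Each t-star in the printout is larger than z-star, and the gap narrows as the degrees of freedom increase.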

William Sealy Gosset

Head brewmaster for Guinness.
Developed the t distribution but published under the pseudonym "Student" because Guinness did not allow employees to publish under their own names.
This is why the t distribution is often called Student's t distribution.

Conditions for Using the t Distribution

Must have either a large sample size (at least 30) or data from an approximately normal distribution.
If the sample size is small (n < 30), must have a good reason to assume the population is normally distributed.
Small samples from skewed populations may not accurately represent the population.
Even samples from normal populations can have unusual shapes when the sample size is small.

Calculating a Confidence Interval for a Mean

Formula: Sample Mean \pm (t-star * Standard Error).

Example: Gribble Lengths

Context: Estimating the average length of gribbles (small critters potentially useful in biofuel development).
Sample: 50 gribbles.
Sample mean: 3.1 mm.
Sample standard deviation: 0.72 mm.

Steps to Calculate the Confidence Interval

Identify the sample statistics:
- Sample mean = 3.1
- Sample size = 50
- Sample standard deviation = 0.72
Calculate the degrees of freedom: n - 1 = 50 - 1 = 49.
Determine the t-star value using StatKey or a t-table.
- For a 90% confidence interval with 49 degrees of freedom, t-star ≈ 1.677.
Calculate the standard error:
\text{Standard Error} = \frac{0.72}{\sqrt{50}} \approx 0.1018
Calculate the margin of error: t-star * standard error ≈ 1.677 * 0.1018 ≈ 0.171.
Calculate the lower and upper limits of the confidence interval:
- Lower limit: sample mean - margin of error = 3.1 - 0.171 ≈ 2.93.
- Upper limit: sample mean + margin of error = 3.1 + 0.171 ≈ 3.27.
Round the final values to two decimal places.
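
The steps above can be sketched in a few lines of Python as an alternative to a spreadsheet. The t-star value is typed in from the notes rather than computed, and the variable names are chosen only for this illustration.

```python
import math

sample_mean = 3.1   # mm
sample_sd = 0.72    # mm
n = 50
t_star = 1.677      # 90% confidence, 49 degrees of freedom (from StatKey or a t-table)

standard_error = sample_sd / math.sqrt(n)
margin_of_error = t_star * standard_error

lower = round(sample_mean - margin_of_error, 2)
upper = round(sample_mean + margin_of_error, 2)
print(lower, upper)  # 2.93 3.27
```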

Excel for Calculations

Using Excel is fine as long as the correct answer is obtained.
Excel allows for easy checking and modification of calculations.

Interpretation

We can be 90% confident that the average length of all gribbles is between 2.93 mm and 3.27 mm.
This is not a statement about 90% of the data or 90% of gribbles; it is about our confidence that the interval captures the true population mean.
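
One way to see what 90% confidence means is a small simulation: draw many samples from a population whose mean is known, build a 90% interval from each, and count how often the interval captures the true mean. The population below (normal, with the mean and standard deviation shown) is entirely hypothetical and is an assumption of this sketch, not something taken from the gribble data.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 3.0   # hypothetical population mean
TRUE_SD = 0.7     # hypothetical population standard deviation
N = 50            # sample size, so 49 degrees of freedom
T_STAR = 1.677    # 90% confidence, 49 degrees of freedom
TRIALS = 2000

captured = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this sample's interval contain the true population mean?
    if mean - T_STAR * se <= TRUE_MEAN <= mean + T_STAR * se:
        captured += 1

coverage = captured / TRIALS
print(f"Share of intervals capturing the true mean: {coverage:.3f}")
```

The share should land close to 0.90: the confidence level describes how often the method succeeds across repeated samples, not the share of individual observations inside any one interval.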

Determining Sample Size

Goal: Determine how much data is needed to achieve a desired margin of error at a given confidence level.

Margin of Error Formula

\text{Margin of Error} = t^* \times \frac{s}{\sqrt{n}}

Rearrange to solve for n (sample size):
n = \left( \frac{t^* \times s}{ME} \right)^2
Where:
t^* is the t-critical value
s is the sample standard deviation
ME is the desired margin of error

Challenges

Estimating the sample standard deviation (s) before collecting data.
Determining the t-star value without knowing the sample size (degrees of freedom).

Solutions

Replace t-star with z-star (a good approximation when the required sample size turns out to be reasonably large).
Estimate the sample standard deviation:
- Use data from previous research or similar studies.
- Use the range rule of thumb: s ≈ (Maximum Value - Minimum Value) / 4.
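
A minimal sketch of this planning calculation follows, using z-star in place of t-star and the range rule of thumb for s. The function name required_sample_size and all planning inputs (expected minimum and maximum, target margin of error) are hypothetical. The result is rounded up, since rounding down would leave the margin of error slightly larger than the target.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence, sd_estimate, margin_of_error):
    """Sample size n = (z* * s / ME)^2, rounded up to a whole number."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star * sd_estimate / margin_of_error) ** 2
    return math.ceil(n)


# Hypothetical planning inputs, for illustration only
expected_min, expected_max = 1.5, 4.5          # mm
sd_guess = (expected_max - expected_min) / 4   # range rule of thumb
n_needed = required_sample_size(0.90, sd_guess, 0.1)
print(n_needed)
```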

Hypothesis Testing for a Mean

Use a hypothesis test to determine if there is evidence that a population mean is:
- Less than a specific value.
- More than a specific value.
- Different from a specific value.

Null and Alternative Hypotheses

Null Hypothesis (H_0): The population mean is equal to some hypothesized value.
Alternative Hypothesis (H_a): The population mean is less than, greater than, or not equal to the hypothesized value.

Standardized Test Statistic (t-score)

Formula:
t = \frac{\text{Sample Mean} - \text{Hypothesized Mean}}{\text{Standard Error}}
This is simply the t-score of the sample mean.
It tells how many standard errors the sample mean lies from the value stated in the null hypothesis.

Conditions

Same as for confidence intervals:
- Large sample size (n ≥ 30).
- Or a good reason to assume the population is approximately normal.

P-value

Use the t distribution with n - 1 degrees of freedom to calculate the p-value.

Example: Social Media Usage

Context: Determining if STAT 101 students spend more than three hours per day on social media.

Data:

Sample size: 202 students (n = 202).
Sample mean: 3.53 hours (x̄ = 3.53).
Sample standard deviation: 1.97 hours (s = 1.97).

Hypotheses:

Null hypothesis (H_0): The population mean is equal to three hours. (\mu = 3)
Alternative hypothesis (H_a): The population mean is greater than three hours. (\mu > 3)

Significance Level:

Let's use alpha = 0.01 (1% significance level).

Test Statistic:

Calculate the t-score:
t = \frac{3.53 - 3}{\frac{1.97}{\sqrt{202}}} \approx \frac{0.53}{0.1386} \approx 3.82

Excel Calculations:

Standard error: sample standard deviation / sqrt(sample size).
t-score: (sample mean - hypothesized mean) / standard error.
Ensure correct order of operations using parentheses.
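
The same two calculations can be written in Python, where the order-of-operations point applies exactly as it does in a spreadsheet formula. This is only a sketch; the variable names are chosen for this illustration.

```python
import math

sample_mean = 3.53
hypothesized_mean = 3
sample_sd = 1.97
n = 202

standard_error = sample_sd / math.sqrt(n)

# Parentheses force the subtraction to happen before the division.
t_score = (sample_mean - hypothesized_mean) / standard_error

# Without them, division binds tighter than subtraction, so this evaluates
# 3.53 - (3 / standard_error), which is not a t-score at all.
wrong = sample_mean - hypothesized_mean / standard_error

print(round(standard_error, 4), round(t_score, 2), round(wrong, 2))
```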

P-value Calculation:

Use StatKey or a t-table to find the p-value.
Degrees of freedom: n - 1 = 201.
One-tailed test (since we're only interested in values greater than three).
In this example, the p-value is very small (approximately 0.0001).
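
For readers who want to check the p-value without StatKey, the sketch below approximates the upper-tail area of the t distribution by numerical integration, using only the Python standard library. The helper names t_pdf and upper_tail_p are invented for this example, and Simpson's rule is just one simple way to get the area.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail_p(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the Simpson's-rule area from 0 to t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


n = 202
alpha = 0.01
t_score = (3.53 - 3) / (1.97 / math.sqrt(n))
p_value = upper_tail_p(t_score, n - 1)  # one-tailed, 201 degrees of freedom

print(f"t = {t_score:.2f}, p-value = {p_value:.5f}, reject H0: {p_value < alpha}")
```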

Decision and Conclusion:

Decision rule: If the p-value is less than alpha, reject the null hypothesis.
In this case, 0.0001 < 0.01, so we reject the null hypothesis.
Conclusion: There is strong evidence that the average amount of time spent on social media by STAT 101 students is more than three hours per day.