Confidence Intervals and Hypothesis Testing for a Mean

Confidence Intervals for a Mean

  • General form: statistic \pm (critical value * standard error).
  • In this case: Sample Mean \pm (t-star * standard error).

Standard Error of the Sample Mean

  • Formula:
    \text{Standard Error} = \frac{\text{Population Standard Deviation}}{\sqrt{\text{Sample Size}}}
  • Problem: Population standard deviation is usually unknown.
  • Solution: Replace population standard deviation with sample standard deviation (s).

The Problem with Using Sample Standard Deviation

  • Using the sample standard deviation introduces an extra level of uncertainty.
  • This is because we are now using two estimations: the sample mean and the sample standard deviation.
  • Due to this extra uncertainty, we cannot use the z distribution.

The t Distribution

  • Similar to the z distribution (symmetrical, bell-shaped).
  • Has heavier/thicker tails (more area in the tails, less in the middle).
  • More spread out than the z distribution.
  • Characterized by degrees of freedom.

Degrees of Freedom

  • Degrees of freedom are related to sample size.
  • Formula: degrees of freedom = n - 1 (where n is the sample size).
  • As sample size increases, the t distribution approaches the z distribution.
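
The convergence of t toward z can be checked numerically. The sketch below is one possible way to do it with only the Python standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names (t_pdf, t_cdf, t_star), the step count, and the search range of 0 to 100 are choices made for this illustration, not part of the course material; StatKey or a t-table gives the same values.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_star(confidence: float, df: int) -> float:
    """Critical value t* with `confidence` of the area between -t* and +t*.

    Bisection on [0, 100]; assumes the answer lies inside that range.
    """
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# z* for 90% confidence, for comparison
z_star_90 = NormalDist().inv_cdf(0.95)

for df in (2, 5, 10, 30, 49, 1000):
    print(f"df = {df:4d}   t* = {t_star(0.90, df):.3f}")
print(f"z distribution   z* = {z_star_90:.3f}")
```

The printed t* values shrink toward z* as the degrees of freedom grow, which is the heavier-tails effect fading with larger samples.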

William Sealy Gosset

  • Brewer at Guinness (he later became the company's head brewer).
  • Developed the t distribution but published under the pseudonym "Student" due to Guinness's restrictions.
  • The t distribution is sometimes called the Student's t distribution.

Conditions for Using the t Distribution

  • Must have either a large sample size (at least 30) or data from an approximately normal distribution.
  • If the sample size is small (n < 30), must have a good reason to assume the population is normally distributed.
  • Small samples from skewed populations may not accurately represent the population.
  • Even samples from normal populations can have unusual shapes when the sample size is small.

Calculating a Confidence Interval for a Mean

  • Formula: Sample Mean \pm (t-star * Standard Error).

Example: Gribble Lengths

  • Context: Estimating the average length of gribbles (small critters potentially useful in biofuel development).
  • Sample: 50 gribbles.
  • Sample mean: 3.1 mm.
  • Sample standard deviation: 0.72 mm.

Steps to Calculate the Confidence Interval

  1. Identify the sample statistics:
    • Sample mean = 3.1
    • Sample size = 50
    • Sample standard deviation = 0.72
  2. Calculate the degrees of freedom: n - 1 = 50 - 1 = 49.
  3. Determine the t-star value using StatKey or a t-table.
    • For a 90% confidence interval with 49 degrees of freedom, t-star ≈ 1.677.
  4. Calculate the standard error:
    \text{Standard Error} = \frac{0.72}{\sqrt{50}} \approx 0.1018
  5. Calculate the margin of error: t-star * standard error = 1.677 * 0.1018 ≈ 0.1707.
  6. Calculate the lower and upper limits of the confidence interval:
    • Lower limit: sample mean - margin of error = 3.1 - 0.1707 = 2.9293.
    • Upper limit: sample mean + margin of error = 3.1 + 0.1707 = 3.2707.
  7. Round the final values to two decimal places: 2.93 mm to 3.27 mm.
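
The steps above can also be carried out in a few lines of Python instead of Excel. This is a minimal sketch, not the course's required method; the variable names are chosen for this illustration, and t_star is typed in from step 3 rather than computed.

```python
import math

# Sample statistics from the gribble example
sample_mean = 3.1    # mm
sample_sd = 0.72     # mm
n = 50

# t-star for 90% confidence with 49 degrees of freedom, as quoted in step 3
t_star = 1.677

df = n - 1
standard_error = sample_sd / math.sqrt(n)
margin_of_error = t_star * standard_error
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"degrees of freedom = {df}")
print(f"standard error     = {standard_error:.4f}")
print(f"margin of error    = {margin_of_error:.4f}")
print(f"90% CI             = ({lower:.2f}, {upper:.2f}) mm")
```

Keeping each step in its own named variable mirrors the spreadsheet layout and makes it easy to change the confidence level or sample statistics later.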

Excel for Calculations

  • Using Excel is fine as long as the correct answer is obtained.
  • Excel allows for easy checking and modification of calculations.

Interpretation

  • We can be 90% confident that the average length of all gribbles is between 2.93 mm and 3.27 mm.
  • This is not about 90% of the data or 90% of gribbles; it's about the confidence in the interval capturing the true population mean.

Determining Sample Size

  • Goal: Determine how much data is needed to achieve a desired level of accuracy and confidence.

Margin of Error Formula

\text{Margin of Error} = t^* \times \frac{s}{\sqrt{n}}

  • Rearrange to solve for n (sample size):
    n = \left(\frac{t^* \times s}{ME}\right)^2
  • Where:
    • t^* is the t-critical value
    • s is the sample standard deviation
    • ME is the desired margin of error

Challenges

  • Estimating the sample standard deviation (s) before collecting data.
  • Determining the t-star value without knowing the sample size (degrees of freedom).

Solutions

  1. Replace t-star with z-star (a good approximation).
  2. Estimate the sample standard deviation:
    • Use data from previous research or similar studies.
    • Use the range rule of thumb: (Maximum Value - Minimum Value) / 4.
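
Both solutions can be combined in a short calculation. The sketch below uses z-star in place of t-star and the range rule of thumb for s. The function name required_sample_size and the input numbers (lengths expected to run from about 1.5 mm to 4.5 mm, a target margin of error of 0.1 mm, 95% confidence) are hypothetical values chosen for illustration. Rounding n up to the next whole number is the usual convention, so that the margin of error is no larger than requested.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence: float, sd_estimate: float,
                         margin_of_error: float) -> int:
    """Sample size from n = (z* x s / ME)^2, rounded up."""
    # z-star stands in for t-star because degrees of freedom are not yet known
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star * sd_estimate / margin_of_error) ** 2
    return math.ceil(n)


# Hypothetical planning inputs (not from the gribble data)
expected_min = 1.5   # mm
expected_max = 4.5   # mm
sd_estimate = (expected_max - expected_min) / 4   # range rule of thumb

n_needed = required_sample_size(confidence=0.95,
                                sd_estimate=sd_estimate,
                                margin_of_error=0.1)
print(f"estimated s = {sd_estimate:.2f} mm, sample size needed = {n_needed}")
```

Because the margin of error sits in the denominator and is squared, halving it roughly quadruples the required sample size.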

Hypothesis Testing for a Mean

  • Use a hypothesis test to determine if there is evidence that a population mean is:
    • Less than a specific value.
    • More than a specific value.
    • Different from a specific value.

Null and Alternative Hypotheses

  • Null Hypothesis (H_0): The population mean is equal to some hypothesized value.
  • Alternative Hypothesis (H_a): The population mean is less than, greater than, or not equal to the hypothesized value.

Standardized Test Statistic (t-score)

  • Formula:
    t = \frac{\text{Sample Mean} - \text{Hypothesized Mean}}{\text{Standard Error}}
  • It is the t-score of the sample mean, computed assuming the null hypothesis is true.
  • It tells how many standard errors the statistic is away from the null hypothesized value.

Conditions

  • Same as for confidence intervals:
    • Large sample size (n ≥ 30).
    • Or a good reason to assume the population is approximately normal.

P-value

  • Use the t distribution with n - 1 degrees of freedom to calculate the p-value.

Example: Social Media Usage

  • Context: Determining if STAT 101 students spend more than three hours per day on social media.

Data:

  • Sample size: 202 students (n = 202).
  • Sample mean: 3.53 hours (x̄ = 3.53).
  • Sample standard deviation: 1.97 hours (s = 1.97).

Hypotheses:

  • Null hypothesis (H_0): The population mean is equal to three hours. (\mu = 3)
  • Alternative hypothesis (H_a): The population mean is greater than three hours. (\mu > 3)

Significance Level:

  • Let's use alpha = 0.01 (1% significance level).

Test Statistic:

  • Calculate the t-score:
    t = \frac{3.53 - 3}{\frac{1.97}{\sqrt{202}}} \approx \frac{0.53}{0.1386} \approx 3.82

Excel Calculations:

  • Standard error: sample standard deviation / sqrt(sample size).
  • t-score: (sample mean - hypothesized mean) / standard error.
  • Ensure correct order of operations using parentheses.
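
The same two calculations can be sketched in Python, where the parentheses issue shows up just as it does in Excel. The variable names are chosen for this illustration; the last line deliberately omits the parentheses to show what goes wrong.

```python
import math

# Sample statistics from the social media example
sample_mean = 3.53        # hours
sample_sd = 1.97          # hours
n = 202
hypothesized_mean = 3.0   # hours, from the null hypothesis

standard_error = sample_sd / math.sqrt(n)
t_score = (sample_mean - hypothesized_mean) / standard_error

# Without parentheses, the division happens before the subtraction
wrong_t_score = sample_mean - hypothesized_mean / standard_error

print(f"standard error          = {standard_error:.4f}")
print(f"t-score                 = {t_score:.2f}")
print(f"without the parentheses = {wrong_t_score:.2f}  (not a valid t-score)")
```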

P-value Calculation:

  • Use StatKey or a t-table to find the p-value.
  • Degrees of freedom: n - 1 = 201.
  • One-tailed test (since we're only interested in values greater than three).
  • In this example, the p-value is very small (approximately 0.0001).

Decision and Conclusion:

  • Decision rule: If the p-value is less than alpha, reject the null hypothesis.
  • In this case, 0.0001 < 0.01, so we reject the null hypothesis.
  • Conclusion: There is strong evidence that the average amount of time spent on social media by STAT 101 students is more than three hours per day.
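
For readers who want to check the p-value without StatKey or a t-table, the sketch below approximates the one-tailed area numerically using only the Python standard library. It is an illustration, not the course's method: the helper names t_pdf and upper_tail and the choice of Simpson's rule with 2000 steps are assumptions made here.

```python
import math

# Social media example
n, sample_mean, sample_sd = 202, 3.53, 1.97
hypothesized_mean = 3.0
alpha = 0.01

df = n - 1
t_score = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) for t >= 0: 0.5 minus the Simpson's-rule area on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# One-tailed (right-tail) p-value, since H_a is mu > 3
p_value = upper_tail(t_score, df)
reject_null = p_value < alpha

print(f"t = {t_score:.2f}, df = {df}, p-value = {p_value:.5f}")
print("Reject H_0" if reject_null else "Do not reject H_0")
```

For a two-tailed alternative the same tail area would be doubled; for a left-tailed alternative the area below the t-score would be used instead.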