Confidence Intervals and Hypothesis Testing for a Mean
Confidence Intervals for a Mean
- General form: statistic \pm (critical value * standard error).
- In this case: Sample Mean \pm (t-star * standard error).
Standard Error of the Sample Mean
- Formula:
\text{Standard Error} = \frac{\text{Population Standard Deviation}}{\sqrt{\text{Sample Size}}}
- Problem: Population standard deviation is usually unknown.
- Solution: Replace population standard deviation with sample standard deviation (s).
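As a minimal sketch of this substitution, the Python snippet below estimates the standard error from the sample standard deviation. The function name `standard_error` and the measurements are illustrative assumptions, not values from these notes.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)


# Hypothetical measurements, used only to illustrate the call.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error(data)
print(f"Estimated standard error: {se:.4f}")
```

Note that `statistics.stdev` computes the sample standard deviation (s), whereas `statistics.pstdev` would treat the data as the entire population.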
The Problem with Using Sample Standard Deviation
- Using the sample standard deviation introduces an extra level of uncertainty.
- This is because we now rely on two estimates: the sample mean (for the population mean) and the sample standard deviation (for the population standard deviation).
- Due to this extra uncertainty, we cannot use the z distribution.
The t Distribution
- Similar to the z distribution (symmetrical, bell-shaped).
- Has heavier/thicker tails (more area in the tails, less in the middle).
- More spread out than the z distribution.
- Characterized by degrees of freedom.
Degrees of Freedom
- Degrees of freedom are related to sample size.
- Formula: degrees of freedom = n - 1 (where n is the sample size).
- As sample size increases, the t distribution approaches the z distribution.
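The heavier tails and the convergence toward z can be checked numerically. The sketch below evaluates the standard t density formula with only the Python standard library. The helper names `t_pdf` and `z_pdf` and the comparison point x = 3 are choices made for illustration.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + x * x / df))


def z_pdf(x):
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Compare the density far out in the tail (x = 3) for growing sample sizes.
for n in (5, 30, 1000):
    df = n - 1
    print(f"n={n:>4}  df={df:>3}  t density at 3: {t_pdf(3, df):.5f}")
print(f"z density at 3: {z_pdf(3):.5f}")
```

The tail density shrinks toward the z value as n grows, while at x = 0 the t density sits below the z density, matching the "more area in the tails, less in the middle" description.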
William Sealy Gosset
- Head brewmaster for Guinness.
- Developed the t distribution but published under the pseudonym "Student" due to Guinness's restrictions.
- The t distribution is sometimes called the Student's t distribution.
Conditions for Using the t Distribution
- Must have either a large sample size (at least 30) or data from an approximately normal distribution.
- If the sample size is small (n < 30), must have a good reason to assume the population is normally distributed.
- Small samples from skewed populations may not accurately represent the population.
- Even samples from normal populations can have unusual shapes when the sample size is small.
Calculating a Confidence Interval for a Mean
- Formula: Sample Mean \pm (t-star * Standard Error).
Example: Gribble Lengths
- Context: Estimating the average length of gribbles (small critters potentially useful in biofuel development).
- Sample: 50 gribbles.
- Sample mean: 3.1 mm.
- Sample standard deviation: 0.72 mm.
Steps to Calculate the Confidence Interval
- Identify the sample statistics:
- Sample mean = 3.1
- Sample size = 50
- Sample standard deviation = 0.72
- Calculate the degrees of freedom: n - 1 = 50 - 1 = 49.
- Determine the t-star value using StatKey or a t-table.
- For a 90% confidence interval with 49 degrees of freedom, t-star ≈ 1.677.
- Calculate the standard error:
\text{Standard Error} = \frac{0.72}{\sqrt{50}}
- Calculate the margin of error: t-star * standard error.
- Calculate the lower and upper limits of the confidence interval:
- Lower limit: sample mean - margin of error.
- Upper limit: sample mean + margin of error.
- Round the final values to two decimal places.
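The steps above can be sketched in Python as follows. The sample statistics and the t-star value are taken from the example. The variable names are arbitrary, and t-star is hard-coded rather than looked up, since the standard library has no t-table.

```python
import math

# Sample statistics from the gribble example.
sample_mean = 3.1   # mm
sample_sd = 0.72    # mm
n = 50

df = n - 1          # 49 degrees of freedom
t_star = 1.677      # 90% critical value for df = 49, as given in the notes

standard_error = sample_sd / math.sqrt(n)
margin_of_error = t_star * standard_error

# Round only at the end to avoid compounding rounding error.
lower = round(sample_mean - margin_of_error, 2)
upper = round(sample_mean + margin_of_error, 2)
print(f"90% CI: {lower} mm to {upper} mm")
```

This reproduces the interval of 2.93 mm to 3.27 mm.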
Excel for Calculations
- Using Excel is fine as long as the correct answer is obtained.
- Excel allows for easy checking and modification of calculations.
Interpretation
- We can be 90% confident that the average length of all gribbles is within the calculated interval (e.g., 2.93 mm to 3.27 mm).
- This is not about 90% of the data or 90% of gribbles; it's about the confidence in the interval capturing the true population mean.
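One way to see what "90% confident" means is a small simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the interval captures the true mean. The population parameters, seed, and number of trials below are hypothetical choices for illustration, not data from the notes.

```python
import math
import random
import statistics

random.seed(1)

TRUE_MEAN = 3.0   # hypothetical population mean
TRUE_SD = 0.7     # hypothetical population standard deviation
N = 50            # sample size, so df = 49
T_STAR = 1.677    # 90% critical value for df = 49
TRIALS = 2000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    margin = T_STAR * statistics.stdev(sample) / math.sqrt(N)
    if mean - margin <= TRUE_MEAN <= mean + margin:
        hits += 1

coverage = hits / TRIALS
print(f"Share of intervals that captured the true mean: {coverage:.3f}")
```

The share should land close to 0.90. The 90% describes how often the procedure captures the mean across repeated samples, not the fraction of individual gribbles inside any one interval.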
Determining Sample Size
- Goal: Determine how much data is needed to achieve a desired level of accuracy and confidence.
Margin of Error Formula
\text{Margin of Error} = t^* \times \frac{s}{\sqrt{n}}
- Rearrange to solve for n (sample size):
n = \left(\frac{t^* \times s}{ME}\right)^2
Where:
- t^* is the t-critical value
- s is the sample standard deviation
- ME is the desired margin of error
Challenges
- Estimating the sample standard deviation (s) before collecting data.
- Determining the t-star value without knowing the sample size (degrees of freedom).
Solutions
- Replace t-star with z-star, which does not depend on the sample size (a good approximation unless the resulting sample is small).
- Estimate the sample standard deviation:
- Use data from previous research or similar studies.
- Use the range rule of thumb: (Maximum Value - Minimum Value) / 4.
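A minimal sketch of this planning calculation, using z-star in place of t-star as suggested above. The function name `required_sample_size` and the planning values (a 1 mm to 5 mm range, a 0.1 mm target margin) are hypothetical. Rounding up to the next whole number is a common convention assumed here, not something stated in the notes.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence, sd_estimate, margin_of_error):
    """Smallest n with z* * s / sqrt(n) <= margin_of_error."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star * sd_estimate / margin_of_error) ** 2
    return math.ceil(n)  # round up so the margin of error is not exceeded


# Hypothetical planning values: values expected to range from 1 mm to 5 mm.
sd_guess = (5.0 - 1.0) / 4   # range rule of thumb
n_needed = required_sample_size(0.90, sd_guess, 0.1)
print(f"Estimated sample size: {n_needed}")
```

Because n scales with the square of s / ME, halving the desired margin of error roughly quadruples the required sample size.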
Hypothesis Testing for a Mean
- Use a hypothesis test to determine if there is evidence that a population mean is:
- Less than a specific value.
- More than a specific value.
- Different from a specific value.
Null and Alternative Hypotheses
- Null Hypothesis (H_0): The population mean is equal to some hypothesized value.
- Alternative Hypothesis (H_a): The population mean is less than, greater than, or not equal to the hypothesized value.
Standardized Test Statistic (t-score)
- Formula:
t = \frac{\text{Sample Mean} - \text{Hypothesized Mean}}{\text{Standard Error}}
- It is effectively just a t score for the statistic.
- It tells how many standard errors the statistic is away from the null hypothesized value.
Conditions
- Same as for confidence intervals:
- Large sample size (n ≥ 30).
- Or a good reason to assume the population is approximately normal.
P-value
- Use the t distribution with n - 1 degrees of freedom to calculate the p-value.
Example: Social Media Usage
- Context: Determining if STAT 101 students spend more than three hours per day on social media.
Data:
- Sample size: 202 students (n = 202).
- Sample mean: 3.53 hours (x̄ = 3.53).
- Sample standard deviation: 1.97 hours (s = 1.97).
Hypotheses:
- Null hypothesis (H_0): The population mean is equal to three hours. (\mu = 3)
- Alternative hypothesis (H_a): The population mean is greater than three hours. (\mu > 3)
Significance Level:
- Let's use alpha = 0.01 (1% significance level).
Test Statistic:
- Calculate the t-score:
t = \frac{3.53 - 3}{\frac{1.97}{\sqrt{202}}}
Excel Calculations:
- Standard error: sample standard deviation / sqrt(sample size).
- t-score: (sample mean - hypothesized mean) / standard error.
- Ensure correct order of operations using parentheses.
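The same two calculations can be sketched in Python instead of Excel. The numbers come from the example above. The variable names are arbitrary.

```python
import math

sample_mean = 3.53        # hours
hypothesized_mean = 3.0   # hours, from the null hypothesis
sample_sd = 1.97          # hours
n = 202

# Parentheses make the order of operations explicit, as in the Excel version.
standard_error = sample_sd / math.sqrt(n)
t_score = (sample_mean - hypothesized_mean) / standard_error
print(f"SE = {standard_error:.4f}, t = {t_score:.2f}")
```

The t-score comes out near 3.82, meaning the sample mean sits almost four standard errors above the hypothesized value of three hours.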
P-value Calculation:
- Use StatKey or a t-table to find the p-value.
- Degrees of freedom: n - 1 = 201.
- One-tailed test (since we're only interested in values greater than three).
- In this example, the p-value is very small (approximately 0.0001).
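For readers without StatKey at hand, the one-tailed p-value can be approximated by numerically integrating the t density. This is only a sketch under stated assumptions: the helper names `t_pdf` and `upper_tail_p_value` are made up for illustration, Simpson's rule stands in for a table lookup, and a statistics package would normally do this directly.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + x * x / df))


def upper_tail_p_value(t_score, df, steps=2000):
    """P(T >= t_score) for t_score >= 0, via Simpson's rule on [0, t_score]."""
    h = t_score / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t_score, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The distribution is symmetric, so half of the total area lies above 0.
    return 0.5 - area_0_to_t


t_score = (3.53 - 3) / (1.97 / math.sqrt(202))
p_value = upper_tail_p_value(t_score, df=201)
print(f"One-tailed p-value: {p_value:.5f}")
```

The result agrees with the approximate value of 0.0001 quoted above and is well below alpha = 0.01.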
Decision and Conclusion:
- Decision rule: If the p-value is less than alpha, reject the null hypothesis.
- In this case, 0.0001 < 0.01, so we reject the null hypothesis.
- Conclusion: There is strong evidence that the average amount of time spent on social media by STAT 101 students is more than three hours per day.