Statistics - Confidence Intervals

Confidence Interval Overview

  • A confidence interval gives a range of plausible values for an unknown population parameter, constructed so that we have a high degree of confidence that the range includes the true value.
  • It extends a point estimate to an interval estimate of a parameter, typically the population mean ($\mu$).

Confidence Interval for the Population Mean

Large Sample Size (n >= 30)

  • Central Limit Theorem: For large sample sizes, the sampling distribution of the sample mean ($\bar{x}$) is approximately normal.
  • The formula for a 95% confidence interval for the mean is $\bar{x} \pm Z_{0.025} \frac{s}{\sqrt{n}}$, where:
    • $Z_{0.025}$ is the critical value from the z-distribution (1.96 for 95% confidence).
    • $s$ is the sample standard deviation.
    • $n$ is the sample size.

Example Calculation

  • Sample of 80 shops:
    • Mean cost of repair ($\bar{x}$) = $472.36
    • Standard deviation ($s$) = $62.35
  • Confidence interval:
    $472.36 \pm 1.96 \frac{62.35}{\sqrt{80}}$
    $\rightarrow [458.70, 486.02]$
  • Interpretation: We are 95% confident that the true mean cost is within this interval.
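
The arithmetic in this example can be checked with a short script. This is a minimal sketch; the function name `z_confidence_interval` is illustrative and not part of the original notes, and the critical value 1.96 is passed in rather than computed.

```python
import math

def z_confidence_interval(mean, sd, n, z=1.96):
    """Large-sample CI for a mean: mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Repair-cost example from the text: 80 shops, mean 472.36, s = 62.35
lower, upper = z_confidence_interval(472.36, 62.35, 80)
print(f"[{lower:.2f}, {upper:.2f}]")
```

The margin of error here is about 13.66, which reproduces the interval stated above.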

Understanding Confidence

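  • Misinterpretation: It is incorrect to say there is a 95% probability that the true mean falls within a particular computed interval. The parameter ($\mu$) is fixed, not random, so a given interval either contains it or does not.
  • The randomness lies in the sampling process: different samples produce different intervals, and the 95% describes how often the procedure yields an interval that contains $\mu$.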
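
One way to see that the randomness belongs to the procedure is to simulate it. The sketch below assumes a normal population with a known, fixed mean (the values `mu=100`, `sigma=15`, and `n=50` are arbitrary choices for illustration, not from the notes) and counts how often the 95% interval captures that mean.

```python
import math
import random
import statistics

def coverage(mu=100.0, sigma=15.0, n=50, trials=2000, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the fixed mean mu."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)
        margin = z * s / math.sqrt(n)
        # mu never changes; only the interval moves from sample to sample
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"coverage = {rate:.3f}")
```

The printed rate should be close to 0.95, though it varies slightly with the seed and is typically a little below 0.95 because $s$ is estimated from each sample.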

Calculating Confidence Levels

  • If 95% confidence intervals are computed from many independent samples, about 95% of them will contain the true mean ($\mu$).
  • Critical values for other confidence levels:
    • 90% confidence: Critical value = 1.645
    • 99% confidence: Critical value = 2.575 (2.576 to three decimal places)
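
These critical values can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

This gives roughly 1.645, 1.960, and 2.576. The value 2.575 used in these notes is a common table value for the 99% level.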

Precision in Confidence Intervals

  • Factors affecting interval width:
    1. Sample Standard Deviation (s): Larger values widen the interval.
    2. Confidence Level: Higher confidence levels widen the interval (the critical value $Z_{\alpha/2}$ increases).
    3. Sample Size (n): Larger sample sizes narrow the interval. Width is inversely proportional to $\sqrt{n}$, so quadrupling $n$ halves the width.
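
The three effects can be illustrated numerically. This sketch reuses the repair-cost figures as a baseline; the function `interval_width` and the comparison values are assumptions made for illustration.

```python
import math

def interval_width(sd, n, z=1.96):
    """Full width of the interval: 2 * z * sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)

base = interval_width(sd=62.35, n=80)

ratio_n = interval_width(sd=62.35, n=320) / base            # n multiplied by 4
ratio_sd = interval_width(sd=124.70, n=80) / base           # s doubled
ratio_level = interval_width(sd=62.35, n=80, z=2.575) / base  # 99% instead of 95%

print(f"4x sample size -> width ratio {ratio_n:.3f}")
print(f"2x std dev     -> width ratio {ratio_sd:.3f}")
print(f"99% vs 95%     -> width ratio {ratio_level:.3f}")
```

Quadrupling the sample size halves the width, doubling $s$ doubles it, and moving from 95% to 99% confidence widens the interval by the ratio of the critical values.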

Sample Size Determination

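  • To achieve a specified maximum error ($W$, the half-width of the interval) in a 95% confidence interval, the required sample size is:
    $n = \left( \frac{Z_{0.025}\, s}{W} \right)^2$
  • Round $n$ up to the next whole number; $s$ is typically taken from a pilot sample or an earlier study.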
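
As a sketch of how this formula might be applied, the code below rounds the result up to a whole number of observations. The target error of 10 is a hypothetical value chosen for illustration; the standard deviation is borrowed from the repair-cost example.

```python
import math

def required_sample_size(sd, max_error, z=1.96):
    """Smallest n for which z * sd / sqrt(n) <= max_error."""
    return math.ceil((z * sd / max_error) ** 2)

# Hypothetical target: estimate the mean repair cost to within +/- 10
n_needed = required_sample_size(sd=62.35, max_error=10.0)
print(n_needed)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.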

Smaller Sample Size (n < 30)

  • If the sample size is small and the population distribution is approximately normal, use the t-distribution instead of the normal distribution.
    • The t-distribution has a similar bell shape but heavier tails, reflecting the extra uncertainty from estimating the standard deviation with $s$.
  • The confidence interval formula becomes:
    $\bar{x} \pm t_{n-1,\,\alpha/2} \frac{s}{\sqrt{n}}$
  • Here $t_{n-1,\,\alpha/2}$ is the critical t-value with $n-1$ degrees of freedom ($\alpha = 0.05$ for 95% confidence).

Example with t-Distribution

  • Sample mean ($\bar{x}$) = 61,492, sample standard deviation ($s$) = 3,035, sample size ($n$) = 10.
  • Degrees of freedom (df) = 9.
  • Critical value for 95% confidence ($t_{0.025}$ with 9 df) = 2.262.
  • Confidence interval calculation:
    $61{,}492 \pm 2.262 \frac{3{,}035}{\sqrt{10}}$
    $\rightarrow [59{,}321.04,\ 63{,}662.96]$
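
The same calculation can be sketched in code. The Python standard library has no t-distribution, so the critical value 2.262 is passed in as a table lookup; the function name `t_confidence_interval` is illustrative.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Small-sample CI for a mean: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# t_crit = 2.262 is the tabulated value for 9 df at 95% confidence
lower, upper = t_confidence_interval(61492, 3035, 10, t_crit=2.262)
print(f"[{lower:,.2f}, {upper:,.2f}]")
```

With the z-value 1.96 in place of 2.262 the interval would be narrower, which understates the uncertainty for a sample of only 10.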

One-Sided Confidence Intervals

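  • For one-sided bounds at confidence level $1-\alpha$, replace $Z_{\alpha/2}$ with $Z_{\alpha}$ (for 95% confidence, replace $Z_{0.025} = 1.96$ with $Z_{0.05} = 1.645$):
    • Lower confidence bound: $\bar{x} - Z_{\alpha} \frac{s}{\sqrt{n}}$
    • Upper confidence bound: $\bar{x} + Z_{\alpha} \frac{s}{\sqrt{n}}$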
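
A minimal sketch of the one-sided bounds, applied to the repair-cost figures used earlier in these notes. The helper name `one_sided_bounds` is an assumption; note that the two returned values are separate one-sided statements, not the endpoints of a single two-sided interval.

```python
import math
from statistics import NormalDist

def one_sided_bounds(mean, sd, n, confidence=0.95):
    """Return (lower_bound, upper_bound); each is its own one-sided bound."""
    z_alpha = NormalDist().inv_cdf(confidence)  # z_alpha, not z_{alpha/2}
    margin = z_alpha * sd / math.sqrt(n)
    return mean - margin, mean + margin

lower_bound, upper_bound = one_sided_bounds(472.36, 62.35, 80)
print(f"95% lower confidence bound: {lower_bound:.2f}")
print(f"95% upper confidence bound: {upper_bound:.2f}")
```

Because $Z_{0.05}$ is smaller than $Z_{0.025}$, each one-sided bound sits closer to $\bar{x}$ than the corresponding endpoint of the two-sided interval.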

Confidence Interval for Population Proportion

Large Samples

  • Proportions can also be estimated using a similar formula:
    • 95% confidence interval for a proportion ($p$), where $\hat{p} = x/n$ is the sample proportion:
      $\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
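
The large-sample formula translates directly into code. The counts below (120 successes in 400 trials) are hypothetical data invented for this sketch, and `proportion_interval` is an illustrative name.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Large-sample CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data, not from the notes: 120 successes in 400 trials
lower, upper = proportion_interval(120, 400)
print(f"[{lower:.3f}, {upper:.3f}]")
```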

Adjusted Confidence Interval

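  • If the sample size $n$ is small, use the adjusted interval:
    • Adjusted sample size: $\tilde{n} = n + 4$ (add 2 successes and 2 failures).
    • Adjusted sample proportion: $\tilde{p} = (x + 2)/\tilde{n}$, where $x$ is the number of successes.
    • Interval: $\tilde{p} \pm Z_{\alpha/2} \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}}$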

Example for Proportion

For a sample in which 10 of 125 tiles are cracked:

  • Adjusted sample size = 125 + 4 = 129, adjusted proportion = (10 + 2)/129 ≈ 0.093
  • 99% confidence interval:
    $0.093 \pm 2.575 \sqrt{\frac{0.093(1-0.093)}{129}}$
    $\rightarrow [0.027, 0.159]$
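
The adjusted interval can be computed directly from the raw counts (10 cracked tiles out of 125). In this sketch the function name `adjusted_proportion_interval` is illustrative, and clamping the endpoints to the range 0 to 1 is an added design choice rather than something stated in the notes.

```python
import math

def adjusted_proportion_interval(successes, n, z):
    """Adjusted CI: add 2 successes and 2 failures before applying the formula."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] because a proportion cannot fall outside that range
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Tile example: 10 cracked out of 125, 99% confidence (z = 2.575 as in the notes)
lower, upper = adjusted_proportion_interval(10, 125, z=2.575)
print(f"[{lower:.3f}, {upper:.3f}]")
```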

Summary

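  • Confidence intervals are central to making inferences about population parameters.
  • The appropriate method depends on sample size and the population distribution: z-based intervals for large samples, t-based intervals for small samples from normal populations, and the adjusted interval for proportions when the sample is small.
  • The confidence level describes how often the procedure captures the parameter, not the probability that a particular computed interval contains it.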