Statistical Intervals Based on a Single Sample

Basic Properties of Confidence Intervals

  • Introduction: Focus on estimating a population mean (\mu). Assumptions:
    • Population distribution is normal.
    • Population standard deviation \sigma is known (unrealistic in practice).
  • Sample Observations: The observations x1, x2, …, xn are the observed values of a random sample X1, …, Xn from a normal distribution with mean \mu and standard deviation \sigma.
  • Distribution of Sample Mean: The sample mean \bar{X} is normally distributed with:
    • Expected value: E(\bar{X}) = \mu
    • Standard deviation: (\frac{\sigma}{\sqrt{n}})
  • Standardizing: Standardizing \bar{X} yields a standard normal variable:
    Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
  • Standard Normal Curve: The area under the standard normal curve between -1.96 and 1.96 is 0.95.
    P(-1.96 < Z < 1.96) = 0.95
    P(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96) = 0.95
  • Manipulating Inequalities:
    • Multiply by (\frac{\sigma}{\sqrt{n}}):
      -1.96 \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \frac{\sigma}{\sqrt{n}}
    • Subtract \bar{X} from each term:
      -\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}
    • Multiply by -1 (reverses inequality direction):
      \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}
      \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}
  • Random Interval: The probability statement can be written as:
    P(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}) = 0.95
  • Interpretation: Think of a random interval with endpoints \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} and \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}. In interval notation:
    (\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}, \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}})
  • Properties of the Interval:
    • The interval is random because its endpoints depend on the random variable \bar{X}.
    • It is centered at the sample mean \bar{X}.
    • It extends 1.96 \frac{\sigma}{\sqrt{n}} to each side of \bar{X}.
    • The width of the interval is 2(1.96) \frac{\sigma}{\sqrt{n}}, which is not random; only the location is random.
  • Probability Statement Paraphrased: "The probability is 0.95 that the random interval includes the true value of \mu."

Definition of Confidence Interval

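  • Prior to Experiment: Before data are collected, the probability is 0.95 that \mu will lie inside the random interval.
  • After the Experiment: Once the sample is observed, substituting the computed \bar{x} for \bar{X} gives a fixed interval, the 95% confidence interval for \mu:
    (\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}}, \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}})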

Example 7.2

  • Values: Given \sigma = 2.0, n = 31, and \bar{x} = 80.0, the 95% confidence interval (CI) for the true average preferred height is:
    (80.0 - 1.96 \frac{2.0}{\sqrt{31}}, 80.0 + 1.96 \frac{2.0}{\sqrt{31}}) = (79.3, 80.7)
  • Interpretation: We can be highly confident (95% confidence level) that 79.3 < \mu < 80.7. This narrow interval indicates a precise estimation of \mu.

Interpreting a Confidence Level

  • Inheritance: The 95% confidence level is inherited from the 0.95 probability of the random interval capturing the true value of \mu.
  • Incorrect Conclusion: It is tempting to conclude that \mu lies within the fixed interval (79.3, 80.7) with probability 0.95. This is incorrect: once \bar{x} = 80.0 is substituted for \bar{X}, nothing random remains, so no probability statement applies.
  • Correct Interpretation: Relies on the long-run relative frequency interpretation of probability.
    • Saying event A has probability 0.95 means if the experiment is repeated many times, A will occur 95% of the time.
  • Repeated Sampling: Suppose we obtain multiple independent samples of typists’ preferred heights and compute a 95% CI for each sample.
    • If A is the event that \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}, then P(A) = 0.95.
    • In the long run, 95% of the computed CIs will contain \mu.
  • Example: In a scenario with 100 intervals, approximately 95 of them would contain \mu.
  • Focus on Long-Run: The 95% confidence level isn't a statement about a specific interval like (79.3, 80.7) but about what happens if many similar intervals are constructed.
  • Classical CI: These intervals are "classical" because their interpretation relies on the classical notion of probability.
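
The long-run interpretation above can be checked by simulation. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical true mean of 80.0 (the notes give only the sample mean), reuses \sigma = 2.0 and n = 31 from Example 7.2, and fixes a seed so the run is repeatable. The function name coverage is illustrative.

```python
import math
import random


def coverage(mu, sigma, n, reps, z=1.96, seed=0):
    """Fraction of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)  # fixed, since sigma is known
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


# mu = 80.0 is an assumed "true" value used only for this illustration.
rate = coverage(mu=80.0, sigma=2.0, n=31, reps=10_000)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

The printed proportion should be close to 0.95, although any single run will differ slightly from that value.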

Other Levels of Confidence

  • Adjusting for Desired Confidence: Any desired confidence level can be achieved by replacing 1.96 (for 95% CI) with the appropriate standard normal critical value (z-score).
  • Generalization: A probability of 1 - \alpha is achieved by using z_{\alpha/2} in place of 1.96.
  • CI Formula (7.5): The general formula for a CI is:
    \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
    Which can be expressed as:
    Point estimate of \mu \pm (z critical value) * (standard error of the mean)
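
Formula (7.5) can be sketched in Python as follows. This is an illustration rather than part of the original notes: the helper names z_critical and z_interval are assumptions, and the critical value is computed with the standard library's NormalDist instead of being read from a table. The check reuses the values from Example 7.2.

```python
import math
from statistics import NormalDist


def z_critical(conf_level):
    """Return z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1.0 - conf_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_interval(xbar, sigma, n, conf_level=0.95):
    """CI for mu when the population is normal and sigma is known."""
    half_width = z_critical(conf_level) * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Values from Example 7.2: sigma = 2.0, n = 31, xbar = 80.0.
lower, upper = z_interval(80.0, 2.0, 31, conf_level=0.95)
print(round(lower, 1), round(upper, 1))
```

Passing a different conf_level, for example 0.90 or 0.99, changes only the critical value; the rest of the calculation is unchanged.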

Example 7.3

  • Context: A production process modification for engine control housing units, where hole diameters are normally distributed.

    • Prior standard deviation: \sigma = 0.100 mm (assumed unchanged).
    • Sample of n = 40 housing units, with mean diameter \bar{x} = 5.426 mm.
  • Goal: Calculate a 90% confidence interval for the true average hole diameter.

  • Calculation: A confidence level of 90% means 100(1 - \alpha) = 90, so \alpha = 0.10 and z_{\alpha/2} = z_{0.05} = 1.645. The interval is:

    5.426 \pm 1.645 \frac{0.100}{\sqrt{40}} = (5.400, 5.452)

  • Conclusion: With high confidence (90%), we can say 5.400 < \mu < 5.452. The interval is narrow due to small variability in hole diameter.

Confidence Level, Precision, and Sample Size

  • Trade-off: Higher confidence levels result in wider intervals, meaning reduced precision.
  • Width Comparison:
    • 95% interval width: 2(1.96) \frac{\sigma}{\sqrt{n}} = 3.92 \frac{\sigma}{\sqrt{n}}
    • 99% interval width: 2(2.58) \frac{\sigma}{\sqrt{n}} = 5.16 \frac{\sigma}{\sqrt{n}}
  • Inverse Relationship: Confidence level (reliability) is inversely related to precision.
  • Strategy: Specify desired confidence level and interval width, then determine the necessary sample size.
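
To make the trade-off concrete, the sketch below prints the interval width in units of \sigma / \sqrt{n} for three confidence levels. It is an illustration only; the function name width_multiplier is an assumption. Because it uses unrounded critical values, the 99% multiplier comes out near 5.15 rather than the 5.16 obtained from the rounded value 2.58.

```python
from statistics import NormalDist


def width_multiplier(conf_level):
    """Width of the z interval, expressed in units of sigma / sqrt(n)."""
    alpha = 1.0 - conf_level
    return 2.0 * NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} interval width: {width_multiplier(level):.2f} * sigma/sqrt(n)")
```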

Example 7.4

  • Context: Estimating true average response time \mu for a computer time-sharing system after a new operating system installation. Response times are normally distributed with \sigma = 25 millisec.
  • Goal: Determine sample size n to ensure a 95% CI with a width of at most 10.
  • Calculation:
    2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = w
    where w is the desired width, in this case 10. So,
    10 = 2 (1.96) \frac{25}{\sqrt{n}}
    \sqrt{n} = \frac{2 (1.96)(25)}{10} = 9.80
    n = (9.80)^2 = 96.04
  • Sample Size: Round up to the nearest integer, so n = 97 is required.

General Formula for Sample Size

  • Formula: To ensure an interval width w:
    n = (\frac{2 z_{\alpha/2} \sigma}{w})^2
  • Dependence: n increases as:
    • w decreases (smaller width requires larger sample size).
    • \sigma increases (more population variability requires larger sample size).
    • the confidence level 100(1 - \alpha)\% increases (as \alpha decreases, z_{\alpha/2} increases).
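
The sample size formula can be sketched as below. The function name sample_size_for_width and its optional z argument are assumptions made for illustration; passing z = 1.96 reproduces the table-based arithmetic of Example 7.4, while omitting it computes the critical value from the standard normal distribution.

```python
import math
from statistics import NormalDist


def sample_size_for_width(sigma, width, conf_level=0.95, z=None):
    """Smallest n for which the z interval has width at most `width`."""
    if z is None:
        z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    n_exact = (2.0 * z * sigma / width) ** 2
    return math.ceil(n_exact)  # round up so the width requirement is met


# Example 7.4: sigma = 25, desired width 10, z = 1.96.
n_required = sample_size_for_width(sigma=25.0, width=10.0, z=1.96)
print(n_required)
```

Halving the desired width roughly quadruples the required sample size, since n is proportional to 1/w^2.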

Bound on the Error of Estimation

  • Definition: The half-width of the 95% CI (1.96 \frac{\sigma}{\sqrt{n}}) is the bound on the error of estimation.
  • Interpretation: With 95% confidence, the point estimate \bar{x} will be no farther than this bound from \mu.
  • Objective: Determine a sample size for which a particular value of the bound is achieved.

Estimating Mean to Within a Bound

  • Generalization: To estimate \mu to within an amount B with 100(1 - \alpha)\% confidence, the necessary sample size is found by replacing 2/w with 1/B in the sample size formula.
  • Sample size formula: n = (z_{\alpha/2} \frac{\sigma}{B})^2

Deriving a Confidence Interval

  • Objective: Construct a CI for a parameter \theta based on a sample X1, X2, …, Xn.

  • Requirements of Random Variable: Find a random variable h(X1, X2, …, Xn; \theta) that:

    • Depends functionally on both the sample and \theta.
    • Has a probability distribution that does not depend on \theta or any other unknown parameters.
  • Normal Example: If the population is normal with known \sigma and \theta = \mu, then
    h(X1, …, Xn; \mu) = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
    satisfies both properties (standard normal distribution).

  • Appropriate Estimator: The form of the h function is usually suggested by examining the distribution of an appropriate estimator of \theta.

  • Probability Statement: For any \alpha between 0 and 1, find constants a and b such that
    P(a < h(X1, …, Xn; \theta) < b) = 1 - \alpha
    where a and b do not depend on \theta.

  • Manipulation: Manipulate the inequalities to isolate \theta:
    P(l(X1, X2, …, Xn) < \theta < u(X1, X2, …, Xn)) = 1 - \alpha

  • Confidence Limits: l(x1, x2, …, xn) and u(x1, x2, …, xn) are the lower and upper confidence limits, respectively, for a 100(1 - \alpha)\% CI.

Example 7.5

  • Context: Time to breakdown of an insulating fluid between electrodes follows an exponential distribution with parameter \lambda.

  • Sample: A random sample of n = 10 breakdown times x_i is observed.

  • Goal: Find a 95% CI for \lambda and for the true average breakdown time.

  • Random Variable: Let h(X1, X2, …, Xn; \lambda) = 2\lambda \sum X_i. This has a chi-squared distribution with 2n degrees of freedom (df).

  • Chi-Squared Distribution: The number of df, denoted \nu = 2n, is the single parameter of the chi-squared distribution. For example, n = 10 gives \nu = 20.

  • Using Chi-Squared Table: With \nu = 20, the value 34.170 captures upper-tail area 0.025, and 9.591 captures lower-tail area 0.025.

  • Probability: For n = 10: P(9.591 < 2\lambda \sum X_i < 34.170) = 0.95

  • Isolating Lambda: Dividing by 2\sum X_i isolates \lambda: P(\frac{9.591}{2\sum X_i} < \lambda < \frac{34.170}{2\sum X_i}) = 0.95

  • Confidence Limits: The lower limit is \frac{9.591}{2\sum x_i}, and the upper limit is \frac{34.170}{2\sum x_i}.

  • Calculation: For given data, \sum x_i = 550.87, giving the interval (0.00871, 0.03101).

  • Expected Value: The expected value of an exponential rv is \mu = \frac{1}{\lambda}. Taking reciprocals reverses the inequalities, so
    P(\frac{2\sum X_i}{34.170} < \frac{1}{\lambda} < \frac{2\sum X_i}{9.591}) = 0.95

  • CI for Mean: The 95% CI for the true average breakdown time is (2\sum x_i / 34.170, 2\sum x_i / 9.591) = (32.24, 114.87). The interval is wide because of the variability in breakdown times and the small sample size.
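
The arithmetic of Example 7.5 can be reproduced with the short sketch below. It is an illustration under stated assumptions: the chi-squared critical values are copied from the notes rather than computed, the function names are hypothetical, and the simulation uses an assumed true rate of 0.02 purely to check that 2\lambda \sum X_i falls between the two critical values about 95% of the time.

```python
import random

# Chi-squared critical values for 20 df, taken from the notes (table values).
CHI2_LOWER = 9.591    # lower-tail area 0.025
CHI2_UPPER = 34.170   # upper-tail area 0.025


def rate_ci(total, lower=CHI2_LOWER, upper=CHI2_UPPER):
    """95% CI for the exponential parameter lambda, given sum of x_i."""
    return lower / (2.0 * total), upper / (2.0 * total)


def mean_ci(total, lower=CHI2_LOWER, upper=CHI2_UPPER):
    """95% CI for the mean 1/lambda; reciprocals swap the endpoints."""
    return 2.0 * total / upper, 2.0 * total / lower


def pivot_coverage(true_rate, n, reps, seed=0):
    """Fraction of simulated samples whose pivot lies between the critical values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(rng.expovariate(true_rate) for _ in range(n))
        if CHI2_LOWER < 2.0 * true_rate * total < CHI2_UPPER:
            hits += 1
    return hits / reps


total_time = 550.87  # sum of the observed x_i from the notes
rate_lo, rate_hi = rate_ci(total_time)
mean_lo, mean_hi = mean_ci(total_time)
print(f"lambda: ({rate_lo:.5f}, {rate_hi:.5f})")
print(f"mean:   ({mean_lo:.2f}, {mean_hi:.2f})")

# true_rate = 0.02 is an assumed value used only for the simulation check.
cov = pivot_coverage(true_rate=0.02, n=10, reps=10_000)
print(f"simulated coverage: {cov:.3f}")
```

The critical values depend on n through the degrees of freedom, so a different sample size would need different table values.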

Bootstrap Confidence Intervals

  • Bootstrap CI for Theta: The bootstrap technique can be applied to obtain a CI for a parameter \theta.
  • Estimating Mean Example: Consider again estimating the mean \mu of a normal distribution when \sigma is known.
  • Percentile: 1.96 \frac{\sigma}{\sqrt{n}} is the 97.5th percentile of the distribution of \bar{X} - \mu:
    P(\bar{X} - \mu < 1.96 \frac{\sigma}{\sqrt{n}}) = P(Z < 1.96) = 0.975
  • Symmetry: Similarly, -1.96 \frac{\sigma}{\sqrt{n}} is the 2.5th percentile.
  • Interval:
    0.95 = P(2.5th \text{ percentile } < \bar{X} - \mu < 97.5th \text{ percentile })
         = P( - 2.5th \text{ percentile } > \mu - \bar{X} > - 97.5th \text{ percentile })
         = P(\bar{X} - 2.5th \text{ percentile } > \mu > \bar{X} - 97.5th \text{ percentile })
  • Bootstrap CI: With
    • l = \bar{X} - 97.5th \text{ percentile of } \bar{X} - \mu
    • u = \bar{X} - 2.5th \text{ percentile of } \bar{X} - \mu
      the CI for \mu is (l, u).
  • Bootstrap Samples: Percentiles can be estimated from bootstrap samples.
  • Procedure: With B = 1000 bootstrap samples, calculate the mean \bar{x}^* of each bootstrap sample and the difference \bar{x}^* - \bar{x}.
  • Estimate for Percentiles: The 25th largest and 25th smallest of these differences estimate the 97.5th and 2.5th percentiles, respectively, of \bar{X} - \mu.
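
The bootstrap procedure above can be sketched as follows. This is a minimal illustration, not the notes' own implementation: it assumes each bootstrap sample is drawn with replacement from the observed data, the function name bootstrap_ci_mean is hypothetical, and the data are simulated placeholder values because the notes supply no data set for this section.

```python
import random


def bootstrap_ci_mean(data, num_boot=1000, seed=0):
    """Approximate 95% CI for the mean from percentiles of xbar* - xbar."""
    rng = random.Random(seed)
    n = len(data)
    xbar = sum(data) / n
    diffs = []
    for _ in range(num_boot):
        # Resample n observations with replacement (assumed scheme).
        resample = [rng.choice(data) for _ in range(n)]
        diffs.append(sum(resample) / n - xbar)
    diffs.sort()
    k = round(0.025 * num_boot)   # 25 when num_boot = 1000
    pct_2_5 = diffs[k - 1]        # 25th smallest difference
    pct_97_5 = diffs[-k]          # 25th largest difference
    # l = xbar - 97.5th percentile, u = xbar - 2.5th percentile
    return xbar - pct_97_5, xbar - pct_2_5


# Placeholder data: 31 simulated values, used only to exercise the function.
data_rng = random.Random(1)
sample = [data_rng.gauss(80.0, 2.0) for _ in range(31)]
sample_mean = sum(sample) / len(sample)
ci_low, ci_high = bootstrap_ci_mean(sample)
print(f"sample mean: {sample_mean:.2f}, bootstrap CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Unlike the z interval, this construction does not need \sigma to be known, since the percentiles are estimated from the resampled differences.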