GES 500 Engineering Statistics: Confidence Intervals

7.1. Interval estimation: Confidence Level and Confidence Interval

Confidence Level and Confidence Interval
- For a variable with a known distribution, we define the confidence level and confidence interval.
- Consider a random variable X with a known cumulative distribution function (CDF) F(x), and a parameter \alpha with 0 \leq \alpha \leq 1.
- We want to find an interval [a, b] where the probability of X falling within this interval is 1 - \alpha, expressed as:
  P(a \leq X \leq b) = 1 - \alpha.
- This interval is called the confidence interval corresponding to the confidence level A, where A = 100(1 - \alpha)\%.
- Typically, a small \alpha (e.g., \alpha \approx 0.01) and hence a high A (close to 100\%) is sought, so that most values of X fall within the interval.
Solving Confidence Interval Initial Setup
- The equation P(a \leq X \leq b) = 1 - \alpha does not have a unique solution unless we also specify how the remaining probability \alpha is split between the two tails.
- Since P(a \leq X \leq b) = F(b) - F(a), two different intervals [a, b] and [c, d] can correspond to the same confidence level whenever F(b) - F(a) = F(d) - F(c) = 1 - \alpha.
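As a small illustration of this non-uniqueness, the sketch below builds two different intervals with the same 95\% level for a standard normal variable. It assumes Python's standard-library `statistics.NormalDist`; the helper names `interval_from_lower_tail` and `coverage`, and the chosen tail probabilities, are hypothetical and picked only for the example.

```python
from statistics import NormalDist

def interval_from_lower_tail(dist, lower_tail, alpha):
    """Return (a, b) with F(a) = lower_tail and F(b) = lower_tail + 1 - alpha.

    Any lower_tail strictly between 0 and alpha gives P(a <= X <= b) = 1 - alpha.
    """
    if not 0.0 < lower_tail < alpha:
        raise ValueError("lower_tail must lie strictly between 0 and alpha")
    a = dist.inv_cdf(lower_tail)
    b = dist.inv_cdf(lower_tail + 1.0 - alpha)
    return a, b

def coverage(dist, a, b):
    """P(a <= X <= b) = F(b) - F(a)."""
    return dist.cdf(b) - dist.cdf(a)

standard_normal = NormalDist(mu=0.0, sigma=1.0)
alpha = 0.05

# Two ways of splitting the remaining probability alpha between the tails.
a, b = interval_from_lower_tail(standard_normal, 0.025, alpha)  # equal tails
c, d = interval_from_lower_tail(standard_normal, 0.010, alpha)  # unequal tails

print(f"[a, b] = [{a:.3f}, {b:.3f}], coverage = {coverage(standard_normal, a, b):.4f}")
print(f"[c, d] = [{c:.3f}, {d:.3f}], coverage = {coverage(standard_normal, c, d):.4f}")
```

Both intervals have the same coverage, but their endpoints and widths differ, which is why an extra rule for splitting \alpha is needed.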
Finding Boundaries of Confidence Interval
- To compute the confidence interval boundaries, solve the equations:
  P(X \leq a) = F(a) = P(a), \quad P(X \leq b) = F(b) = P(b), \quad \text{with } P(b) - P(a) = 1 - \alpha.
- To obtain a unique solution, symmetry can be imposed by taking P(a) = \frac{\alpha}{2} and P(b) = 1 - \frac{\alpha}{2}; this choice is most natural for symmetric distributions such as the normal distribution.
- For non-symmetrical distributions, choosing P(a) and P(b) is less straightforward.
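The equal-tail choice can be sketched in a few lines once an inverse CDF is available. The example below is only an illustration: the function `equal_tailed_interval` is a hypothetical helper, and the exponential distribution with rate 1 is an assumed stand-in for a non-symmetric distribution (it is not part of the notes above).

```python
import math
from statistics import NormalDist

def equal_tailed_interval(inv_cdf, alpha):
    """Solve F(a) = alpha/2 and F(b) = 1 - alpha/2 using the inverse CDF."""
    return inv_cdf(alpha / 2.0), inv_cdf(1.0 - alpha / 2.0)

def exponential_inv_cdf(p, rate=1.0):
    """Inverse of F(x) = 1 - exp(-rate * x), valid for 0 <= p < 1."""
    return -math.log(1.0 - p) / rate

alpha = 0.05

# Symmetric case: the interval is centred on the mean (0 here).
a_norm, b_norm = equal_tailed_interval(NormalDist(0.0, 1.0).inv_cdf, alpha)

# Non-symmetric case: the same tail split gives an interval that is
# not centred on the mean (1/rate = 1 here).
a_exp, b_exp = equal_tailed_interval(exponential_inv_cdf, alpha)

print(f"normal:      [{a_norm:.3f}, {b_norm:.3f}]")
print(f"exponential: [{a_exp:.3f}, {b_exp:.3f}]")
```

For the exponential case the equal-tail rule still yields a valid 95\% interval, but it is only one of several reasonable choices.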
Cumulative Distribution Function Requirement
- Interval estimation requires knowing the cumulative distribution function (CDF). This distinguishes it from point estimation, which does not depend on the form of the distribution.
- Example: for a normal variable X with known mean \mu and standard deviation \sigma, a symmetric 100(1 - \alpha)\% confidence interval is found by standardizing X and solving for the critical value of the standard normal variable Z.
Using Standard Normal Variable
- Introduce the standardized normal variable:
  Z = \frac{X - \mu}{\sigma}.
- Let \Phi denote the CDF of Z. The critical value z_{\alpha/2} solves \Phi(z_{\alpha/2}) = 1 - \alpha/2; by the symmetry of the normal distribution, \Phi(-z_{\alpha/2}) = \alpha/2, so P(-z_{\alpha/2} \leq Z \leq z_{\alpha/2}) = 1 - \alpha.
Calculation Example
- Example value: if \alpha = 0.05, then z_{\alpha/2} \approx 1.96, so the 95\% confidence interval for X is given by \mu \pm 1.96\sigma.
- Note that this example does not involve actual data sample values, as distribution parameters \mu and \sigma are assumed known for theoretical background.
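The calculation above can be reproduced numerically. This is a minimal sketch using the standard-library `statistics.NormalDist`; the function names and the values \mu = 10, \sigma = 2 are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_{alpha/2}: the point where the standard normal CDF equals 1 - alpha/2."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def known_parameter_interval(mu, sigma, alpha):
    """Symmetric 100(1 - alpha)% interval mu +/- z_{alpha/2} * sigma for X."""
    z = z_critical(alpha)
    return mu - z * sigma, mu + z * sigma

mu, sigma = 10.0, 2.0  # hypothetical known parameters
lower, upper = known_parameter_interval(mu, sigma, alpha=0.05)

print(f"z_critical(0.05) = {z_critical(0.05):.4f}")
print(f"95% interval for X: [{lower:.3f}, {upper:.3f}]")
```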
Practical Impact of Unknown Parameters
- In practice, \mu and \sigma are often unknown, so confidence intervals must be approximated from sample data.

7.2. Large-Sample Confidence Intervals

Foundational Concepts
- For large samples (N \to \infty), the sample mean \bar{X} is approximately normally distributed, by the Central Limit Theorem (CLT).
Mean Confidence Interval Calculation
- Use properties of large samples to express the confidence interval for the mean \mu, as
  P\left(\bar{X} - z_{\alpha/2}\frac{s}{\sqrt{N}} \leq \mu \leq \bar{X} + z_{\alpha/2}\frac{s}{\sqrt{N}} \right) = 1 - \alpha,
- where s represents the sample standard deviation.
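A minimal sketch of the large-sample interval for the mean, using only the Python standard library. The function name `large_sample_mean_ci` is hypothetical, and the data are synthetic values drawn from a normal generator purely for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def large_sample_mean_ci(sample, alpha=0.05):
    """Return (lower, upper) = x_bar -/+ z_{alpha/2} * s / sqrt(N)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by N - 1)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Synthetic sample (assumption): 200 draws with mean 50 and standard deviation 5.
rng = random.Random(42)
sample = [rng.gauss(50.0, 5.0) for _ in range(200)]

lower, upper = large_sample_mean_ci(sample, alpha=0.05)
print(f"sample mean = {mean(sample):.3f}")
print(f"95% CI for mu: [{lower:.3f}, {upper:.3f}]")
```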
Desired Confidence Level and Sample Size Dynamics
- There is a trade-off between the confidence level and the required sample size: for fixed N, raising the confidence level increases z_{\alpha/2} and therefore widens the interval.
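- For a given confidence level, the sample size determines the width w of the interval:
  w = 2z_{\alpha/2}\frac{\sigma}{\sqrt{N}}.
- Thus, to achieve a desired width w at a given confidence level, N must be chosen as N = \left(\frac{2z_{\alpha/2}\sigma}{w}\right)^2, rounded up to the next integer.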
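Solving the width formula for N gives N = (2 z_{\alpha/2} \sigma / w)^2, rounded up. The sketch below illustrates this; the function names and the values \sigma = 2 and w = 1 are hypothetical, and \sigma is assumed known (or replaced by a prior estimate).

```python
import math
from statistics import NormalDist

def interval_width(sigma, n, alpha=0.05):
    """Width w = 2 * z_{alpha/2} * sigma / sqrt(N)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)

def required_sample_size(sigma, width, alpha=0.05):
    """Smallest integer N whose interval width does not exceed the target width."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((2.0 * z * sigma / width) ** 2)

sigma, target_width = 2.0, 1.0  # hypothetical values
n_95 = required_sample_size(sigma, target_width, alpha=0.05)
n_99 = required_sample_size(sigma, target_width, alpha=0.01)

print(f"N for 95% level: {n_95}, resulting width = {interval_width(sigma, n_95, 0.05):.4f}")
print(f"N for 99% level: {n_99}, resulting width = {interval_width(sigma, n_99, 0.01):.4f}")
```

Raising the confidence level at a fixed target width increases the required N, which is the trade-off described above.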
Interpretation of Confidence Intervals
- An established confidence interval with P(L \leq \mu \leq U) = 1 - \alpha means that, over many repetitions of the sampling procedure, a fraction of about 1 - \alpha of the resulting intervals [L, U] contain the true parameter \mu. The endpoints L and U are random; \mu is fixed.
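This repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the function `coverage_fraction`, the seed, and the values \mu = 5, \sigma = 2, N = 100 are all assumptions chosen for the example.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage_fraction(mu, sigma, n, alpha, repetitions, seed=0):
    """Fraction of large-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(repetitions):
        # Draw a fresh sample and build its interval [L, U].
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        half_width = z * stdev(sample) / math.sqrt(n)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / repetitions

fraction = coverage_fraction(mu=5.0, sigma=2.0, n=100, alpha=0.05, repetitions=1000)
print(f"fraction of intervals containing mu: {fraction:.3f}")
```

The printed fraction should be close to 0.95, though not exactly equal to it because of simulation noise and the large-sample approximation.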

7.3. Intervals Based on a Normal Population Distribution

Problem Statement
- This section considers finding confidence interval limits when N is small, so that the Central Limit Theorem cannot be relied upon.
- Formula:
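  P\left(\bar{X} - t_{\alpha/2, N-1}\frac{s}{\sqrt{N}} \leq \mu \leq \bar{X} + t_{\alpha/2, N-1}\frac{s}{\sqrt{N}} \right) = 1 - \alpha,
- with critical values t_{\alpha/2, N-1} taken from Student's t-distribution, provided the underlying data are normally distributed.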
Understanding Student's T-distribution
- The statistic T = \frac{\bar{X} - \mu}{s/\sqrt{N}} follows a t-distribution with N - 1 degrees of freedom, which has more spread (heavier tails) than the standard normal distribution.
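The extra spread of T can be seen in a quick simulation. The sketch below draws many small normal samples, computes T for each, and counts how often |T| exceeds 1.96; for a standard normal variable that fraction would be about 0.05. The function name, seed, and N = 5 are assumptions made for the illustration.

```python
import math
import random
from statistics import mean, stdev

def simulate_t_statistics(n, repetitions, mu=0.0, sigma=1.0, seed=0):
    """Return simulated values of T = (x_bar - mu) / (s / sqrt(N)) for normal samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append((mean(sample) - mu) / (stdev(sample) / math.sqrt(n)))
    return values

t_values = simulate_t_statistics(n=5, repetitions=5000)

# Fraction of T values beyond the normal 95% critical value.
tail_fraction = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
print(f"fraction with |T| > 1.96 (N = 5): {tail_fraction:.3f}")
```

A fraction clearly above 0.05 indicates that normal critical values are too narrow for small N.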
Finding Exact Confidence Intervals
- Find the critical value t_{\alpha/2, N-1} from F(t_{\alpha/2, N-1}) = 1 - \alpha/2, where F is the CDF of the t-distribution with N - 1 degrees of freedom. The resulting interval \bar{X} \pm t_{\alpha/2, N-1} s/\sqrt{N} is exact rather than approximate.
Sample Size Considerations
- The approach is exact for any sample size N \geq 2, provided the population is normally distributed.
Computational Steps for Estimation
- 1. Compute the critical value t_{\alpha/2, N-1} from the CDF of the t-distribution
- 2. Compute the sample mean \bar{X}
- 3. Calculate the sample standard deviation s
- 4. Form the bounds of the confidence interval as \bar{X} \pm t_{\alpha/2, N-1} s/\sqrt{N}.
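The four steps above can be sketched with the standard library alone. Because Python's standard library has no t-distribution, this example integrates the t density numerically (Simpson's rule) and finds the critical value by bisection; that numerical approach is an assumption of the sketch, and in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead. The function names and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """CDF for t >= 0: 0.5 plus a composite Simpson integral of the density on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

def t_critical(alpha, df):
    """Step 1: solve F(t) = 1 - alpha/2 by bisection."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def t_interval(sample, alpha=0.05):
    """Exact interval for mu under normality, following steps 1 to 4."""
    n = len(sample)
    t_crit = t_critical(alpha, n - 1)        # step 1
    x_bar = mean(sample)                     # step 2
    s = stdev(sample)                        # step 3
    half_width = t_crit * s / math.sqrt(n)   # step 4
    return x_bar - half_width, x_bar + half_width

# Hypothetical small sample of N = 10 measurements.
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.2, 9.9]
lower, upper = t_interval(sample, alpha=0.05)
print(f"t critical (df = 9): {t_critical(0.05, 9):.4f}")
print(f"95% CI for mu: [{lower:.3f}, {upper:.3f}]")
```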

This section covers the theoretical background and the practical side of interval estimation, highlighting the main factors that influence confidence bounds: the confidence level, the sample size, and what is known about the underlying distribution.