Sampling theory underpins the estimation of population parameters from sample data. Because a sample is only part of the population, every estimate carries some uncertainty.
Confidence intervals (CIs) quantify this uncertainty by stating a range within which a parameter is expected to lie with a certain level of confidence (e.g., 95%).
Constructing Confidence Intervals
For a population mean \mu with standard deviation \sigma:
The sample mean \bar{X} is computed from N participants.
According to the Central Limit Theorem, the sampling distribution of the mean is approximately normal.
A 95% confidence interval can therefore be constructed as:
(\bar{X} - 1.96 \times SEM, \bar{X} + 1.96 \times SEM)
where the Standard Error of the Mean (SEM) is defined as: SEM = \frac{\sigma}{\sqrt{N}}
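The construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming \sigma is known; the function name z_confidence_interval, the scores list, and the value sigma=1.5 are made up for the example and are not part of the original notes.

```python
import math
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """CI for the mean when the population standard deviation is known."""
    n = len(sample)
    x_bar = mean(sample)
    sem = sigma / math.sqrt(n)  # SEM = sigma / sqrt(N)
    # Two-sided critical value of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar - z * sem, x_bar + z * sem

# Hypothetical data: 10 made-up scores, with an assumed known sigma of 1.5
scores = [98.2, 101.5, 99.7, 100.9, 100.3, 99.1, 102.0, 98.8, 100.6, 99.9]
low, high = z_confidence_interval(scores, sigma=1.5)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The interval is symmetric around the sample mean by construction, since the same multiple of the SEM is subtracted and added.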
Adjusting for Sample Standard Deviation
Often, the true population standard deviation \sigma is not known, requiring the use of the sample standard deviation s instead.
The multiplier is then taken from the t-distribution with N - 1 degrees of freedom (written t_{N-1}) rather than from the normal distribution. The t multiplier is larger than 1.96, noticeably so for small samples.
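As a rough sketch of this adjustment, the interval can be computed from the sample standard deviation s with a t multiplier supplied by the caller. The function name t_confidence_interval and the scores are hypothetical; the multiplier 2.26 is the standard 95% table value for 9 degrees of freedom (N = 10), passed in by hand because the Python standard library has no t-distribution quantile function.

```python
import math
from statistics import mean, stdev

def t_confidence_interval(sample, t_multiplier):
    """CI for the mean using the sample standard deviation s."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)        # sample SD, computed with an N - 1 denominator
    sem = s / math.sqrt(n)   # estimated standard error of the mean
    return x_bar - t_multiplier * sem, x_bar + t_multiplier * sem

# Hypothetical data: 10 made-up scores, so the t multiplier has 9 df
scores = [98.2, 101.5, 99.7, 100.9, 100.3, 99.1, 102.0, 98.8, 100.6, 99.9]
T_MULTIPLIER_N10 = 2.26  # 95% two-sided critical value of t with 9 df
t_low, t_high = t_confidence_interval(scores, T_MULTIPLIER_N10)
print(f"95% CI: ({t_low:.2f}, {t_high:.2f})")
```

With a statistics library available, the multiplier would normally be looked up from the t-distribution rather than typed in.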
Sample Size Impact
As sample size N increases, confidence intervals tend to be narrower; with small N, confidence intervals are wider due to increased uncertainty about estimates.
Example: for N = 10 the 95% multiplier is 2.26 (wider CI), whereas for N = 300 it is approximately 1.97, close to the normal value of 1.96.
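To see both effects at once (a larger multiplier and a larger standard error at small N), the half-width of the interval can be compared for the two sample sizes. This is only an illustration: the sample standard deviation s = 15.0 is an assumed value, half_width is a hypothetical helper, and 1.96 is used as the large-sample multiplier.

```python
import math

def half_width(multiplier, s, n):
    """Half-width of the CI: multiplier * s / sqrt(N)."""
    return multiplier * s / math.sqrt(n)

s = 15.0  # assumed sample standard deviation, identical in both cases

small_n = half_width(2.26, s, 10)    # t multiplier for N = 10
large_n = half_width(1.96, s, 300)   # large-sample multiplier for N = 300

print(f"N = 10:  mean +/- {small_n:.2f}")
print(f"N = 300: mean +/- {large_n:.2f}")
```

Most of the narrowing comes from the \sqrt{N} term in the standard error; the change in multiplier contributes a smaller share.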
Interpreting Confidence Intervals
A CI communicates how precisely the sample data pin down the true population mean: the interval is produced by a procedure that captures the true mean a specified proportion of the time (e.g., 95%).
For instance, a sample mean \bar{X} = 100.14 with a 95% confidence interval (99.85, 100.43) means:
We are 95% confident the true mean lies within this range.
This confidence reflects the consistency of the procedure across different potential samples drawn from the population.
Long-Run Interpretation
Confidence intervals are defined within the context of repeated sampling:
Over many samples, 95% of CIs constructed should contain the true population mean.
Each individual CI either contains or does not contain the true mean.
This principle is visually illustrated with sample means and CIs in empirical data, reinforcing the concept of repeatability and uncertainty in estimates.
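The long-run interpretation can be checked with a small simulation. The sketch below assumes a normal population with made-up parameters (mu = 100, sigma = 15), a known \sigma, and N = 30 per sample; the function name coverage_rate is hypothetical. It draws many samples, builds a 95% CI from each, and counts how often the interval contains the true mean.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(mu, sigma, n, trials, confidence=0.95, seed=0):
    """Fraction of simulated CIs that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    sem = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Each individual interval either contains mu or it does not
        if x_bar - z * sem <= mu <= x_bar + z * sem:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=100.0, sigma=15.0, n=30, trials=5000)
print(f"Proportion of intervals containing the true mean: {rate:.3f}")
```

The proportion should land near 0.95, with some run-to-run variation if the seed is changed.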