Confidence Intervals

Parameter - A numeric description of the population

Statistic - A numeric description of the sample

Sampling distribution - The probability distribution of a statistic over every possible sample of a particular size (it shows how much the statistic can differ from one sample to another)

Central Limit Theorem: If the sample size is large, the sampling distribution of the sample mean will be approximately normal.
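To see the Central Limit Theorem at work, here is a minimal simulation sketch. The Exponential(1) population, the sample size of 50, the 5000 repetitions, and the helper name sample_means are all assumptions chosen for illustration, not part of the notes. The population is strongly skewed, yet the sample means should cluster around the population mean with a spread near SD/sqrt(n).

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an Exponential(1) population
    (a skewed population with mean 1 and SD 1) and return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

n = 50                      # hypothetical sample size
means = sample_means(n=n, reps=5000)
se = 1 / n ** 0.5           # population SD / sqrt(n)

# If the sampling distribution is roughly normal, about 95% of the
# sample means should fall within 1.96 * se of the population mean (1).
within = sum(abs(m - 1) < 1.96 * se for m in means) / len(means)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3), "vs", round(se, 3))
print("share within 1.96 SE:", round(within, 3))
```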

Confidence Interval

• Definition: A range of values that is very likely to contain the true (unknown) population parameter.

• Format: Estimate ± Something

• Typically, the estimate is the statistic that corresponds to the parameter.

• Definition (confidence level): The percentage of samples that will produce an interval containing the true parameter.

• Typically we use 90%, 95%, or 99%.
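The "percentage of samples" reading of the confidence level can be checked by simulation. The sketch below assumes a normal population with a made-up mean of 10 and SD of 2, and uses the xBar ± 1.96(SD/sqrt(n)) interval that these notes build for the population mean. The function name coverage and all of its defaults are hypothetical.

```python
import random
import statistics

def coverage(mu=10.0, sigma=2.0, n=40, reps=2000, z=1.96, seed=1):
    """Fraction of simulated samples whose interval
    xBar ± z * sigma / sqrt(n) contains the true mean `mu`."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print("share of intervals covering the true mean:", rate)
```

The printed share should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the procedure over many samples.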

Making a confidence interval for the population mean: we construct a range of values that will likely cover the true mean.

qnorm(.025) = -1.96

2.5th percentile of xBar = mean - 1.96(SD/sqrt(size))

97.5th percentile of xBar = mean + 1.96(SD/sqrt(size))

p(mean - 1.96(SD/sqrt(size)) < xBar < mean + 1.96(SD/sqrt(size))) = .95
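The percentile calculations above can be reproduced in code. This is a sketch in Python rather than R: NormalDist().inv_cdf from the standard library plays the role of qnorm, and the population values (mean 100, SD 15, sample size 36) are made up for illustration.

```python
from statistics import NormalDist

# Counterpart of qnorm(.025) for the standard normal distribution.
z = NormalDist().inv_cdf(0.025)
print(round(z, 2))  # -1.96

# Hypothetical population values, not taken from the notes.
mu, sd, n = 100.0, 15.0, 36
se = sd / n ** 0.5            # SD/sqrt(size), the spread of xBar

# Sampling distribution of xBar under the normal approximation.
xbar_dist = NormalDist(mu, se)
low = xbar_dist.inv_cdf(0.025)    # 2.5th percentile of xBar
high = xbar_dist.inv_cdf(0.975)   # 97.5th percentile of xBar

# Probability that xBar falls between the two percentiles.
prob = xbar_dist.cdf(high) - xbar_dist.cdf(low)
print(round(low, 2), round(high, 2), round(prob, 2))
```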

P(xBar - 1.96(σ/sqrt(n)) < μ < xBar + 1.96(σ/sqrt(n))) = .95

95% confidence interval for mean: xBar ± 1.96(SD/sqrt(n))

100(1-α)% CI for μ: xBar ± (multiplier)(s/sqrt(n))

multiplier = -qnorm(alpha/2) = qnorm(1 - alpha/2)
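A small sketch of the general formula, again in Python with NormalDist().inv_cdf standing in for qnorm. The function names multiplier and mean_ci and the simulated data are hypothetical. The normal multiplier is what these notes use; it relies on the sample being large.

```python
import random
from statistics import NormalDist, fmean, stdev

def multiplier(level):
    """Normal multiplier for a given confidence level, e.g. 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(sample, level=0.95):
    """xBar ± multiplier * s / sqrt(n), with s the sample SD."""
    xbar = fmean(sample)
    s = stdev(sample)
    half_width = multiplier(level) * s / len(sample) ** 0.5
    return xbar - half_width, xbar + half_width

# Multipliers for the usual confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(multiplier(level), 3))   # 1.645, 1.96, 2.576

# Made-up data: 100 draws from a normal population (mean 50, SD 8).
rng = random.Random(2)
data = [rng.gauss(50, 8) for _ in range(100)]
lo, hi = mean_ci(data, level=0.95)
print(round(lo, 2), round(hi, 2))
```

A higher confidence level gives a larger multiplier and therefore a wider interval.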

Bootstrap Principle:
- Re-sampling from the original random sample is roughly equivalent to sampling from the population, with high probability

Useful for estimating many parameters, provided the original random sample is large enough to resemble the population

Bootstrap

Draw at random, with replacement, as many values as the original sample contained
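One bootstrap resample can be sketched as follows. The sample values and the function name bootstrap_resample are made up for illustration; random.choices draws with replacement.

```python
import random

def bootstrap_resample(sample, rng):
    """Draw len(sample) values at random, with replacement, from `sample`."""
    return rng.choices(sample, k=len(sample))

original = [12, 15, 9, 20, 14, 11, 18, 16]   # hypothetical sample
rng = random.Random(3)
resample = bootstrap_resample(original, rng)
print(resample)
```

Because the draws are with replacement, some original values usually appear more than once in the resample and others not at all.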

Bootstrap confidence interval

Construct a 95% confidence interval by taking the 2.5th and 97.5th percentiles of the bootstrapped statistics, i.e. the middle 95% of their distribution
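Putting the pieces together, a percentile bootstrap interval might look like the sketch below. The choice of the median as the statistic, the 2000 resamples, the simulated data, and the function name bootstrap_ci are assumptions for illustration. statistics.quantiles interpolates between values, which is one of several reasonable ways to compute percentiles.

```python
import random
import statistics

def bootstrap_ci(sample, statistic, reps=2000, seed=4):
    """95% percentile bootstrap interval for `statistic`."""
    rng = random.Random(seed)
    boot_stats = []
    for _ in range(reps):
        # Resample with replacement, same size as the original sample.
        resample = rng.choices(sample, k=len(sample))
        boot_stats.append(statistic(resample))
    # n=40 gives 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(boot_stats, n=40)
    return cuts[0], cuts[-1]    # 2.5th and 97.5th percentiles

# Made-up skewed data standing in for the original random sample.
data_rng = random.Random(5)
sample = [data_rng.expovariate(0.1) for _ in range(100)]

lo, hi = bootstrap_ci(sample, statistics.median)
print("sample median:", round(statistics.median(sample), 2))
print("95% bootstrap CI:", round(lo, 2), round(hi, 2))
```

The same function works for other statistics, for example by passing statistics.fmean instead of statistics.median.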