Confidence Intervals

Parameter - A numeric description of the population

Statistic - A numeric description of the sample

Sampling distribution - The probability distribution of a statistic over every possible sample of a particular size (it shows how much the statistic can differ from one sample to another)

Central Limit Theorem: If the sample size is large, the sampling distribution of the sample mean will be approximately normal.
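A rough simulation can make the Central Limit Theorem concrete. The sketch below is only an illustration: the skewed population, the sample size of 50, and the helper name `sample_means` are all made up for this example. It checks whether about 95% of the sample means fall within 1.96 standard deviations of their center, as they would for a normal distribution.

```python
import random
import statistics

def sample_means(population, n, reps, seed=0):
    """Draw `reps` random samples of size n (with replacement); return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# A deliberately skewed, made-up population (exponential with mean 1).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, n=50, reps=2_000)

# If the sample means are roughly normal, about 95% of them should lie
# within 1.96 standard deviations of their own average.
center = statistics.fmean(means)
spread = statistics.stdev(means)
share = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)
print(f"center of sample means: {center:.3f}")
print(f"share within 1.96 SDs:  {share:.3f}")
```

Even though the population is far from normal, the share should land close to 0.95, and the spread of the means should be close to SD/sqrt(n).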

Confidence Interval

• Definition: A range of values that is very likely to contain the true (unknown) population parameter.

• Format: Estimate ± Something

• Typically, the estimate is the statistic that corresponds to the parameter.

• Confidence level: The percentage of samples that will produce an interval containing the true parameter.

• Typically we use 90%, 95%, or 99%.

Making a confidence interval for the population mean: finding a range of values that will likely cover the true mean.

qnorm(.025) ≈ -1.96

2.5th percentile of xBar = mean - 1.96(SD/sqrt(size))

97.5th percentile of xBar = mean + 1.96(SD/sqrt(size))

P(mean - 1.96(SD/sqrt(size)) < xBar < mean + 1.96(SD/sqrt(size))) = .95

P(xBar - 1.96(σ/sqrt(n)) < μ < xBar + 1.96(σ/sqrt(n))) = .95

95% confidence interval for mean: xBar ± 1.96(SD/sqrt(n))
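One way to read the .95 in the statements above: if we repeat the sampling many times, about 95% of the resulting intervals should contain the true mean. The sketch below checks this by simulation. The population values (mean 10, SD 2), the sample size, and the helper name `covers` are assumptions chosen only for illustration, and the population SD is treated as known.

```python
import random
import statistics

def covers(sample, mu, sigma, z=1.96):
    """True if xBar ± z * sigma / sqrt(n) contains the true mean mu."""
    x_bar = statistics.fmean(sample)
    margin = z * sigma / len(sample) ** 0.5
    return x_bar - margin < mu < x_bar + margin

rng = random.Random(42)
mu, sigma, n, reps = 10.0, 2.0, 40, 5_000  # made-up values for the simulation

# Draw `reps` independent samples and count how many intervals contain mu.
hits = sum(
    covers([rng.gauss(mu, sigma) for _ in range(n)], mu, sigma)
    for _ in range(reps)
)
coverage = hits / reps
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should sit near 0.95. Any single interval either contains μ or it does not; the 95% describes the procedure over many samples.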

100(1-α)% CI for μ: xBar ± (multiplier)(s/sqrt(n))

multiplier = -qnorm(alpha/2)=qnorm(1-alpha/2)
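The multiplier and the interval can be computed in a few lines. This is a minimal sketch in Python, where `statistics.NormalDist().inv_cdf` plays the role of `qnorm`; the function names `multiplier` and `mean_ci` and the ten data values are made up for illustration. The normal multiplier assumes a large sample, so the tiny sample here only keeps the example short.

```python
from statistics import NormalDist, fmean, stdev

def multiplier(alpha):
    """Normal multiplier for a 100(1-alpha)% interval; mirrors qnorm(1 - alpha/2)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(sample, alpha=0.05):
    """Return (lower, upper) for xBar ± multiplier * s / sqrt(n)."""
    n = len(sample)
    x_bar = fmean(sample)
    margin = multiplier(alpha) * stdev(sample) / n ** 0.5
    return x_bar - margin, x_bar + margin

# Multipliers for the usual confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(multiplier(1 - level), 3))

data = [12.1, 9.8, 11.4, 10.3, 10.9, 9.5, 11.7, 10.2, 10.8, 11.0]  # made-up sample
print(mean_ci(data))
```

The three multipliers come out to roughly 1.645, 1.96, and 2.576, so a higher confidence level gives a wider interval.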

Bootstrap Principle:
- Resampling from the original random sample is roughly equivalent to sampling from the population, with high probability

Useful for estimating many parameters, provided the original random sample is large enough

Bootstrap

  • Draw at random

  • with replacement

  • as many values as the original contained
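The three bullets above describe a single bootstrap resample. A minimal sketch, assuming the sample is stored in a plain list (the function name `bootstrap_resample` and the data are made up for illustration):

```python
import random

def bootstrap_resample(sample, rng):
    """Draw len(sample) values at random, with replacement, from the sample."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(0)
original = [3, 7, 8, 12, 15, 21]  # made-up sample
resample = bootstrap_resample(original, rng)
print(resample)
```

Because the draws are with replacement, some original values may appear more than once in the resample and others not at all.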

Bootstrap confidence interval

  • construct a 95% confidence interval by taking the 2.5th and 97.5th percentiles of the bootstrapped statistics

  • Basically just the middle 95%
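Putting the pieces together, a percentile bootstrap interval can be sketched as below. This is one possible implementation, not the only one: the nearest-rank percentile rule, the choice of the median as the statistic, the 5,000 repetitions, and the function names are all assumptions made for this example, and the data are simulated.

```python
import math
import random
import statistics

def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    k = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[k - 1]

def bootstrap_ci(sample, stat=statistics.median, reps=5_000, level=95, seed=0):
    """Percentile bootstrap interval: the middle `level`% of the resampled statistics."""
    rng = random.Random(seed)
    # Each repetition: resample with replacement, same size as the original, then compute the statistic.
    stats = sorted(stat(rng.choices(sample, k=len(sample))) for _ in range(reps))
    tail = (100 - level) / 2
    return percentile(stats, tail), percentile(stats, 100 - tail)

data_rng = random.Random(7)
sample = [data_rng.expovariate(1 / 30) for _ in range(100)]  # made-up skewed data

lower, upper = bootstrap_ci(sample)
print(f"sample median: {statistics.median(sample):.2f}")
print(f"95% bootstrap CI for the median: ({lower:.2f}, {upper:.2f})")
```

The same function works for other statistics by passing a different `stat`, for example `statistics.fmean` for the mean.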