Confidence Intervals
Parameter - A numeric description of the population
Statistic - A numeric description of the sample
Sampling distribution - The probability distribution of a statistic over every possible sample of a particular size (shows how much the statistic can differ from one sample to another)
Central Limit Theorem: If the sample size is large, the sampling distribution of the sample mean will be approximately normal.
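A quick simulation can make the Central Limit Theorem concrete. The sketch below is only an illustration: the exponential population, the sample size of 100, and the helper name sample_means are all assumptions chosen for the example, not part of the notes.

```python
import random
import statistics

def sample_means(population, n, reps, seed=0):
    """Return `reps` sample means, each from a random sample of size n (with replacement)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical right-skewed population (exponential), chosen only for illustration.
_pop_rng = random.Random(1)
population = [_pop_rng.expovariate(1.0) for _ in range(10_000)]

n = 100
means = sample_means(population, n=n, reps=2_000)
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# CLT prediction: the sample means center on the population mean,
# with spread close to SD / sqrt(n).
print(round(statistics.fmean(means), 3), round(pop_mean, 3))
print(round(statistics.stdev(means), 3), round(pop_sd / n ** 0.5, 3))

# Roughly 95% of sample means should fall within 1.96 standard errors of the mean.
se = pop_sd / n ** 0.5
within = sum(abs(m - pop_mean) < 1.96 * se for m in means) / len(means)
print(within)
```

Even though the population is skewed, the sample means should cluster symmetrically around the population mean, which is what the derivation of the interval relies on.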
Confidence Interval
• Definition: A range of values that is very likely to contain the true (unknown) population parameter.
• Format: Estimate ± Margin of error
• Typically, the estimate is the statistic that corresponds to the parameter.
• Confidence level: The percentage of samples that will produce an interval that contains the true parameter.
• Typically we use 90%, 95%, or 99%.
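The meaning of "percentage of samples that produce an interval containing the true parameter" can be checked by simulation. This is a minimal sketch under stated assumptions: the normal population with mean 50 and SD 10, the sample size, and the function name coverage are hypothetical, and the interval used is xBar ± multiplier(s/sqrt(n)).

```python
import random
import statistics
from statistics import NormalDist

def coverage(true_mean, true_sd, n, reps, level=0.95, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% level
    hits = 0
    for _ in range(reps):
        # Draw one sample from the (assumed) normal population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        margin = z * statistics.stdev(sample) / n ** 0.5
        if x_bar - margin < true_mean < x_bar + margin:
            hits += 1
    return hits / reps

# Hypothetical population values, for illustration only.
rate = coverage(true_mean=50, true_sd=10, n=100, reps=2_000)
print(rate)
```

The printed fraction should land close to 0.95: any single interval either covers the true mean or it does not, and the confidence level describes how often the procedure succeeds across many samples.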
Making a confidence interval for the population mean: find a range of values that will likely cover the true mean.
qnorm(.025) = -1.96
2.5th percentile of xBar = mean - 1.96(SD/sqrt(size))
97.5th percentile of xBar = mean + 1.96(SD/sqrt(size))
p(mean - 1.96(SD/sqrt(size)) < xBar < mean + 1.96(SD/sqrt(size))) = .95
P(x̄ - 1.96(σ/sqrt(n)) < μ < x̄ + 1.96(σ/sqrt(n))) = .95
95% confidence interval for mean: xBar ± 1.96(SD/sqrt(n))
100(1-α)% CI for μ: xBar ± (multiplier)(s/sqrt(n)), where s is the sample SD
multiplier = -qnorm(α/2) = qnorm(1-α/2)
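The formula above can be sketched in Python using only the standard library. Here NormalDist().inv_cdf plays the role of qnorm; the function name mean_confidence_interval and the data values are hypothetical and exist only for illustration.

```python
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, level=0.95):
    """Return (lower, upper) for xBar ± multiplier * s / sqrt(n)."""
    alpha = 1 - level
    multiplier = NormalDist().inv_cdf(1 - alpha / 2)  # same role as qnorm(1 - alpha/2)
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD (divisor n - 1)
    margin = multiplier * s / n ** 0.5
    return x_bar - margin, x_bar + margin

# Hypothetical data, for illustration only.
data = [12.1, 9.8, 11.4, 10.3, 10.9, 9.5, 11.7, 10.2, 10.8, 11.1] * 4
low, high = mean_confidence_interval(data)
print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96
print(low, high)
```

Raising the level to 0.99 increases the multiplier, so the interval gets wider; lowering it to 0.90 makes the interval narrower.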
Bootstrap Principle:
- Re-sampling from the original random sample is roughly equivalent to sampling from the population, with high probability
Useful for estimating many parameters if the original random sample isn’t large enough
Bootstrap
Draw at random, with replacement, as many values as the original sample contained
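One bootstrap resample, following the three conditions above, might look like the sketch below. The helper name bootstrap_resample and the five-value sample are assumptions made up for the example.

```python
import random

def bootstrap_resample(sample, rng):
    """Draw len(sample) values at random, with replacement, from the sample."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(0)
original = [3, 7, 8, 12, 15]  # hypothetical original sample
resample = bootstrap_resample(original, rng)

# Same size as the original; because of replacement,
# some values may repeat and others may be left out.
print(resample)
```

Drawing with replacement is what lets a resample differ from the original sample while still having the same size.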
Bootstrap confidence interval
Construct a 95% confidence interval by taking the 2.5th and 97.5th (100 - 2.5) percentiles of the bootstrapped statistics.
Basically just the middle 95% of the bootstrap distribution.
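Putting the pieces together, a percentile bootstrap interval can be sketched as follows. This is one simple way to do it, not a definitive implementation: the function name bootstrap_ci, the choice of the median as the statistic, the 2,000 repetitions, and the generated data are all assumptions, and the index-based percentile used here is only one of several conventions that give slightly different endpoints.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.median, reps=2_000, level=0.95, seed=0):
    """Percentile bootstrap interval: the middle `level` share of resampled statistics."""
    rng = random.Random(seed)
    # Compute the statistic on each resample (same size, with replacement), then sort.
    stats = sorted(stat(rng.choices(sample, k=len(sample))) for _ in range(reps))
    tail = (1 - level) / 2  # 0.025 for a 95% interval
    lo_idx = max(0, round(tail * reps))
    hi_idx = min(reps - 1, round((1 - tail) * reps) - 1)
    return stats[lo_idx], stats[hi_idx]

# Hypothetical original sample, generated only for illustration.
_data_rng = random.Random(1)
sample = [_data_rng.gauss(50, 10) for _ in range(60)]

lower, upper = bootstrap_ci(sample)
print(lower, upper)
```

Because the statistic is passed in as a function, the same sketch works for a mean or any other numeric summary, which is the sense in which the bootstrap is useful for estimating many parameters.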