Sampling error is the difference between an estimate and the true population statistic.
Each hypothetical sample has a different amount of error.
A sampling distribution is the distribution of estimates from many hypothetical samples.
With large N, sample means follow a normal curve around the true mean.
Standard error is the typical (roughly average) distance from a sample mean to the true mean.
It reflects expectations across hypothetical samples.
Standard error is the standard deviation of hypothetical sample means.
Apply normal curve rules of thumb.
95% of means from large samples are within ± 1.96 standard errors of the true mean.
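The ideas above can be checked with a small simulation. This is a minimal sketch, not part of the original material: the population values (`MU`, `SIGMA`), the sample size `N`, and the helper `simulate_sample_means` are all hypothetical choices made for illustration. It also uses the standard formula SE = σ / √N, which the slides do not state explicitly.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]

# ASSUMPTION: hypothetical population values, chosen only for illustration.
MU, SIGMA, N = 50.0, 10.0, 100

means = simulate_sample_means(MU, SIGMA, N, n_samples=5000)

# Standard error two ways: the theoretical sigma / sqrt(N), and the
# standard deviation of the simulated sample means.
se_theory = SIGMA / N ** 0.5
se_simulated = statistics.stdev(means)

# Share of sample means within 1.96 standard errors of the true mean.
within = sum(abs(m - MU) <= 1.96 * se_theory for m in means) / len(means)

print(f"theoretical SE: {se_theory:.3f}")
print(f"simulated SE:   {se_simulated:.3f}")
print(f"share within 1.96 SE: {within:.3f}")
```

The two standard errors should nearly agree, and the share of sample means within 1.96 standard errors should land close to 0.95.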
Confidence intervals use standard error to estimate plausible values for the true population statistic.
They define a range of μ values that could have reasonably produced the sample mean.
A confidence interval gives the two most extreme values of the population mean that could have reasonably produced the data.
The sample mean X̄ could be from a sampling distribution with a population mean as high as μ_H
The sample mean X̄ could be from a sampling distribution with a population mean as low as μ_L
A 95% critical value (CV_{95%}) is the number of standard error units from the true mean that captures 95% of sample means.
For large samples, CV_{95%} = ± 1.96
We may increase CV_{95%} for smaller samples (t distribution).
Margin of error is ± 1.96 × SE in large samples.
Adjust CV_{95%} for smaller samples (t distribution).
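As a rough sketch of where 1.96 comes from, the large-sample critical value can be read off the standard normal curve. The function names `critical_value` and `margin_of_error` are hypothetical helpers introduced here, and the code covers only the large-sample (normal) case, not the t-distribution adjustment for smaller samples.

```python
from statistics import NormalDist

def critical_value(confidence=0.95):
    """Large-sample critical value: the z score that leaves
    (1 - confidence) / 2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(se, confidence=0.95):
    """Margin of error = critical value x standard error (large samples)."""
    return critical_value(confidence) * se

cv95 = critical_value(0.95)
print(round(cv95, 2))  # 1.96
```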
A trial compared varenicline plus naltrexone against varenicline alone (plus placebo) for smoking cessation and drinking reduction among heavy-drinking smokers.
Breath carbon monoxide: Biomarker of smoking behavior.
Medication arm: Participants received varenicline plus naltrexone or varenicline plus placebo.
SE = 0.46 reflects expectation across hypothetical samples.
Margin of error = ± 1.96 × SE = ± 0.90
95% of hypothetical samples have a mean within ± 0.90 of μ
The CV_{95%} from the t distribution depends on N.
The 95% confidence interval [4.6, 6.4] defines a range of μ values that could have reasonably produced the sample mean.
The confidence interval gives the extreme values of the population mean that could have reasonably produced these data.
The sample mean X̄ could be from a sampling distribution with μ_H = 6.4
The sample mean X̄ could be from a sampling distribution with μ_L = 4.6
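The arithmetic behind this example can be retraced in a few lines. One value here is an assumption: the slides do not report the sample mean, so 5.5 is inferred as the midpoint of the reported interval [4.6, 6.4]. The sketch also uses the large-sample critical value 1.96; the original analysis may have used a t-based value, which would differ slightly depending on N.

```python
SE = 0.46          # standard error reported in the example
CV_95 = 1.96       # large-sample 95% critical value
SAMPLE_MEAN = 5.5  # ASSUMPTION: midpoint of the reported interval [4.6, 6.4]

margin = CV_95 * SE               # margin of error
mu_low = SAMPLE_MEAN - margin     # lowest plausible population mean
mu_high = SAMPLE_MEAN + margin    # highest plausible population mean

print(f"margin of error: ±{margin:.2f}")
print(f"95% CI: [{mu_low:.1f}, {mu_high:.1f}]")
```

Under these assumptions the margin of error rounds to 0.90 and the interval rounds to [4.6, 6.4], matching the reported values.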
The 95% confidence interval was [4.6, 6.4].
The true mean could plausibly be as low as 4.6 or as high as 6.4.
A population with μ = 5 could have reasonably produced the sample mean.
Confidence and probability unfold over many hypothetical random samples.
They are properties of data, not the population statistic.
Estimates vary across hypothetical random samples.
About 95 out of 100 sample means should fall within the margin of error of the true mean.
The true mean doesn’t change.
Estimates and Margin of Error
95% of confidence intervals will contain the true mean across many hypothetical samples.
95% probability refers to a long-run process over many random samples.
In that long-run sense, there is a 95% chance that a confidence interval computed from a random sample contains the true mean.
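The long-run reading of "95%" can be illustrated by simulation: draw many random samples from a population whose mean is known, build an interval from each, and count how often the interval contains that mean. This is a sketch under stated assumptions: the population values and the helper `ci_covers_mu` are hypothetical, and the intervals use the large-sample critical value 1.96.

```python
import random
import statistics

def ci_covers_mu(rng, mu, sigma, n, cv=1.96):
    """Draw one sample, build mean ± cv * SE, and report whether
    the interval contains the true mean mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5   # estimated standard error
    return mean - cv * se <= mu <= mean + cv * se

# ASSUMPTION: hypothetical population values, chosen only for illustration.
MU, SIGMA, N = 50.0, 10.0, 100

rng = random.Random(1)
n_reps = 2000
coverage = sum(ci_covers_mu(rng, MU, SIGMA, N) for _ in range(n_reps)) / n_reps

print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The true mean never moves in this simulation; only the intervals vary from sample to sample, and roughly 95% of them should capture it.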
1908: William Sealy Gosset - t distribution
Gosset derived the t-distribution for small samples.
The t-distribution stretches out as N decreases.
Software uses the t-distribution for confidence intervals.
The t-distribution supplies critical values that adjust for sample size.
Degrees of freedom adjustment: shape of t-distribution and CV_{95%} depend on N – 1
Note: The CI of the mean assumes that the standardized sample mean (its distance from μ in standard error units) follows a t-distribution with N - 1 degrees of freedom.
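To see how CV_{95%} grows as N shrinks, the critical value can be approximated numerically from the t density. This is an illustrative sketch using only the standard library; the helpers `t_pdf`, `t_cdf`, and `t_critical` are hypothetical names, and statistical software would normally use a dedicated routine (for example `scipy.stats.t.ppf`) rather than this hand-rolled integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, confidence=0.95):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Critical values for a few sample sizes (df = N - 1).
for n in (5, 10, 30, 100):
    print(f"N = {n:3d}, df = {n - 1:2d}, CV_95 ≈ {t_critical(n - 1):.3f}")
```

The printed values shrink toward 1.96 as N increases, which is why the large-sample rule of thumb works once samples are big enough.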
One Sample t-test
Practice interpreting standard errors and confidence intervals.