Statistical inference has two main components:
Hypothesis Testing: Formulating null and alternative hypotheses based on the research question, leading to decisions to either reject or not reject the null hypothesis.
Estimation: Providing an estimate of the population mean and a likely range, often communicated through confidence intervals.
Confidence intervals specify the boundaries within which we believe a population value, such as the population mean, falls.
They complement p-values, offering a range of values instead of merely a binary decision (reject/don’t reject).
Derived from the same data, confidence intervals can provide richer information about population means.
A standard formula for a 95% confidence interval (CI), based on the normal approximation, is:
CI = Mean ± (1.96 × Standard Error)
The standard error (SE) is calculated as:
SE = Standard Deviation / √Sample Size
Interval Interpretation: A 95% CI is constructed so that, across repeated samples, about 95% of such intervals contain the true population mean.
Suburb 1:
Sample Mean = 230 parts per million (ppm)
Standard Deviation = 35 ppm
Sample Size = 15
Calculate Standard Error:
SE = 35 / √15
Determine lower limit:
Lower limit = 230 - (1.96 × SE)
Determine upper limit:
Upper limit = 230 + (1.96 × SE)
Result:
95% CI = [212, 248] ppm
Interpretation: With 95% confidence, the mean particulate level lies between 212 and 248 ppm.
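The Suburb 1 calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the figures given above; the function name confidence_interval is an illustrative choice, not part of the module.

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) for mean ± z × SE, where SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = z * se          # half-width of the interval
    return mean - margin, mean + margin

# Suburb 1 figures from the notes: mean 230 ppm, SD 35 ppm, n = 15
lower, upper = confidence_interval(mean=230, sd=35, n=15)

print(f"SE = {35 / math.sqrt(15):.2f} ppm")         # SE = 9.04 ppm
print(f"95% CI = [{lower:.0f}, {upper:.0f}] ppm")   # 95% CI = [212, 248] ppm
```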
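Misinterpretation to avoid: "There is a 95% probability that the population mean is within the confidence interval." The population mean is a fixed value, not a random one; it is the interval that varies from sample to sample.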
Correct interpretations:
"There is a 95% probability that the confidence interval contains the true population mean."
"With 95% confidence, the population mean is between the lower and upper limits of the confidence interval."
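One way to see what the 95% refers to is to simulate repeated sampling and count how often the interval captures a known mean. The sketch below is illustrative only: the population values (mean 230, SD 35), the sample size of 100, the number of repeats, and the seed are all assumptions chosen for the demonstration, not data from the module.

```python
import math
import random
import statistics

def coverage(true_mean=230.0, true_sd=35.0, n=100, repeats=2000, z=1.96, seed=0):
    """Fraction of simulated 95% CIs that contain the (known) true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        # Draw one sample from an assumed normal population
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # The true mean is fixed; the interval changes with each sample
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / repeats

rate = coverage()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should come out close to 0.95: the true mean never moves, while roughly 95 in 100 of the intervals built this way happen to contain it.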
The confidence level can be adjusted:
A 99% CI is calculated with:
CI = Mean ± (2.58 × SE)
A 99% confidence interval will be wider than a 95% CI, because a higher level of confidence requires a wider range of values.
E.g., the 99% CI for Suburb 1 is [207, 253] ppm.
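The effect of the confidence level on the width can be checked directly. This short sketch reuses the Suburb 1 figures with the two multipliers given in the notes (1.96 and 2.58); the variable names are illustrative.

```python
import math

mean, sd, n = 230, 35, 15
se = sd / math.sqrt(n)  # same standard error for both intervals

intervals = {}
for level, z in (("95%", 1.96), ("99%", 2.58)):
    intervals[level] = (mean - z * se, mean + z * se)
    lower, upper = intervals[level]
    print(f"{level} CI = [{lower:.0f}, {upper:.0f}] ppm, width = {upper - lower:.1f} ppm")
```

Only the multiplier changes between the two intervals, so the 99% interval is wider by the ratio 2.58 / 1.96.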
Constructing confidence intervals for mean differences:
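Using example data from Suburb 2 and calculating a 95% CI for the mean difference indicates whether pollution levels differ significantly.
If the 95% CI for the mean difference excludes zero, it suggests the means differ. Likewise, if the two suburbs' individual intervals do not overlap, the means differ, although overlapping intervals do not by themselves rule out a difference.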
Each CI derived provides specific bounds for true population mean differences.
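A CI for a mean difference can be sketched as follows. The Suburb 2 figures are not given in these notes, so the values used here (mean 180 ppm, SD 30 ppm, n = 15) are purely hypothetical placeholders. The sketch also assumes two independent samples, the usual standard error for a difference of independent means, and the same 1.96 multiplier used above.

```python
import math

def diff_confidence_interval(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """CI for (mean1 - mean2), assuming two independent samples."""
    diff = mean1 - mean2
    # Standard error of a difference between two independent means
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff, diff - z * se_diff, diff + z * se_diff

# Suburb 1: figures from the notes
# Suburb 2: HYPOTHETICAL values, for illustration only
diff, lower, upper = diff_confidence_interval(230, 35, 15, 180, 30, 15)

print(f"Mean difference = {diff:.0f} ppm")
print(f"95% CI for the difference = [{lower:.1f}, {upper:.1f}] ppm")
print("Excludes zero" if lower > 0 or upper < 0 else "Includes zero")
```

With these placeholder numbers the interval lies entirely above zero, which would point to a higher mean in Suburb 1; different Suburb 2 data could of course give a different answer.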
Both p-values and confidence intervals are derived from the same data and should provide consistent conclusions.
A p-value gives the probability of observing data at least as extreme as the sample if the null hypothesis were true, while a CI gives a range of plausible values for the parameter being estimated.
Limitations of hypothesis testing: It focuses primarily on the null hypothesis and offers little insight into the alternative hypothesis.
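The agreement between the two approaches can be illustrated with the Suburb 1 data. In this sketch the null values (250 and 240 ppm) are hypothetical choices made for the demonstration, and the p-value comes from a two-sided z-test, matching the normal-based multiplier 1.96 used for the interval.

```python
import math

def two_sided_p(mean, null_mean, se):
    """Two-sided p-value from a z-test using the standard normal distribution."""
    z = (mean - null_mean) / se
    # erfc(|z| / sqrt(2)) equals 2 × (1 − Φ(|z|))
    return math.erfc(abs(z) / math.sqrt(2))

mean, sd, n = 230, 35, 15
se = sd / math.sqrt(n)
lower, upper = mean - 1.96 * se, mean + 1.96 * se

results = {}
for null_mean in (250, 240):  # hypothetical null values
    p = two_sided_p(mean, null_mean, se)
    inside = lower <= null_mean <= upper
    results[null_mean] = (p, inside)
    print(f"H0: mean = {null_mean} ppm -> p = {p:.3f}, inside 95% CI: {inside}")
```

A null value lying outside the 95% CI goes with p below 0.05, and a null value inside the interval goes with p above 0.05, so both tools lead to the same decision.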
Confidence intervals offer a valuable means of estimating population parameters and, when interpreted correctly, furnish essential insights for decision-making.
It is important not to confuse these interpretations, and to be aware of how the confidence level affects the width of the interval.
Further reading on Bayesian statistics may provide additional perspectives, though that topic is not covered in this module.