Notes on Confidence Intervals, One- and Two-Sample T-Tests, and Practical Applications

Central ideas: z-values, means, and central tendency

z-value interpretation: the number of standard deviations a value lies from the true mean μ of a distribution, z = (x − μ)/σ, which requires the population standard deviation σ to be known. When σ is unknown, we rely on sample-based measures, the sample mean x̄ and the sample standard deviation s, to assess deviation from the central value μ.
Two main perspectives on central tendency in practice:
- Known μ: compare x̄ to a fixed population mean using a z or t framework depending on whether σ is known.
- Estimated μ: when μ is unknown and we estimate it with x̄, accounting for sampling variability and measurement error.
The same basic idea of distance from the central tendency underlies both z-values and t-values: how far an observed value or a sample statistic is from the central value, adjusted for the amount of data (sample size) and variability.
When we have limited measurements (e.g., 3, 5, or 6 measurements) instead of many (e.g., 10,000), we correct for the extra uncertainty in the estimate of the true mean by using the t distribution, whose critical values grow as the sample size shrinks.
In comparing two distributions, we often quantify differences relative to the observed variability (standard deviation) using a two-sample t-test.
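
The "distance from the central value" idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names and all numbers are hypothetical, chosen only to show that z divides by a known σ while t divides by the estimated standard error s/√n.

```python
import math
import statistics

def z_score(x, mu, sigma):
    """Number of population standard deviations between x and mu (sigma known)."""
    return (x - mu) / sigma

def one_sample_t(sample, mu0):
    """Distance of the sample mean from mu0, in units of the standard error."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample SD, uses n - 1 in the denominator
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical numbers, for illustration only.
print(z_score(106.0, mu=100.0, sigma=2.0))   # 3.0
print(one_sample_t([9.0, 10.0, 11.0, 10.0, 10.0], mu0=9.0))
```

The t statistic is then compared against a critical value for df = n − 1 rather than against the Gaussian cutoffs.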

One-tailed vs two-tailed tests: intuition and regulatory framing

One-tailed test: testing for an effect in a specific direction (e.g., mean > μ0 or mean < μ0).
Two-tailed test: testing for an effect in either direction (|mean − μ0| is large).
Practical example (regulatory framing):
- Contaminant threshold on a product or sample (e.g., drinking water, irrigation water) often depends on whether the measured level exceeds a regulatory limit. A two-tailed framing corresponds to detecting deviations in either direction, while a one-tailed framing can reflect a one-sided regulatory concern (e.g., only “too high” is problematic).
- Example in transcript: with a contaminant threshold of around 150 units listed on a box, values above the limit are disallowed, while values slightly below may be acceptable depending on the rule. FDA-like contexts use such a boundary to decide whether a product is compliant; the choice of one- or two-tailed thinking reflects the regulatory decision rule, the desired risk level, and the consequences of false positives vs false negatives.
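
To make the one- vs two-tailed distinction concrete, here is a small sketch using the Gaussian (known σ) case. Only the limit of 150 units comes from the notes; the measured mean of 153 and the standard error of 1.5 are assumptions for illustration, as is the function name.

```python
from statistics import NormalDist

def p_values(z):
    """Return (upper one-tailed, two-tailed) p-values for a z statistic."""
    std_normal = NormalDist()                          # mean 0, SD 1
    upper = 1.0 - std_normal.cdf(z)                    # only "too high" counts
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z)))   # deviation in either direction
    return upper, two_sided

# ASSUMED values: measured mean 153, standard error 1.5; limit 150 is from the notes.
z = (153.0 - 150.0) / 1.5
one_tailed, two_tailed = p_values(z)
print(f"z = {z:.2f}, one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

With these assumed numbers z = 2, and the one-tailed p-value (about 0.023) is half the two-tailed one (about 0.046), so the same data can clear a 2.5% one-sided threshold while only just clearing a 5% two-sided one.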

Variability, measurement error, and compatibility across samples

When two measurements come from the same distribution (e.g., air samples or lead measurements from the same process), values like 418.9 vs 415.5 can differ due to sampling and instrument variation.
Key question: are two observed values compatible with each other, given the expected sampling variability?
If a measured value is about three standard deviations away from the true value, that corresponds to a very small probability under a Gaussian distribution: roughly 0.27% in a two-tailed sense (about 1 in 370), or about 0.13% (about 1 in 740) for one tail.
This motivates using a t-test framework to determine whether observed deviations are likely due to random variability or indicate a real shift from the true value.
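
The Gaussian tail probabilities behind the "three standard deviations" rule of thumb can be checked directly. This sketch uses only the Python standard library; the function name is an illustrative choice.

```python
from statistics import NormalDist

def two_tailed_prob(k):
    """P(|Z| >= k) for a standard normal Z."""
    return 2.0 * (1.0 - NormalDist().cdf(k))

for k in (1, 2, 3):
    p = two_tailed_prob(k)
    print(f"{k} SD: two-tailed p = {p:.4f} (about 1 in {1 / p:.0f})")
```

These figures assume a known σ and many measurements; with only a handful of measurements the t distribution gives larger tail probabilities for the same distance.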

Degrees of freedom and critical values in a one-sample context

Degrees of freedom for a one-sample t-test with n measurements: df = n − 1.
Example: with n = 5 measurements, df = 4.
t critical values depend on the desired confidence level and df:
- For 95% confidence with df = 4, t_crit ≈ 2.776.
- For 99% confidence with df = 4, t_crit ≈ 4.6.
How to use: compare the absolute value of the t statistic to t_crit to decide whether to reject the null hypothesis H0: μ = μ0.
Intuition about CI width: higher confidence levels (e.g., 99%) yield wider confidence intervals. You gain confidence that the interval contains the true value, and make fewer false-positive rejections of H0, but at the cost of precision.
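
Critical values are normally read from a t table or obtained from a statistics library (for example `scipy.stats.t.ppf`). As a self-contained sketch of where those numbers come from, the code below integrates the Student's t density numerically and inverts it by bisection. The function names, the step count, and the search range are illustrative assumptions, not a recommended production method.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """P(-t <= T <= t), using Simpson's rule on [0, t] and symmetry."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2.0 * total * h / 3.0

def t_crit(confidence, df):
    """Two-tailed critical value: the t whose central area equals `confidence`."""
    lo, hi = 0.0, 100.0                  # assumed search range
    for _ in range(60):                  # bisection
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"95%, df = 4: {t_crit(0.95, 4):.3f}")
print(f"99%, df = 4: {t_crit(0.99, 4):.3f}")
```

The printed values should land close to the 2.776 and 4.6 quoted above; to test H0: μ = μ0, compare |t| from the data against the value for the chosen confidence level.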

Confidence intervals: interpretation, reporting formats, and practical use

Core interpretation (frequentist view):
- If you repeated the sampling 100 times and built a 95% confidence interval from each independent sample, about 95 of those intervals would contain the true value.
- The remaining ~5% would not contain the true value due to random sampling variability.
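
The repeated-sampling interpretation can be checked by simulation. In this sketch the true mean, σ, and number of trials are arbitrary assumptions; the critical value 2.776 is the 95% value for df = 4, so it is only valid for the n = 5 used here.

```python
import math
import random
import statistics

def coverage(true_mean=10.0, sigma=2.0, n=5, t_crit=2.776, trials=10_000, seed=0):
    """Fraction of 95% t-intervals (df = n - 1 = 4) that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / trials

print(f"Fraction of intervals containing the true mean: {coverage():.3f}")
```

The fraction should come out near 0.95. Any single interval either contains the true value or it does not; the 95% describes the procedure over many repetitions.
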
Practical interpretation: you must decide what level of confidence (risk of missing the true value) you can live with in decision making.
Reporting styles for a confidence interval:
- Mean ± margin: e.g., 10 ± 2 (with units), or equivalently the interval [8, 12].
- Explicit interval notation: [lower bound, upper bound] with units attached.
Connection to measurement interpretation: a CI gives the range of true values that are compatible with the data, so a hypothesized value that falls inside it is consistent with the measurements. It is not a proof but a statement about precision given the data and the chosen confidence level.
Example: estimating a molecular weight (gerpatinib):
- With df = 3 (n = 4 observations), at 95% confidence, t_crit ≈ 3.18.
- A calculated CI might be approximately [1376.2, 1582.6], so the true molecular weight is consistent with the observed peptide’s target range, though this does not prove exact identity or patentability.
- Higher confidence (e.g., 99%) would widen the interval (larger t_crit, e.g., ≈ 5.84 for df = 3).
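
A confidence interval of this kind is computed as mean ± t_crit · s/√n. The sketch below uses four hypothetical measurements, not the data behind the interval quoted above, together with the 95% critical value for df = 3 from a t table; the function name is an illustrative choice.

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """Return (mean, margin, lower, upper) for mean ± t_crit * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return xbar, margin, xbar - margin, xbar + margin

# HYPOTHETICAL measurements (n = 4, so df = 3); 3.182 is the 95% table value for df = 3.
measurements = [1400.0, 1500.0, 1450.0, 1550.0]
mean, margin, lower, upper = t_confidence_interval(measurements, t_crit=3.182)
print(f"{mean:.1f} ± {margin:.1f}  ->  [{lower:.1f}, {upper:.1f}]")
```

Both reporting styles, mean ± margin and [lower, upper], come from the same two numbers, so either can be recovered from the other.
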
Practical caveat in reporting CI for confirmation and patent work:
- A result can be “consistent with” a hypothesis or compound within a CI, but that alone is not proof; multiple alternative isomers or formulas can yield similar molecular weights.
- Regulatory or publication standards may demand tighter estimates than a CI alone can provide.

The two-sample t-test: comparing two groups (e.g., drug vs placebo)

Purpose: determine if the mean of sample A differs from the mean of sample B.
Key idea: when SDs are not assumed equal, compute the standard error of the difference in means using the two-group variances:
- Standard error of the difference: $SE = \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }$
- Test statistic (Welch-type t): $t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$
Degrees of freedom for the two-sample case are not simply n1+n2-2; a Satterthwaite/Welch approximation is used in general:
- $\nu \approx \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{ \dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1} }$
Example framework from transcript:
- Drug vs placebo in hemoglobin: mean_drug = 12.2, SD_drug = 3.2; mean_placebo = 10.2, with placebo SD larger (not specified).
- Sample sizes: n1 (drug) and n2 (placebo), with df at most n1 + n2 − 2 (the equal-variance value; the Welch df is somewhat lower when the SDs differ), e.g., around 23 if total N ≈ 25.
- Compute t = (12.2 − 10.2) / sqrt( (3.2^2 / n1) + (s2^2 / n2) ).
- With the given numbers, the result might yield a t-value that does not exceed the 95% critical value for df ≈ 23, so you would fail to reject H0 at 95% confidence.
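
The Welch statistic and its degrees of freedom can be computed from summary statistics alone. In the sketch below the drug mean and SD and the placebo mean come from the example above, but the placebo SD (4.0) and the sample sizes (13 and 12) are assumptions, since the notes do not give them. The critical value 2.080 is the two-tailed 95% table value for df = 21.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a two-sample Welch t-test from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2          # per-group variance of the mean
    se = math.sqrt(v1 + v2)                    # standard error of the difference
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# ASSUMED: placebo SD = 4.0, n1 = 13, n2 = 12 (not given in the notes).
t, df = welch_t(12.2, 3.2, 13, 10.2, 4.0, 12)
T_CRIT_95 = 2.080                              # table value for df = 21, two-tailed
print(f"t = {t:.2f}, df = {df:.1f}, reject H0: {abs(t) > T_CRIT_95}")
```

Under these assumed inputs t is about 1.37 with roughly 21 degrees of freedom, below the critical value, which matches the "fail to reject" outcome described above.
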
Decision interpretation:
- If |t| > t_crit(df, two-tailed) then reject H0: there is a significant difference between the two group means.
- If not, fail to reject H0: the data show no statistically significant difference at the chosen confidence level, which is not the same as proving the means are equal.
Practical notes:
- As the confidence level rises (e.g., 99%), the critical value increases and H0 becomes harder to reject (wider CI, larger required difference).
- In medical research, many small-sample studies yield suggestive but not definitive results, highlighting the need for larger trials for a conclusive answer.

Connecting theory to practice: decisions, confidence, and real-world implications

Confidence level choice is a management decision, balancing precision against risk tolerance and regulatory expectations.
Reporting choices matter: a researcher may present either a mean ± margin of error or a mean with explicit interval bounds; both convey the same information but differ in emphasis.
Preliminary studies and regulatory pathways:
- Small-sample studies may show suggestive effects but are not sufficient for market approval or patent claims.
- Larger trials reduce uncertainty and increase the reliability of conclusions about efficacy, safety, and quality.
Ethical and practical considerations:
- Overstating certainty from small samples can mislead decision-makers and patients.
- Transparent reporting of CI widths, degrees of freedom, and underlying assumptions is essential for reproducibility and regulatory scrutiny.

Summary tips for exam-ready understanding

Remember: z-values assume known σ; t-values replace σ with s when σ is unknown, and introduce df = n − 1 for a one-sample case.
For two-sample comparisons, use the SE of the difference with the appropriate pooled or separate variances, and compute df via the Welch approximation when variances are unequal.
Confidence intervals quantify precision, not proof: a CI can be consistent with a true value or compound, but its width, and therefore what it is consistent with, depends on the chosen confidence level.
The practical choice of one- vs two-tailed tests, as well as the confidence level (95%, 99%, 80%), should reflect the regulatory context, risk tolerance, and study design.
Always report units and be mindful of how CI width and p-values interact with sample size and variability in your conclusions.