Basic Biostatistics - Chapter 11: Inference About a Mean

Inference About a Mean

Chapter 11 focuses on making inferences about a population mean.
Date: 5/3/2025

Chapter 11 Topics

11.1 Estimated Standard Error of the Mean
11.2 Student’s t Distribution
11.3 One-Sample t Test
11.4 Confidence Interval for µ
11.5 Paired Samples
11.6 Conditions for Inference

Standard Error of the Mean

When the population standard deviation (\sigma) is unknown, we estimate it using the sample standard deviation (s) to calculate the standard error.
This contrasts with previous chapters where (\sigma) was known, allowing the use of z-procedures.

Student's t Procedures

Using 's' instead of (\sigma) introduces additional uncertainty.
As a result, z procedures are not appropriate, and we use Student’s t procedures instead.
The t-distribution accounts for this extra uncertainty. The adjustment matters most with small samples, where s is a less precise estimate of (\sigma).
William Sealy Gosset (1876–1937) developed the t-distribution.

T-score vs. z-score

When to use a t-score:
- The sample size is small (below 30 is a common rule of thumb).
- The population standard deviation is unknown (estimated from your sample data).

Student’s t Distributions

Probability distributions are identified by degrees of freedom (df).
The t-distribution is similar to the standard normal distribution (Z), but with broader tails.
As df increases, the tails become skinnier, and the t-distribution approaches the z-distribution.
A t-distribution with infinite degrees of freedom is equivalent to a Standard Normal Z distribution.
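
The tail behavior described above can be checked numerically. The following is a minimal sketch, assuming the standard density formula for Student's t; the helper names `t_pdf` and `z_pdf` are illustrative and not part of the chapter. It evaluates the density three units from the center for several df values and compares it with the standard Normal density at the same point.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma is used instead of gamma so that large df values do not overflow
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def z_pdf(x):
    """Density of the standard Normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Density in the tail (x = 3) for increasing degrees of freedom
tail_densities = {df: t_pdf(3.0, df) for df in (2, 10, 100, 10000)}
for df, dens in tail_densities.items():
    print(f"df = {df:>5}: t density at 3 = {dens:.5f}")
print(f"standard Normal density at 3 = {z_pdf(3.0):.5f}")
```

The tail density shrinks as df grows and settles just above the Normal value, which is the "skinnier tails" behavior in numeric form.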

T-Test

't' measures how far the sample mean lies from the hypothesized mean, in units of the estimated standard error.
As with all test statistics, we compare 't' to its critical value.
The value of 't' is calculated from sample data.
The value of 't-critical' is determined by the significance level (alpha) selected and by the appropriate t-distribution, that is, its degrees of freedom.
The larger the absolute value of t, the more likely it is to exceed t-critical, and the stronger the evidence of a statistically significant difference in the means.

T-Test Equation

The t-test formula is given by:
t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{N}}}
- Where:
  - \bar{X} is the sample mean.
  - \mu is the population mean.
  - s is the sample standard deviation.
  - N is the sample size.
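
As a rough illustration of the formula, the sketch below wraps it in a small function. The function name `t_statistic` and the input values are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math

def t_statistic(sample_mean, mu, s, n):
    """t = (sample mean - mu) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Hypothetical summary values (not from the chapter)
t = t_statistic(sample_mean=52.0, mu=50.0, s=4.0, n=16)
print(t)  # SE = 4 / sqrt(16) = 1, so t = (52 - 50) / 1 = 2.0
```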

Types of T-Tests

Three different types of t-tests:
- 1-Sample t-test
  - Compares a sample mean to a population or specified mean (a target, an estimate, or a historical value).
- 2-Sample t-test
  - Compares the means of two independent samples from different populations or processes.
- Paired t-test
  - Analyzes paired data (e.g., before and after training scores) from the same test subjects.

Degrees of Freedom

For a single-sample t-test, df = n – 1, where n is the sample size.
In the 2-sample t-test, df = n1 + n2 − 2

T Table

Table C (t table):
- Rows represent degrees of freedom (df).
- Columns represent probabilities.
- Entries are t values.
- Notation: t_{cum\_prob, df} = t value
- Example: t_{.975, 9} = 2.262
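
A table entry can also be approximated without the table. The sketch below is one possible approach, not the method used to build Table C: it integrates the t density with Simpson's rule and inverts the cumulative probability by bisection. The names `t_pdf`, `t_cdf`, and `t_quantile` are illustrative, and the search assumes the answer lies between 0 and 200.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(cum_prob, df):
    """Invert t_cdf by bisection; assumes 0.5 <= cum_prob < 1."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < cum_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_crit = t_quantile(0.975, 9)
print(round(t_crit, 3))
```

The printed value should agree with the Table C example entry for cumulative probability .975 and 9 df.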

One-Sample t Test

Objective: to test a claim about a population mean µ.
Conditions:
- Simple Random Sample.
- Normal population or “large sample”.

Hypothesis Statements

Null hypothesis: H_0: \mu = \mu_0
- \mu_0 represents the population mean expected under the null hypothesis.
Alternative hypotheses:
- H_a: \mu < \mu_0 (one-sided, left).
- H_a: \mu > \mu_0 (one-sided, right).
- H_a: \mu ≠ \mu_0 (two-sided).

Example 1

Research Question: Do SIDS babies have lower average birth weights than a general population mean (\mu) of 3300 gms?
Hypotheses:
- H_0: \mu = 3300
- H_a: \mu < 3300 (one-sided) or H_a: \mu ≠ 3300 (two-sided).

One-Sample t Test Statistic

Test statistic:

t_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}

Where:
- \bar{x} = the sample mean
- \mu_0 = expected population mean under H0
- SE_{\bar{x}} = \frac{s}{\sqrt{n}}
- This t statistic has n - 1 degrees of freedom

Example Data

SRS n = 10 birth weights (grams) of SIDS cases:
2998, 3740, 2031, 2804, 2454, 2780, 2203, 3803, 3948, 2144

Example Calculation

Testing H_0: \mu = 3300

t_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{2890.5 - 3300}{227.7} = -1.80

This statistic has df = n-1=10-1=9
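
The calculation can be reproduced from the ten birth weights listed in the example data. This is a minimal sketch using only the Python standard library; the variable names are illustrative. Because it uses the unrounded sample standard deviation, the standard error it prints may differ in the last digit from a hand calculation based on a rounded s.

```python
import math
import statistics

# Birth weights (grams) of the n = 10 SIDS cases from the example data
weights = [2998, 3740, 2031, 2804, 2454, 2780, 2203, 3803, 3948, 2144]
mu_0 = 3300  # population mean under H0

n = len(weights)
x_bar = statistics.mean(weights)
s = statistics.stdev(weights)  # sample SD, divisor n - 1
se = s / math.sqrt(n)          # estimated standard error of the mean
t_stat = (x_bar - mu_0) / se
df = n - 1

print(f"mean = {x_bar}, s = {s:.1f}, SE = {se:.1f}, t = {t_stat:.2f}, df = {df}")
```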

P-value via Table C

Bracket |t_{stat}| between t critical values.
For |t_{stat}| = 1.80 with 9 df, the bracketing Table C entries are t_{.90, 9} = 1.383 and t_{.95, 9} = 1.833.
One-tailed: 0.05 < P < 0.10
Two-tailed: 0.10 < P < 0.20
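
Table C only brackets the P-value. For a closer figure, one option is to integrate the t density numerically, as in the sketch below. It uses Simpson's rule and the standard t density formula; the helper names `t_pdf` and `t_cdf` are illustrative, and statistical software would normally do this step.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

t_stat, df = -1.80, 9
p_one = t_cdf(t_stat, df)            # left tail, for Ha: mu < mu_0
p_two = 2 * t_cdf(-abs(t_stat), df)  # both tails, for Ha: mu != mu_0

print(f"one-tailed P = {p_one:.3f}, two-tailed P = {p_two:.3f}")
```

Both values should fall inside the brackets read from Table C.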

Interpretation

Testing H_0: \mu = 3300 gms
Two-tailed P > .10
Conclude: weak evidence against H_0
The sample mean (2890.5) is not significantly different from 3300 at \alpha = .05.

Confidence Level (Interval)

The confidence level describes how reliable the interval-building method is.
A confidence level of 95% (0.95) means that, over repeated random samples, about 95% of the intervals built this way would contain the true value of the parameter.

Confidence Interval

95% CI for µ = \bar{x} \pm t_{.975, 9} \cdot SE_{\bar{x}} = 2890.5 \pm (2.262)(227.68) = 2890.5 \pm 515.0 = (2375 to 3406) grams
Interpretation: We are 95% confident that the population mean µ is between 2375 and 3406 grams.
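
The interval arithmetic can be sketched in a few lines. The inputs are the summary values from the SIDS example (sample mean 2890.5, standard error 227.68, and the Table C value 2.262 for 9 df); the variable names are illustrative.

```python
x_bar = 2890.5  # sample mean (grams)
se = 227.68     # estimated standard error, s / sqrt(n)
t_crit = 2.262  # Table C value for cumulative probability .975, df = 9

margin = t_crit * se
ci = (x_bar - margin, x_bar + margin)

print(f"margin of error = {margin:.1f}")
print(f"95% CI = ({ci[0]:.0f}, {ci[1]:.0f}) grams")
```

The interval contains 3300, which is consistent with the two-sided test failing to reject H_0 at \alpha = .05.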

The Normality Condition

t procedures require a Normal population or a large sample.
How do we assess this condition?
Guidelines. Use t procedures when:
- the population is Normal
- the population is symmetrical and n ≥ 10
- the population is skewed and n ≥ ~45 (depends on severity of skew)

Sample Size and Power

Methods:

(1) n required to achieve margin of error m when estimating µ
(2) n required to test H0 with 1−β power
(3) Power of a given test of H0

Power

\alpha ≡ alpha level (two-sided)
\Delta ≡ “difference worth detecting” = \mu_a - \mu_0
n ≡ sample size
\sigma ≡ standard deviation
\Phi(z) ≡ cumulative probability of Standard Normal z score

Power: SIDS Example

Let \alpha = .05 and z_{1 - .05/2} = 1.96
Test: H_0: \mu = 3300 vs. H_a: \mu = 3000. Thus: \Delta ≡ \mu_a - \mu_0 = 3000 - 3300 = -300, so |\Delta| = 300
n = 10 and \sigma = 720 (see prior SIDS example)
z = -1.96 + (300)(\sqrt{10})/720 = -0.64
Use Table B to look up cum prob ⇒ \Phi(-0.64) = .2611, so the power is about 0.26
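
The power figure can be checked with a short sketch. It assumes the usual Normal-approximation formula, power ≈ \Phi(-z_{1-\alpha/2} + |\Delta|\sqrt{n}/\sigma), which is an assumption on my part but reproduces the \Phi(-0.64) lookup in this example. The function name `power_two_sided` is illustrative.

```python
import math
from statistics import NormalDist

def power_two_sided(delta, sigma, n, alpha=0.05):
    """Approximate power: Phi(-z_{1-alpha/2} + |delta| * sqrt(n) / sigma)."""
    z = NormalDist()  # standard Normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(-z_crit + abs(delta) * math.sqrt(n) / sigma)

# SIDS example: difference worth detecting 300 g, sigma = 720, n = 10
power = power_two_sided(delta=300, sigma=720, n=10, alpha=0.05)
print(f"power = {power:.2f}")
```

A power near 0.26 means a study of this size would detect a true 300 g difference only about one time in four.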

Example 2

Using a commercial kit and 5 g of fresh meat as starting material, we extract an average of 5 µg of DNA per sample. To increase the extraction efficiency, a scientist adds a grinding step before extracting DNA from 10 fresh meat samples.
Assuming that the variable is normally distributed and the sample is randomly selected, does grinding improve DNA yield?
Sample data set (DNA quantities in µg):
10 8 8 7 6 4 5 9 12 4

Hypotheses and Statistics

Hypotheses:
- H_0: \mu = \mu_0, where \mu_0 = 5 µg
- H_a: \mu > \mu_0
Statistics:
- \bar{x} = 7.3
- s = 2.627
- n = 10
- df = 9
- SE_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{2.627}{\sqrt{10}} = 0.83
- t_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{7.3 - 5}{0.83} = 2.77

Decision and Conclusion

Decision: Calculated t (2.77) is greater than the critical t (1.83) at \alpha = 0.05 (one-sided, df = 9), so H_0 is rejected.
Conclusion and interpretation: The grinding step significantly increases the quantity of DNA extracted.
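
The statistics for this example can be recomputed directly from the ten DNA quantities. The sketch below uses only the standard library; the variable names are illustrative, and the critical value 1.833 is the t table entry for cumulative probability .95 with 9 df.

```python
import math
import statistics

dna_ug = [10, 8, 8, 7, 6, 4, 5, 9, 12, 4]  # DNA quantities with grinding
mu_0 = 5        # average yield without grinding
t_crit = 1.833  # t table: cumulative probability .95, df = 9

n = len(dna_ug)
x_bar = statistics.mean(dna_ug)
s = statistics.stdev(dna_ug)   # sample SD, divisor n - 1
se = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / se
reject_h0 = t_stat > t_crit    # right-tailed test, Ha: mu > mu_0

print(f"mean = {x_bar}, s = {s:.3f}, SE = {se:.2f}, "
      f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```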

Paired Samples

Two samples
Each data point in one sample uniquely matched to a data point in the other sample
Examples of paired samples
- “Pre-test/post-test”
- Cross-over trials
- Pair-matching

Example: Oat Bran and Cholesterol

Does oat bran reduce LDL cholesterol?
Start half of subjects on CORNFLK diet.
Start other half on OATBRAN.
After two weeks: measure LDL cholesterol
Washout period
Cross-over to other diet.
After two weeks: measure LDL cholesterol

Oat bran data

LDL cholesterol (mmol/L)

Within-pair difference “DELTA”

Let DELTA = CORNFLK - OATBRAN
All procedures are now directed toward difference variable DELTA

Exploratory and descriptive stats

Stemplot
subscript d denotes “difference”

Confidence Interval

We are 95% confident that the population mean difference \mu_d is between 0.105 and 0.656 mmol/L.
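
These limits can be reproduced from the summary statistics reported for the differences in this example (n = 12, mean difference 0.3808, s_d = 0.4335). The sketch below takes the critical value 2.201 from a t table (cumulative probability .975, 11 df); the variable names are illustrative.

```python
import math

n = 12
d_bar = 0.3808  # mean within-pair difference (mmol/L)
s_d = 0.4335    # standard deviation of the differences
t_crit = 2.201  # t table: cumulative probability .975, df = 11

se = s_d / math.sqrt(n)
margin = t_crit * se
ci = (d_bar - margin, d_bar + margin)

print(f"95% CI for mu_d = ({ci[0]:.3f}, {ci[1]:.3f}) mmol/L")
```

The interval excludes 0, which points in the same direction as a two-sided test at \alpha = .05.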

Hypothesis Test

Claim: oat bran diet is associated with a decline (one-sided) or change (two-sided) in LDL cholesterol.
Test H_0: \mu_d = \mu_0 where \mu_0 = 0
- H_a: \mu_d > \mu_0 (one-sided)
- H_a: \mu_d ≠ \mu_0 (two-sided)

Paired t statistic

Current data: n = 12
\bar{x}_d = 0.3808
s_d = 0.4335
Test H_0: \mu_d = 0

t_{stat} = \frac{\bar{x}_d - 0}{\frac{s_d}{\sqrt{n}}} = \frac{0.38083 - 0}{\frac{0.4335}{\sqrt{12}}} = 3.043

df = n-1=12-1=11
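
A paired t test is a one-sample t test applied to the differences, so the statistic can be checked from the summary values alone. The sketch below does that; the function name `paired_t_from_summary` is illustrative. With the raw data in hand, the differences would first be formed pair by pair (CORNFLK minus OATBRAN for each subject) before computing their mean and standard deviation.

```python
import math

def paired_t_from_summary(d_bar, s_d, n, mu_0=0.0):
    """One-sample t statistic on within-pair differences, with its df."""
    se = s_d / math.sqrt(n)  # standard error of the mean difference
    return (d_bar - mu_0) / se, n - 1

# Summary statistics for DELTA = CORNFLK - OATBRAN
t_stat, df = paired_t_from_summary(d_bar=0.38083, s_d=0.4335, n=12)
print(f"t = {t_stat:.3f}, df = {df}")
```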

P-value via Table C

One-tailed: .005 < P < .01
Two-tailed: .01 < P < .02

Interpretation

Testing H_0: \mu_d = 0
Two-tailed P = 0.011
Good reason to doubt H_0
(Optional) The difference is “significant” at \alpha = .05 but not at \alpha = .01