Basic Biostatistics - Chapter 11: Inference About a Mean

Inference About a Mean

  • Chapter 11 focuses on making inferences about a population mean.
  • Date: 5/3/2025

Chapter 11 Topics

  • 11.1 Estimated Standard Error of the Mean
  • 11.2 Student’s t Distribution
  • 11.3 One-Sample t Test
  • 11.4 Confidence Interval for µ
  • 11.5 Paired Samples
  • 11.6 Conditions for Inference

Standard Error of the Mean

  • When the population standard deviation (\sigma) is unknown, we estimate it with the sample standard deviation (s) and calculate the estimated standard error of the mean, SE_{\bar{x}} = \frac{s}{\sqrt{n}}.
  • This contrasts with previous chapters, where \sigma was known, allowing the use of z procedures.

Student's t Procedures

  • Using 's' instead of (\sigma) introduces additional uncertainty.
  • As a result, z procedures are not appropriate, and we use Student’s t procedures instead.
  • The t distribution accounts for this extra uncertainty. Its difference from the Normal distribution matters most when sample sizes are small.
  • William Sealy Gosset (1876–1937) developed the t distribution, publishing under the pseudonym “Student”.

T-score vs. z-score

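  • When to use a t-score:
    • The population standard deviation is unknown (estimated from your sample data).
    • This matters most when the sample size is small (a common rule of thumb is n < 30). With large samples, t and z values are nearly identical, so t procedures remain appropriate.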

Student’s t Distributions

  • Student’s t distributions form a family. Each member is identified by its degrees of freedom (df).
  • The t distribution is similar to the standard Normal (Z) distribution, but with heavier tails.
  • As df increases, the tails become thinner and the t distribution approaches the Z distribution.
  • A t-distribution with infinite degrees of freedom is equivalent to a Standard Normal Z distribution.

T-Test

  • 't' measures how far the sample mean lies from the hypothesized mean, in units of the estimated standard error.
  • As with all test statistics, we compare 't' to its critical value.
  • The value of 't' is calculated from sample data.
  • The value of 't-critical' is determined by the chosen significance level (\alpha), whether the test is one- or two-sided, and the degrees of freedom of the appropriate t distribution.
  • The larger |t| is, the more likely it is to exceed t-critical. When it does, the difference between the means is statistically significant at level \alpha.

T-Test Equation

  • The t-test formula is given by:
    t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{N}}}

    • Where:
      • \bar{X} is the sample mean.
      • \mu is the population mean specified by the null hypothesis (\mu_0).
      • s is the sample standard deviation.
      • N is the sample size.
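The formula can be checked numerically. A minimal Python sketch follows; the function name `t_statistic` is hypothetical, and the inputs are the SIDS summary values used later in this chapter (\bar{x} = 2890.5, s ≈ 720, N = 10, \mu_0 = 3300).

```python
import math


def t_statistic(xbar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t = (xbar - mu0) / (s / sqrt(n))."""
    se = s / math.sqrt(n)  # estimated standard error of the mean
    return (xbar - mu0) / se


# SIDS summary values: sample mean, hypothesized mean, sample SD, sample size
t = t_statistic(2890.5, 3300, 720, 10)
print(round(t, 2))
```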

Types of T-Tests

  • Three different types of t-tests:
    • 1-Sample t-test
      • Compares a sample mean to a population or specified mean (a target, an estimate, or a historical value).
    • 2-Sample t-test
      • Compares the means of two independent samples from different populations or processes.
    • Paired t-test
      • Analyzes paired data (e.g., before and after training scores) from the same test subjects.

Degrees of Freedom

  • For a single-sample t-test, df = n – 1, where n is the sample size.
  • For the pooled (equal-variance) 2-sample t-test, df = n_1 + n_2 − 2.

T Table

  • Table C (t table):
    • Rows represent degrees of freedom (df).
    • Columns represent cumulative probabilities.
    • Entries are t values.
    • Notation: t_{cum\_prob, df} = t value
    • Example: t_{.975, 9} = 2.262
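Table C entries can also be reproduced in code. The sketch below assumes Python with only the standard library; `t_cdf` and `t_quantile` are hypothetical helper names. `t_cdf` uses the classical closed-form series for the t distribution with an integer df, and `t_quantile` inverts it by bisection. It should reproduce t_{.975, 9} ≈ 2.262 and show the .975 critical values shrinking toward z = 1.96 as df grows. In practice a library routine such as `scipy.stats.t.ppf` would normally be used instead.

```python
import math
from statistics import NormalDist


def t_cdf(t: float, df: int) -> float:
    """P(T <= t) for Student's t with a positive integer df (closed-form finite series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:  # odd df: A = (2/pi) * (theta + sin(theta) * series of odd cosine powers)
        series, term = 0.0, c
        for j in range((df - 1) // 2):
            series += term
            term *= c * c * (2 * j + 2) / (2 * j + 3)
        a = 2 / math.pi * (theta + s * series)
    else:  # even df: A = sin(theta) * series of even cosine powers
        series, term = 0.0, 1.0
        for j in range(df // 2):
            series += term
            term *= c * c * (2 * j + 1) / (2 * j + 2)
        a = s * series
    # a = P(|T| < |t|); use symmetry to get the lower-tail (cumulative) probability
    return 0.5 + a / 2 if t >= 0 else 0.5 - a / 2


def t_quantile(cum_prob: float, df: int) -> float:
    """Table C lookup t_{cum_prob, df}: solve t_cdf(t, df) = cum_prob by bisection."""
    lo, hi = -1000.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < cum_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"t_(.975, 9) = {t_quantile(0.975, 9):.3f}")
for df in (1, 9, 30, 100):
    print(f"df = {df:>3}: t_(.975) = {t_quantile(0.975, df):.3f}")
print(f"z_(.975)     = {NormalDist().inv_cdf(0.975):.3f}")
```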

One-Sample t Test

  • Objective: to test a claim about a population mean µ.
  • Conditions:
    • Simple Random Sample.
    • Normal population or “large sample”.

Hypothesis Statements

  • Null hypothesis: H_0: \mu = \mu_0
    • \mu_0 represents the population mean expected under the null hypothesis.
  • Alternative hypotheses:
    • H_a: \mu < \mu_0 (one-sided, left).
    • H_a: \mu > \mu_0 (one-sided, right).
    • H_a: \mu ≠ \mu_0 (two-sided).
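The decision rule for each alternative can be written as a small function. A sketch, assuming the positive critical value has already been read from Table C (t_{1-\alpha, df} for one-sided tests, t_{1-\alpha/2, df} for two-sided tests); the name `reject_h0` and the labels "less", "greater", and "two-sided" are hypothetical.

```python
def reject_h0(t_stat: float, t_crit: float, alternative: str) -> bool:
    """Return True if H0 is rejected; t_crit is the positive Table C value."""
    if alternative == "less":  # Ha: mu < mu0 (one-sided, left)
        return t_stat < -t_crit
    if alternative == "greater":  # Ha: mu > mu0 (one-sided, right)
        return t_stat > t_crit
    if alternative == "two-sided":  # Ha: mu != mu0
        return abs(t_stat) > t_crit
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# SIDS example, t = -1.80 with df = 9, alpha = .05
print(reject_h0(-1.80, 2.262, "two-sided"))  # uses t_{.975, 9}
print(reject_h0(-1.80, 1.833, "less"))       # uses t_{.95, 9}
```

Both calls return False, matching the later conclusion that the SIDS sample mean is not significantly different from 3300 at \alpha = .05.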

Example 1

  • Research Question: Do SIDS babies have lower average birth weights than a general population mean (\mu) of 3300 gms?
  • Hypotheses:
    • H_0: \mu = 3300
    • H_a: \mu < 3300 (one-sided) or H_a: \mu ≠ 3300 (two-sided).

One-Sample t Test Statistic

  • Test statistic:

t_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}

  • Where:

    • \bar{x} = the sample mean
    • \mu_0 = expected population mean under H0
    • SE_{\bar{x}} = \frac{s}{\sqrt{n}}
    • This t statistic has n - 1 degrees of freedom

Example Data

  • SRS n = 10 birth weights (grams) of SIDS cases:

    2998, 3740, 2031, 2804, 2454, 2780, 2203, 3803, 3948, 2144

Example Calculation

  • Testing H_0: \mu = 3300

t_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{2890.5 - 3300}{227.7} = -1.80

  • This statistic has df = n-1=10-1=9
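The same calculation from the raw data, as a minimal Python sketch using the standard `statistics` module (`stdev` uses the n − 1 denominator). It gives s ≈ 719.8 and SE ≈ 227.6; the slide's 227.7 corresponds to s rounded to 720, and both give t ≈ −1.80.

```python
import math
import statistics

# SRS of n = 10 SIDS birth weights (grams), from the example data
weights = [2998, 3740, 2031, 2804, 2454, 2780, 2203, 3803, 3948, 2144]
mu0 = 3300  # H0: mu = 3300

n = len(weights)
xbar = statistics.mean(weights)  # sample mean
s = statistics.stdev(weights)    # sample SD (n - 1 denominator)
se = s / math.sqrt(n)            # estimated standard error of the mean
t_stat = (xbar - mu0) / se
df = n - 1

print(f"mean = {xbar}, s = {s:.1f}, SE = {se:.1f}, t = {t_stat:.2f}, df = {df}")
```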

P-value via Table C

  • Bracket |t_{stat}| between t critical values in the df = 9 row.
  • |t_{stat}| = 1.80 lies between t_{.90,9} = 1.383 and t_{.95,9} = 1.833.
  • One-tailed: 0.05 < P < 0.10
  • Two-tailed: 0.10 < P < 0.20
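Bracketing can be mimicked in code. A sketch, assuming the upper-tail critical values for the df = 9 row of a standard t table (t_{.90,9} = 1.383, t_{.95,9} = 1.833, t_{.975,9} = 2.262, t_{.99,9} = 2.821, t_{.995,9} = 3.250); `bracket_one_tailed_p` is a hypothetical helper name. The two-tailed bounds are double the one-tailed bounds.

```python
# Row df = 9 of a t table: cumulative probability -> t critical value
row_df9 = {0.90: 1.383, 0.95: 1.833, 0.975: 2.262, 0.99: 2.821, 0.995: 3.250}


def bracket_one_tailed_p(t_abs: float, row: dict) -> tuple:
    """Return (lower, upper) bounds on the one-tailed P-value for |t| = t_abs."""
    lower, upper = 0.0, 1.0
    for cum_prob, t_crit in sorted(row.items()):
        tail = round(1 - cum_prob, 4)  # upper-tail area beyond t_crit
        if t_abs >= t_crit:
            upper = tail  # |t| is at or beyond this value, so P <= tail
        else:
            lower = tail  # |t| falls short of this value, so P > tail
            break
    return lower, upper


lo, hi = bracket_one_tailed_p(1.80, row_df9)
print(f"One-tailed: {lo} < P < {hi}")
print(f"Two-tailed: {2 * lo} < P < {2 * hi}")
```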

Interpretation

  • Testing H_0: \mu = 3300 gms
  • Two-tailed P > .10
  • Conclude: weak evidence against H_0
  • The sample mean (2890.5) is NOT significantly different from 3300.

Confidence Level (Interval)

  • The confidence level describes the reliability of the interval-estimation procedure.
  • A 95% confidence level means that, in repeated sampling, about 95% of intervals constructed this way would contain the true population mean µ.
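The repeated-sampling meaning of “95% confidence” can be checked by simulation. A sketch, assuming a hypothetical Normal population with µ = 3300 and σ = 720 (chosen only for illustration), samples of n = 10, and the Table C value t_{.975,9} = 2.262. About 95% of the simulated intervals should contain µ.

```python
import math
import random
import statistics

random.seed(1)                # fixed seed so the run is reproducible
MU, SIGMA, N = 3300, 720, 10  # hypothetical Normal population (illustrative values)
T_CRIT = 2.262                # t_{.975, 9} from Table C
REPS = 4000

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    if xbar - T_CRIT * se <= MU <= xbar + T_CRIT * se:
        hits += 1  # this interval captured the true mean

coverage = hits / REPS
print(f"Proportion of 95% intervals containing mu: {coverage:.3f}")
```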

Confidence Interval

  • 95% CI for µ = \bar{x} \pm t_{.975,9} \cdot SE_{\bar{x}} = 2890.5 \pm (2.262)(227.68) = 2890.5 \pm 515.1 = (2375 to 3406) grams
  • Interpretation: We are 95% confident that the population mean µ lies between 2375 and 3406 grams.
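A minimal Python sketch of the interval calculation from the raw SIDS data, assuming t_{.975,9} = 2.262 from Table C. Because it uses the unrounded s ≈ 719.8, the limits come out near 2376 and 3405 grams, within a gram of the slide's values.

```python
import math
import statistics

weights = [2998, 3740, 2031, 2804, 2454, 2780, 2203, 3803, 3948, 2144]
t_crit = 2.262  # t_{.975, 9} from Table C

xbar = statistics.mean(weights)
se = statistics.stdev(weights) / math.sqrt(len(weights))  # estimated standard error
margin = t_crit * se                                       # margin of error
lower, upper = xbar - margin, xbar + margin
print(f"95% CI: {lower:.0f} to {upper:.0f} grams")
```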

The Normality Condition

  • t Procedures require Normal population or large samples
  • How do we assess this condition?
  • Guidelines. Use t procedures when:
    • The population is Normal
    • The population is symmetrical and n ≥ 10
    • The population is skewed and n ≥ ~45 (depends on the severity of the skew)

Sample Size and Power

Methods:

  • (1) n required to achieve a margin of error m when estimating µ
  • (2) n required to test H0 with 1−β power
  • (3) Power of a given test of H0
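For method (1), a commonly used formula is n = (z_{1-\alpha/2} \sigma / m)^2, rounded up, which treats σ as known. The sketch below assumes that formula matches the chapter's method; `n_for_margin` is a hypothetical name, and the margin m = 250 grams is an illustrative choice paired with the SIDS value σ = 720.

```python
import math
from statistics import NormalDist


def n_for_margin(sigma: float, m: float, conf: float = 0.95) -> int:
    """Sample size so a conf-level CI for mu has margin of error about m (sigma treated as known)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1 - alpha/2}
    return math.ceil((z * sigma / m) ** 2)        # always round up


print(n_for_margin(sigma=720, m=250))
```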

Power

  • \alpha ≡ significance level (two-sided)
  • \Delta ≡ “difference worth detecting” = \mu_a - \mu_0
  • n ≡ sample size
  • \sigma ≡ standard deviation
  • \Phi(z) ≡ cumulative probability of a Standard Normal z score
  • Power: 1 - \beta = \Phi\left(-z_{1-\alpha/2} + \frac{|\Delta|\sqrt{n}}{\sigma}\right)

Power: SIDS Example

  • Let \alpha = .05 and z_{1 - .05/2} = 1.96
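  • Test: H_0: \mu = 3300 vs. H_a: \mu = 3000. Thus: \Delta ≡ \mu_a - \mu_0 = 3000 - 3300 = -300, so |\Delta| = 300
  • n = 10 and \sigma = 720 (see prior SIDS example)
  • -z_{1-\alpha/2} + \frac{|\Delta|\sqrt{n}}{\sigma} = -1.96 + \frac{300\sqrt{10}}{720} = -0.64
  • Use Table B to look up the cumulative probability ⇒ \Phi(-0.64) = .2611, so the power is about 26%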
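The power calculation, plus the matching sample-size formula n = \sigma^2 (z_{1-\beta} + z_{1-\alpha/2})^2 / \Delta^2 for method (2), can be sketched with Python's `statistics.NormalDist`. These are the usual Normal-approximation formulas, assumed here to match the chapter's methods; `power` and `n_for_power` are hypothetical names. With unrounded z values the power comes out near .260 rather than the table's .2611 (which uses −0.64), and about 46 subjects would be needed for 80% power.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard Normal; Z.cdf plays the role of Phi


def power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test: Phi(-z_{1-alpha/2} + |delta| sqrt(n) / sigma)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(-z_crit + abs(delta) * math.sqrt(n) / sigma)


def n_for_power(delta: float, sigma: float, target_power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate n for a two-sided test: sigma^2 (z_{1-beta} + z_{1-alpha/2})^2 / delta^2, rounded up."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(target_power)
    return math.ceil((sigma * (z_beta + z_crit) / delta) ** 2)


# SIDS example: mu0 = 3300, mu_a = 3000, so delta = -300; sigma = 720
print(f"power with n = 10: {power(-300, 720, 10):.3f}")
print(f"n for 80% power:   {n_for_power(-300, 720)}")
```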

Example 2

  • Using a suitable commercial kit and 5 g of fresh meat as initial mass, we extract an average of 5 µg of DNA per sample. To increase extraction efficiency, a scientist adds a grinding step before extracting DNA from 10 fresh meat samples.
  • Assuming the variable is Normally distributed and the sample is randomly selected, does grinding improve DNA yield?
  • Sample data set (DNA quantities in µg):
    10 8 8 7 6 4 5 9 12 4

Hypotheses and Statistics

  • Hypotheses:
    • H_0: \mu = \mu_0, where \mu_0 = 5 µg
    • H_a: \mu > \mu_0
  • Statistics:
    • \bar{X} = 7.3
    • s = 2.627
    • n = 10
    • df = 9
    • SE_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{2.627}{\sqrt{10}} = 0.83
    • t_{stat} = \frac{\bar{X} - \mu_0}{SE_{\bar{x}}} = \frac{7.3 - 5}{0.83} = 2.77

Decision and Conclusion

  • Decision: The calculated t (2.77) is greater than the critical t (t_{.95,9} = 1.833) at α = 0.05, so H_0 is rejected.
  • Conclusion and interpretation: The grinding step significantly increases the mean quantity of extracted DNA above 5 µg.
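A minimal Python sketch that recomputes the DNA example from the raw data, assuming \mu_0 = 5 µg and the one-sided critical value t_{.95,9} = 1.833. It gives \bar{x} = 7.3, s ≈ 2.627, SE ≈ 0.831, and t ≈ 2.77.

```python
import math
import statistics

dna = [10, 8, 8, 7, 6, 4, 5, 9, 12, 4]  # µg of DNA after grinding (example data)
mu0 = 5                                 # usual yield without grinding
t_crit = 1.833                          # t_{.95, 9}: one-sided test, alpha = 0.05

n = len(dna)
xbar = statistics.mean(dna)
s = statistics.stdev(dna)
se = s / math.sqrt(n)
t_stat = (xbar - mu0) / se

print(f"mean = {xbar:.2f}, s = {s:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
print("Reject H0" if t_stat > t_crit else "Fail to reject H0")
```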

Paired Samples

  • Two samples
  • Each data point in one sample uniquely matched to a data point in the other sample
  • Examples of paired samples
    • “Pre-test/post-test”
    • Cross-over trials
    • Pair-matching

Example: Oat Bran and Cholesterol

  • Does oat bran reduce LDL cholesterol?
  • Start half of the subjects on the CORNFLK diet.
  • Start the other half on OATBRAN.
  • After two weeks ⇒ measure LDL cholesterol.
  • Washout period.
  • Cross over to the other diet.
  • After two weeks ⇒ measure LDL cholesterol.

Oat bran data

  • LDL cholesterol (mmol/L)

Within-pair difference “DELTA”

  • Let DELTA = CORNFLK - OATBRAN
  • All subsequent procedures are applied to the difference variable DELTA.
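Forming DELTA is a one-line operation once the pairs are aligned by subject. The oat bran data table is not reproduced here, so the sketch below uses four hypothetical subjects purely to show the mechanics; `paired_differences` is a hypothetical helper. After this step, the paired analysis is a one-sample analysis of `delta`.

```python
import statistics


def paired_differences(first, second):
    """Return one within-pair difference per subject: first[i] - second[i]."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    return [a - b for a, b in zip(first, second)]


# Hypothetical LDL values (mmol/L) for 4 subjects, for illustration only (NOT the study data)
cornflk = [4.6, 6.4, 5.4, 4.5]
oatbran = [3.8, 5.6, 5.5, 4.0]

delta = paired_differences(cornflk, oatbran)  # DELTA = CORNFLK - OATBRAN
xbar_d = statistics.mean(delta)               # mean difference
s_d = statistics.stdev(delta)                 # SD of the differences
print([round(d, 2) for d in delta], round(xbar_d, 3), round(s_d, 3))
```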

Exploratory and descriptive stats

  • Stemplot of DELTA
  • Subscript d denotes “difference” (e.g., \bar{x}_d, s_d)

Confidence Interval

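  • 95% CI for \mu_d = \bar{x}_d \pm t_{.975,11} \cdot SE_{\bar{x}_d} = 0.3808 \pm (2.201)(0.1251) = (0.105 to 0.656) mmol/L
  • Interpretation: We are 95% confident that the population mean difference \mu_d is between 0.105 and 0.656 mmol/L.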
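A sketch of the interval calculation from the summary statistics reported on the paired t slide (n = 12, \bar{x}_d = 0.38083, s_d = 0.4335), assuming t_{.975,11} = 2.201 from Table C.

```python
import math

n, xbar_d, s_d = 12, 0.38083, 0.4335  # summary statistics of DELTA
t_crit = 2.201                        # t_{.975, 11} from Table C

se_d = s_d / math.sqrt(n)             # standard error of the mean difference
margin = t_crit * se_d
lower, upper = xbar_d - margin, xbar_d + margin
print(f"95% CI for mu_d: {lower:.3f} to {upper:.3f} mmol/L")
```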

Hypothesis Test

  • Claim: the oat bran diet is associated with a decline (one-sided) or change (two-sided) in LDL cholesterol.
  • Test H_0: \mu_d = \mu_0, where \mu_0 = 0
    • H_a: \mu_d > \mu_0 (one-sided)
    • H_a: \mu_d ≠ \mu_0 (two-sided)

Paired t statistic

  • Current data: n = 12
  • \bar{X}_d = 0.3808
  • Test H_0: \mu_d = 0
  • s_d = 0.4335

t_{stat} = \frac{\bar{X}_d - 0}{\frac{s_d}{\sqrt{n}}} = \frac{0.38083 - 0}{\frac{0.4335}{\sqrt{12}}} = 3.043

  • df = n-1=12-1=11

P-value via Table C

  • One-tailed: .005 < P < .01
  • Two-tailed: .01 < P < .02
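An exact P-value, of the kind behind the two-tailed P = 0.011 reported in the interpretation, can be computed from the paired summary statistics. A self-contained sketch, assuming Python's standard library only; `t_cdf` is a hypothetical helper implementing the closed-form t distribution for integer df. It gives t ≈ 3.043, a one-tailed P of about .0056, and a two-tailed P of about .011.

```python
import math


def t_cdf(t: float, df: int) -> float:
    """P(T <= t) for Student's t with a positive integer df (closed-form finite series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:  # odd df
        series, term = 0.0, c
        for j in range((df - 1) // 2):
            series += term
            term *= c * c * (2 * j + 2) / (2 * j + 3)
        a = 2 / math.pi * (theta + s * series)
    else:  # even df
        series, term = 0.0, 1.0
        for j in range(df // 2):
            series += term
            term *= c * c * (2 * j + 1) / (2 * j + 2)
        a = s * series
    # a = P(|T| < |t|); use symmetry to get the lower-tail probability
    return 0.5 + a / 2 if t >= 0 else 0.5 - a / 2


# Summary statistics of DELTA = CORNFLK - OATBRAN
n, xbar_d, s_d = 12, 0.38083, 0.4335
t_stat = (xbar_d - 0) / (s_d / math.sqrt(n))  # H0: mu_d = 0
df = n - 1
p_one = 1 - t_cdf(t_stat, df)                 # Ha: mu_d > 0
p_two = 2 * p_one                             # Ha: mu_d != 0
print(f"t = {t_stat:.3f}, df = {df}, one-tailed P = {p_one:.4f}, two-tailed P = {p_two:.4f}")
```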

Interpretation

  • Testing H_0: \mu_d = 0
  • Two-tailed P = 0.011
  • Good reason to doubt H0
  • (Optional) The difference is “significant” at \alpha = .05 but not at \alpha = .01