QBIO 305 Statistics for the Life Sciences

Professor Zhengye Zhou

Central Limit Theorem
  • Let Y_1, \ldots, Y_n be independent and identically distributed random variables, each with mean \mu and standard deviation \sigma.

  • For large n:

    • Sample mean: \bar{Y}_n = \frac{Y_1 + \cdots + Y_n}{n}

    • Approximation: \bar{Y}_n \approx \mu + \frac{\sigma}{\sqrt{n}} N(0, 1)

    • Where N(0, 1) is a standard normal random variable.

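  • The Central Limit Theorem describes the sampling distribution of the average: whatever the shape of the distribution of the Y_i, for large n \bar{Y}_n is approximately normal with mean \mu and standard deviation \frac{\sigma}{\sqrt{n}}.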
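
A quick simulation can make the approximation concrete. The sketch below assumes an Exponential population with mean 1 and SD 1, a skewed, clearly non-normal choice made only for illustration; the sample size n = 25, the 5000 repetitions, and the seed are also arbitrary assumptions. If the Central Limit Theorem applies, the simulated means should center near \mu, spread about \sigma/\sqrt{n}, and land within 1.96 of those SDs of \mu roughly 95% of the time.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from n draws of an Exponential(1)
    population (mean 1, SD 1), chosen here only because it is skewed."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

n = 25
mu, sigma = 1.0, 1.0          # true mean and SD of Exponential(1)
means = simulate_sample_means(n, reps=5000)

# CLT prediction: the means center at mu with spread sigma / sqrt(n).
print(f"mean of sample means: {statistics.fmean(means):.3f}  vs mu = {mu}")
print(f"SD of sample means:   {statistics.stdev(means):.3f}  vs sigma/sqrt(n) = {sigma / math.sqrt(n):.3f}")

# Share of means within mu +/- 1.96 * sigma / sqrt(n); near 0.95 if the normal approximation holds.
half = 1.96 * sigma / math.sqrt(n)
within = sum(abs(m - mu) < half for m in means) / len(means)
print(f"share within +/- 1.96 SD: {within:.3f}")
```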

Application of Central Limit Theorem
  • Example: The weight of a lamb is viewed as a random variable, where:

    • The population mean \mu and population standard deviation \sigma are unknown.

    • The sample mean and sample SD can be calculated from a sample of lambs and used to estimate these population parameters.

Statistical Estimation
  • Case Study: Researchers captured 14 male Monarch butterflies and measured wing area:

    • Sample Mean: \bar{y} = 32.8143 \approx 32.81 \, cm^2

    • Sample Standard Deviation: s = 2.4757 \approx 2.48 \, cm^2

  • The population mean and standard deviation are denoted:

    • \mu = population mean wing area,

    • \sigma = population SD of wing area.

  • Sample mean \bar{y} serves as an estimate for population mean \mu and sample SD s estimates \sigma.

Standard Error of the Mean
  • Standard deviation of the sampling distribution of \bar{Y} is:

    • \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}

  • Replacing \sigma with the sample SD s gives the standard error (SE):

    • \text{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}

  • Example Calculation:

    • For n = 28 lambs:

      • s = 0.65 \, kg

      • \bar{y} = 5.17 \, kg

    • SE Calculation:

      • \text{SE} = \frac{0.65}{\sqrt{28}} \approx 0.12 \, kg
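
The arithmetic above can be checked with a small helper; `standard_error` is a hypothetical name introduced here, and the inputs are the lamb values from the notes.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the sample mean: SE = s / sqrt(n)."""
    return s / math.sqrt(n)

se_lambs = standard_error(0.65, 28)  # lamb example: s = 0.65 kg, n = 28
print(f"SE = {se_lambs:.4f} kg")     # about 0.1228, which rounds to 0.12 kg
```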

Standard Error vs Standard Deviation
  • Standard Deviation (SD): Measures variability among individual observations in the sample.

  • Standard Error (SE): Measures variability associated with the sample mean as an estimate of the population mean.

Visual Representation of Data
  • Figures often display data as \bar{y} \pm \text{SE}:

    • For example, bar graphs with error bars and interval plots.

  • Label clearly whether error bars show SE or SD: because \text{SD} = \sqrt{n} \times \text{SE}, SD bars are much longer than SE bars for the same data.
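
To see how different the two kinds of bars are, the sketch below computes \bar{y} \pm \text{SD} and \bar{y} \pm \text{SE} for the lamb example (\bar{y} = 5.17 kg, s = 0.65 kg, n = 28). It is only an illustration of the arithmetic, not a plotting recipe.

```python
import math

ybar, s, n = 5.17, 0.65, 28  # lamb example from the notes
se = s / math.sqrt(n)

for label, half in (("SD", s), ("SE", se)):
    print(f"ybar +/- {label}: ({ybar - half:.2f}, {ybar + half:.2f}) kg")

# The SD bar is always sqrt(n) times as long as the SE bar.
print(f"SD / SE = {s / se:.2f}, sqrt(n) = {math.sqrt(n):.2f}")
```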

Confidence Interval for Population Mean \mu
  • Given data set:

    • Unknown values of population mean \mu and standard deviation \sigma.

    • Sample mean \bar{y} and sample SD s calculated.

  • Approximation and distribution:

    • From the Central Limit Theorem, \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} has approximately a N(0, 1) distribution.

    • Use of z-scores for 95% confidence interval:

      • P(-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96) \approx 0.95

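  • Calculate 95% CI:

    • If \sigma were known, the interval would be \bar{y} \pm 1.96 \frac{\sigma}{\sqrt{n}}.

    • Since \sigma is unknown, replace it with the sample SD s; the critical value must then come from Student's t distribution rather than N(0, 1), provided the data are approximately normal.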
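
The value 1.96 can be recovered from the standard normal distribution. This sketch uses Python's `statistics.NormalDist`; the `z_interval` helper is a hypothetical name, and it applies only in the rarely realistic case where \sigma is known.

```python
from statistics import NormalDist

z = NormalDist()              # standard normal N(0, 1)
z_crit = z.inv_cdf(0.975)     # leaves 0.025 in the upper tail
coverage = z.cdf(z_crit) - z.cdf(-z_crit)
print(f"z_0.025 = {z_crit:.4f}, P(-z < Z < z) = {coverage:.4f}")

def z_interval(ybar, sigma, n, z_crit=z_crit):
    """95% CI for mu assuming sigma is KNOWN: ybar +/- z_crit * sigma / sqrt(n)."""
    half = z_crit * sigma / n ** 0.5
    return ybar - half, ybar + half
```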

Student's t Distribution
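  • If the data come from an approximately normal population:

    • \frac{\bar{Y} - \mu}{s/\sqrt{n}} follows Student's t distribution with df = n - 1.

    • Use this t distribution to construct confidence intervals, e.g., \bar{y} \pm t_{0.025} \, \text{SE}_{\bar{Y}} for 95% confidence.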

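  • As the degrees of freedom (df) increase, the t distribution approaches the standard normal distribution; for small df it has heavier tails, so its critical values are larger than the normal ones.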

Critical Values of t Distribution
  • Use t-tables for determining critical values based on degrees of freedom.

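  • If the exact df is not listed in the table, round down to the nearest listed df; this gives a slightly larger critical value and therefore a conservative (slightly wider) interval.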
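
Statistical software normally supplies t critical values (for example, `scipy.stats.t.ppf` in SciPy). To keep the sketch dependency-free, the version below instead integrates the t density numerically with Simpson's rule and inverts it by bisection. It is an illustrative approximation under these assumptions (search range [0, 100], 2000 Simpson intervals), not a replacement for a table or a library; it shows the 0.025 column shrinking toward 1.96 as df grows.

```python
import math

def upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0 and T ~ t(df): Simpson's rule on [0, x] plus symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def t_critical(alpha, df, hi=100.0, iters=50):
    """Critical value with upper-tail area alpha, found by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid  # too much area beyond mid: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2

# The 0.025 column shrinks toward the normal value 1.96 as df grows.
for df in (5, 13, 30, 100, 1000):
    print(f"df = {df:4d}: t_0.025 = {t_critical(0.025, df):.3f}")
```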

Confidence Interval Calculation for Different Levels
  • Two-sided 95% CI: use the 0.025 column, since 2.5% of the area lies in each tail.

  • One-sided confidence bounds: a 90% lower bound uses the 0.10 column, and a 95% upper bound uses the 0.05 column.
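
The column rule can be written as a one-line function: a two-sided interval splits the leftover area 1 - confidence between two tails, while a one-sided bound puts it all in one tail. `table_column` is a hypothetical helper name used only for illustration.

```python
def table_column(confidence, two_sided=True):
    """Upper-tail area (t-table column) for a given confidence level."""
    alpha = 1 - confidence
    return alpha / 2 if two_sided else alpha

print(round(table_column(0.95), 3))                   # two-sided 95% CI
print(round(table_column(0.90), 3))                   # two-sided 90% CI
print(round(table_column(0.90, two_sided=False), 3))  # one-sided 90% bound
print(round(table_column(0.95, two_sided=False), 3))  # one-sided 95% bound
```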

Butterfly Wing Area Example Calculation
  • Given:

    • n = 14, \bar{y} = 32.8143, s = 2.4757

  • 95% Confidence Interval Calculation:

    • \bar{y} \pm \left(t_{0.025} \times \frac{s}{\sqrt{n}}\right)

      • Substituting: 32.81 \pm \left(2.16 \times \frac{2.48}{\sqrt{14}}\right) = 32.81 \pm 1.43

    • Final interval: Approx (31.4, 34.2)

  • 90% Confidence Interval Calculation:

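    • \bar{y} \pm \left(t_{0.05} \times \frac{s}{\sqrt{n}}\right)

      • Substituting: 32.81 \pm \left(1.77 \times \frac{2.48}{\sqrt{14}}\right) = 32.81 \pm 1.17

    • Final interval: Approx (31.6, 34.0)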
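
Both intervals can be reproduced from the unrounded summary statistics. The critical values are the df = 13 table entries (2.160 for the 0.025 column, 1.771 for the 0.05 column); `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(ybar, s, n, t_crit):
    """ybar +/- t_crit * s / sqrt(n); t_crit is read from a t table with df = n - 1."""
    half = t_crit * s / math.sqrt(n)
    return ybar - half, ybar + half

ybar, s, n = 32.8143, 2.4757, 14       # Monarch butterfly wing areas (cm^2)
ci95 = t_interval(ybar, s, n, 2.160)   # t_0.025 with df = 13
ci90 = t_interval(ybar, s, n, 1.771)   # t_0.05 with df = 13
print(f"95% CI: ({ci95[0]:.1f}, {ci95[1]:.1f}) cm^2")
print(f"90% CI: ({ci90[0]:.1f}, {ci90[1]:.1f}) cm^2")
```

The 90% interval is narrower because it uses a smaller critical value.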

Normality Check and Interpretation
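  • Check that the data are approximately normal (e.g., with a normal probability plot) before relying on a t-based confidence interval, especially when n is small.

  • A confidence interval is an estimate: once computed, it either contains \mu or it does not. The 95% describes how often the procedure succeeds, not the probability that \mu lies in one particular interval.

  • If many independent samples are taken and a 95% confidence interval is calculated from each, about 95% of the intervals will capture the true mean.

  • As sample size increases, the interval width decreases roughly in proportion to \frac{1}{\sqrt{n}}, so quadrupling n roughly halves the width (a diminishing-returns phenomenon).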
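
The long-run interpretation can be checked by simulation. The sketch assumes a hypothetical normal population with \mu = 32.8 and \sigma = 2.5 (values chosen only to resemble the butterfly data), repeatedly draws samples of n = 14, builds \bar{y} \pm t_{0.025} \frac{s}{\sqrt{n}} with the df = 13 table value 2.160, and counts how often the interval contains \mu.

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, n, t_crit, reps=4000, seed=1):
    """Fraction of simulated t intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        half = t_crit * statistics.stdev(sample) / math.sqrt(n)
        hits += (ybar - half < mu < ybar + half)  # True counts as 1
    return hits / reps

# Hypothetical normal population; n = 14, so t_0.025 (df = 13) = 2.160.
rate = coverage_rate(mu=32.8, sigma=2.5, n=14, t_crit=2.160)
print(f"share of 95% intervals containing mu: {rate:.3f}")
```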

One-Sided Confidence Intervals
  • 90% lower bound: \bar{y} - t_{0.10} \text{SE}_{\bar{Y}}.

  • 95% upper bound: \bar{y} + t_{0.05} \text{SE}_{\bar{Y}}.
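
Applied to the butterfly data (n = 14, so df = 13), the two bounds use the table values t_{0.10} = 1.350 and t_{0.05} = 1.771. This is a worked illustration of the formulas, not a result reported in the original study.

```python
import math

ybar, s, n = 32.8143, 2.4757, 14   # butterfly wing areas (cm^2), df = 13
se = s / math.sqrt(n)

lower_90 = ybar - 1.350 * se       # t_0.10 with df = 13
upper_95 = ybar + 1.771 * se       # t_0.05 with df = 13
print(f"90% lower bound: mu > {lower_90:.2f} cm^2")
print(f"95% upper bound: mu < {upper_95:.2f} cm^2")
```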

Sample Size Estimation for Precision
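  • To estimate the sample size needed for a desired SE:

    • Choose n so that \text{SE} = \frac{s}{\sqrt{n}} meets the precision threshold, i.e., n \ge \left(\frac{s}{\text{desired SE}}\right)^2, where s is a preliminary guess of \sigma (for example, from a pilot study).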
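
A minimal sketch of the sample-size rule, assuming a hypothetical target of SE \le 0.4 cm^2 and using the butterfly SD s = 2.48 cm^2 as the guess for \sigma; `required_n` is an illustrative name.

```python
import math

def required_n(s_guess, target_se):
    """Smallest n with s_guess / sqrt(n) <= target_se, i.e. n >= (s_guess / target_se)^2."""
    return math.ceil((s_guess / target_se) ** 2)

# Hypothetical planning question: how many butterflies for SE <= 0.4 cm^2?
n_needed = required_n(2.48, 0.4)
print(n_needed, round(2.48 / math.sqrt(n_needed), 3))
```

Rounding up with `math.ceil` matters: (2.48 / 0.4)^2 is 38.44, and 38 observations would leave the SE just above the target.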

Diminishing Returns in Data Collection
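  • Because \text{SE} = \frac{s}{\sqrt{n}}, each additional observation improves precision less than the previous one did; halving the SE requires four times as many observations.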
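
The shrinking gain per observation can be tabulated directly. The sketch uses s = 2.48 cm^2 (the butterfly SD) purely to set the scale; the chosen sample sizes are arbitrary.

```python
import math

def se(s, n):
    """Standard error of the mean for SD s and sample size n."""
    return s / math.sqrt(n)

s = 2.48  # butterfly SD in cm^2, used only for scale
gains = {n: se(s, n) - se(s, n + 1) for n in (5, 10, 20, 40, 80)}
for n, g in gains.items():
    print(f"adding observation {n + 1}: SE drops by {g:.4f}")
```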

Conditions for Validity of Estimation Methods
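  • The data must be viewable as a simple random sample from the population, with independent observations; otherwise the inferences may not be valid.

  • For small samples, the population (and hence the data) must be approximately normal. For larger samples (a common rule of thumb is n \ge 30), the Central Limit Theorem makes the methods approximately valid even for non-normal populations, although the n required depends on how non-normal the population is.

  • Watch for outliers, since they can strongly affect \bar{y} and s.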