QBIO 305 Statistics for the Life Sciences

QBIO 305 Statistics for the Life Sciences Notes

Professor Zhengye Zhou

Central Limit Theorem
  • Let $Y_i$ be independent and identically distributed random variables, each with mean $\mu$ and standard deviation $\sigma$.

  • For large $n$:

    • Mean: $\bar{Y}_n = \frac{Y_1 + \cdots + Y_n}{n}$

    • Approximation: $\bar{Y}_n \approx \mu + \frac{\sigma}{\sqrt{n}} N(0, 1)$

    • Where $N(0, 1)$ is a standard normal random variable.

  • The Central Limit Theorem provides information about the average of these random variables.
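
The approximation above can be checked with a small simulation. This is a minimal sketch, not part of the original notes: the exponential population (mean 1, SD 1), the sample size of 50, and the seed are all assumptions chosen for illustration. If the theorem holds, the sample means should center near $\mu$ with a spread near $\sigma/\sqrt{n}$, even though the population itself is skewed.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Return the means of `reps` samples of size n from an Exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(305)      # fixed seed (arbitrary) so the run is reproducible
n, reps = 50, 5000            # illustrative choices, not from the notes
means = sample_means(n, reps, rng)

mu, sigma = 1.0, 1.0          # mean and SD of the Exponential(1) population
theory_sd = sigma / n ** 0.5  # CLT prediction for the SD of the sample mean

print(f"mean of sample means: {statistics.fmean(means):.3f} (theory {mu})")
print(f"SD of sample means:   {statistics.stdev(means):.3f} (theory {theory_sd:.3f})")
```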

Application of Central Limit Theorem
  • Example: Weight of lambs is viewed as a random variable where:

    • Population mean $\mu$ and population standard deviation $\sigma$ are unknown.

    • Mean weight can be calculated from the sample, allowing estimation of the population parameters.

Statistical Estimation
  • Case Study: Researchers captured 14 male Monarch butterflies and measured wing area:

    • Sample Mean: $\bar{y} = 32.8143 \approx 32.81 \, \text{cm}^2$

    • Sample Standard Deviation: $s = 2.4757 \approx 2.48 \, \text{cm}^2$

  • Population mean and standard deviation represented as:

    • $\mu$ = population mean wing area,

    • $\sigma$ = population SD of wing area.

  • Sample mean $\bar{y}$ serves as an estimate for population mean $\mu$ and sample SD $s$ estimates $\sigma$.

Standard Error of the Mean
  • Standard deviation of the sampling distribution of $\bar{Y}$ is:

    • $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$

  • Using sample SD, standard error (SE) is:

    • $\text{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}$

  • Example Calculation:

    • For $n = 28$ lambs:

      • $s = 0.65 \, \text{kg}$

      • $\bar{y} = 5.17 \, \text{kg}$

    • SE Calculation:

      • $\text{SE} = \frac{0.65}{\sqrt{28}} \approx 0.12 \, \text{kg}$
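
The lamb calculation can be reproduced in a few lines. This is a minimal sketch; the function name `standard_error` is a hypothetical helper, not something defined in the notes.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return s / math.sqrt(n)

# Lamb example from the notes: n = 28, s = 0.65 kg
se_lambs = standard_error(0.65, 28)
print(round(se_lambs, 2))  # 0.12
```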

Standard Error vs Standard Deviation
  • Standard Deviation (SD): Measures variability among individual observations in the sample.

  • Standard Error (SE): Measures variability associated with the sample mean as an estimate of the population mean.

Visual Representation of Data
  • Figures represent data displayed as $\bar{y} \pm \text{SE}$:

    • Includes bar graphs and interval plots.

  • Label error bars clearly as SE or SD: the two can differ substantially, since SE is smaller than SD by a factor of $\sqrt{n}$.

Confidence Interval for Population Mean $\mu$
  • Given data set:

    • Unknown values of population mean $\mu$ and standard deviation $\sigma$.

    • Sample mean $\bar{y}$ and sample SD $s$ calculated.

  • Approximation and distribution:

    • From the Central Limit Theorem, $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ has approximately a $N(0, 1)$ distribution for large $n$.

    • Use of z-scores for 95% confidence interval:

      • $P\left(-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96\right) \approx 0.95$

  • Calculate 95% CI:

    • Because $\sigma$ is unknown, use the sample SD $s$ in place of $\sigma$; this is appropriate when the data are approximately normally distributed.
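
The step from the probability statement to an interval for $\mu$ is a rearrangement of the inequality. The derivation below is a sketch that treats $\sigma$ as known, which is an assumption made only to show the algebra:

```latex
\begin{align*}
-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96
  &\iff -1.96\,\frac{\sigma}{\sqrt{n}} < \bar{Y} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}} \\
  &\iff \bar{Y} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{Y} + 1.96\,\frac{\sigma}{\sqrt{n}}
\end{align*}
```

So with $\sigma$ known, an approximate 95% confidence interval would be $\bar{y} \pm 1.96 \, \sigma/\sqrt{n}$.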

Student's t Distribution
  • If the data are approximately normally distributed:

    • Use Student's t distribution with $n - 1$ degrees of freedom to construct confidence intervals.

  • With increasing degrees of freedom (df), the t-distribution approaches the normal distribution.

Critical Values of t Distribution
  • Use t-tables for determining critical values based on degrees of freedom.

  • Rounding down degrees of freedom may be necessary when values are not listed.
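
The table lookup with rounding down can be sketched as follows. The dictionary holds only a short excerpt of upper-tail 0.025 critical values from a standard t table, and the names `T_025` and `t_crit_025` are hypothetical. Rounding df down selects a slightly larger critical value, so the resulting interval is a little wider than necessary rather than too narrow. The excerpt also shows the values shrinking toward the normal value 1.96 as df grows.

```python
# Excerpt of upper-tail 0.025 critical values from a standard t table (df -> t)
T_025 = {10: 2.228, 13: 2.160, 20: 2.086, 30: 2.042,
         40: 2.021, 60: 2.000, 120: 1.980}

def t_crit_025(df):
    """Return t_{0.025} for the largest tabulated df that does not exceed `df`."""
    usable = [d for d in T_025 if d <= df]
    if not usable:
        raise ValueError("df is below the smallest tabulated value")
    return T_025[max(usable)]

print(t_crit_025(13))   # df listed in the excerpt: 2.16
print(t_crit_025(45))   # df not listed, rounds down to 40: 2.021
```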

Confidence Interval Calculation for Different Levels
  • Two-sided 95% CI: Use 0.025 column.

  • One-sided confidence bounds: a 90% lower bound uses the 0.10 column; a 95% upper bound uses the 0.05 column.

Butterfly Wing Area Example Calculation
  • Given:

    • $n = 14$, $\bar{y} = 32.8143$, $s = 2.4757$, so df $= n - 1 = 13$

  • 95% Confidence Interval Calculation:

    • $\bar{y} \pm \left(t_{0.025} \times \frac{s}{\sqrt{n}}\right)$

      • Result: $32.81 \pm \left(2.16 \times \frac{2.48}{\sqrt{14}}\right)$

    • Final interval: Approx $(31.4, 34.2)$

  • 90% Confidence Interval Calculation:

    • $\bar{y} \pm \left(t_{0.05} \times \frac{s}{\sqrt{n}}\right)$, with $t_{0.05} = 1.771$ for df $= 13$

    • Result: Approx $(31.6, 34.0)$
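
Both butterfly intervals can be reproduced with a short script. This is a minimal sketch: `t_interval` is a hypothetical helper, and the critical values 2.160 and 1.771 are the df = 13 entries from a standard t table, typed in by hand rather than computed.

```python
import math

def t_interval(ybar, s, n, t_crit):
    """Two-sided interval: ybar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return ybar - margin, ybar + margin

n, ybar, s = 14, 32.8143, 2.4757           # butterfly wing areas, cm^2

ci95 = t_interval(ybar, s, n, 2.160)       # t_{0.025}, df = 13
ci90 = t_interval(ybar, s, n, 1.771)       # t_{0.05},  df = 13

print([round(x, 1) for x in ci95])  # [31.4, 34.2]
print([round(x, 1) for x in ci90])  # [31.6, 34.0]
```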

Normality Check and Interpretation
  • Importance of checking normal distribution for valid confidence interval estimation.

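  • A 95% confidence interval does not mean that $\mu$ lies in the computed interval with probability 0.95: $\mu$ is a fixed value, and the 95% describes the procedure that produces the interval.

  • If many independent samples are taken and a confidence interval is calculated from each, about 95% of those intervals are expected to capture the true mean.

  • As sample size increases, interval width decreases roughly in proportion to $1/\sqrt{n}$, so each added observation narrows the interval less than the one before (diminishing returns).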
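
The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under stated assumptions: the population is taken to be normal with a mean of 32.8 and an SD of 2.5 (made-up values loosely based on the butterfly example), each sample has n = 14, and 2.160 is the tabulated $t_{0.025}$ for df = 13. The fraction of intervals that contain the true mean should come out close to 0.95.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, t_crit, reps, rng):
    """Fraction of `reps` simulated t intervals that contain the true mean mu."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if ybar - t_crit * se < mu < ybar + t_crit * se:
            hits += 1
    return hits / reps

rng = random.Random(305)  # fixed seed (arbitrary) for reproducibility
cov = coverage(mu=32.8, sigma=2.5, n=14, t_crit=2.160, reps=2000, rng=rng)
print(f"fraction of intervals containing mu: {cov:.3f}")
```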

One-Sided Confidence Intervals
  • Lower bound calculated as: $\bar{y} - t_{0.10} \, \text{SE}_{\bar{Y}}$.

  • Upper bound calculated as: $\bar{y} + t_{0.05} \, \text{SE}_{\bar{Y}}$.
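
As an illustration, the two bounds can be computed for the butterfly data. Applying them to that data set is an assumption of this sketch, not something the notes do, and the critical values 1.350 and 1.771 are the df = 13 entries for the 0.10 and 0.05 columns of a standard t table.

```python
import math

n, ybar, s = 14, 32.8143, 2.4757   # butterfly wing areas, cm^2
se = s / math.sqrt(n)

T_10, T_05 = 1.350, 1.771          # upper-tail critical values for df = 13

lower_90 = ybar - T_10 * se        # 90% one-sided lower confidence bound
upper_95 = ybar + T_05 * se        # 95% one-sided upper confidence bound

print(round(lower_90, 2), round(upper_95, 2))
```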

Sample Size Estimation for Precision
  • To estimate necessary sample size for desired SE:

    • E.g., $\text{SE} = \frac{s}{\sqrt{n}}$ must meet a specific precision threshold.
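
Solving $\text{SE} = s/\sqrt{n}$ for $n$ gives $n \geq (s / \text{SE}_{\text{target}})^2$. The sketch below applies this with a guessed SD of 0.65 kg (borrowed from the lamb example) and a hypothetical target SE of 0.10 kg. The function name `required_n` and the target value are assumptions for illustration.

```python
import math

def required_n(s_guess, target_se):
    """Smallest n for which s_guess / sqrt(n) is at most target_se."""
    return math.ceil((s_guess / target_se) ** 2)

n_needed = required_n(0.65, 0.10)   # (0.65 / 0.10)^2 = 42.25, rounded up
print(n_needed)                     # 43
```

The result depends on how good the preliminary guess for $s$ is, so it is a planning figure rather than a guarantee.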

Diminishing Returns in Data Collection
  • Each additional observation improves accuracy less than previous ones.
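
A quick calculation makes the diminishing returns concrete. Because SE scales as $1/\sqrt{n}$, quadrupling the sample size only halves the SE. The sketch assumes a fixed SD of 0.65 kg (the lamb value) and arbitrary sample sizes; `se_at` is a hypothetical helper.

```python
import math

S = 0.65  # assumed sample SD in kg, held fixed for illustration

def se_at(n):
    """Standard error of the mean for sample size n at the fixed SD above."""
    return S / math.sqrt(n)

for n in (25, 100, 400):
    print(n, round(se_at(n), 4))
# 25 0.13
# 100 0.065
# 400 0.0325
```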

Conditions for Validity of Estimation Methods
  • Ensure the sample is a simple random sample to uphold the integrity of statistical inferences.

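  • For small samples, data must be approximately normal. For larger samples (typically $n \geq 30$), the Central Limit Theorem makes the method approximately valid even when the data are not normal.

  • Check for outliers, which can distort both the sample mean and the sample SD.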