QBIO 305 Statistics for the Life Sciences

QBIO 305 Statistics for the Life Sciences Notes

Professor Zhengye Zhou

Central Limit Theorem

Let $Y_1, \dots, Y_n$ be independent and identically distributed random variables, each with mean $\mu$ and standard deviation $\sigma$.
For large $n$:
- Mean: $\bar{Y}_n = \frac{Y_1 + \cdots + Y_n}{n}$
- Approximation: $\bar{Y}_n \approx \mu + \frac{\sigma}{\sqrt{n}} N(0, 1)$
- Where $N(0, 1)$ is a standard normal random variable.
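The Central Limit Theorem describes the distribution of the sample mean: for large $n$, $\bar{Y}_n$ is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, whatever the distribution of the individual $Y_i$.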
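A small simulation can make the approximation concrete. The sketch below is only an illustration: the exponential population (mean 1, SD 1), the sample size of 50, and the number of repetitions are all assumed choices, not part of the notes.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from n draws of a skewed population."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # Exponential with rate 1: mean 1, SD 1, strongly right-skewed.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


mu, sigma, n = 1.0, 1.0, 50          # assumed population values and sample size
means = simulate_sample_means(n=n, reps=5000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sd = sigma / math.sqrt(n)   # the CLT prediction sigma / sqrt(n)

# Share of sample means within 1.96 predicted SDs of mu (about 0.95 if normal).
within = sum(abs(m - mu) < 1.96 * predicted_sd for m in means) / len(means)

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
print(f"share within 1.96 SD: {within:.3f}")
```

Even though individual draws are far from normal, the sample means should cluster around $\mu$ with spread close to $\sigma/\sqrt{n}$.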

Application of Central Limit Theorem

Example: The weight of a lamb is viewed as a random variable where:
- Population mean $\mu$ and population standard deviation $\sigma$ are unknown.
- The sample mean weight can be calculated from the data and used to estimate $\mu$; the Central Limit Theorem describes how far that estimate is likely to fall from $\mu$.

Statistical Estimation

Case Study: Researchers captured 14 male Monarch butterflies and measured wing area:
- Sample Mean: $\bar{y} = 32.8143 \approx 32.81 \, \text{cm}^2$
- Sample Standard Deviation: $s = 2.4757 \approx 2.48 \, \text{cm}^2$
Population mean and standard deviation are represented as:
- $\mu =$ population mean wing area,
- $\sigma =$ population SD of wing area.
The sample mean $\bar{y}$ serves as an estimate of the population mean $\mu$, and the sample SD $s$ estimates $\sigma$.

Standard Error of the Mean

The standard deviation of the sampling distribution of $\bar{Y}$ is:
- $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$
Because $\sigma$ is unknown, substituting the sample SD $s$ gives the standard error (SE):
- $\text{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}$
Example Calculation:
- For $n = 28$ lambs:
  - $s = 0.65 \, \text{kg}$
  - $\bar{y} = 5.17 \, \text{kg}$
- SE Calculation:
  - $\text{SE} = \frac{0.65}{\sqrt{28}} \approx 0.12 \, \text{kg}$
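The lamb calculation can be reproduced in a few lines. This is a minimal sketch; the function name `standard_error` is just an illustrative choice.

```python
import math


def standard_error(s, n):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return s / math.sqrt(n)


# Values from the lamb example: s = 0.65 kg, n = 28.
se_lambs = standard_error(0.65, 28)
print(f"SE = {se_lambs:.2f} kg")
```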

Standard Error vs Standard Deviation

Standard Deviation (SD): Measures variability among individual observations in the sample.
Standard Error (SE): Measures variability associated with the sample mean as an estimate of the population mean.

Visual Representation of Data

Figures often display data as $\bar{y} \pm \text{SE}$:
- Includes bar graphs and interval plots.
Label clearly whether error bars show SE or SD: SE bars are narrower than SD bars by a factor of $\sqrt{n}$, so an unlabeled figure is easy to misread.

Confidence Interval for Population Mean $\mu$

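Given a data set:
- The population mean $\mu$ and standard deviation $\sigma$ are unknown.
- The sample mean $\bar{y}$ and sample SD $s$ are calculated.
Approximation and distribution:
- From the Central Limit Theorem, $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ has approximately a $N(0, 1)$ distribution for large $n$.
- Use of z-scores for a 95% confidence interval:
  - $P\left(-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96\right) \approx 0.95$
Calculate the 95% CI:
- Rearranging the inequality for $\mu$ gives the interval $\bar{y} \pm 1.96 \, \frac{\sigma}{\sqrt{n}}$.
- Because $\sigma$ is unknown, use the sample SD $s$ in its place; if the population is approximately normal, replace 1.96 with a critical value from Student's t distribution.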

Student's t Distribution

If the data come from an approximately normal population:
- Use Student's t distribution with $n - 1$ degrees of freedom to construct confidence intervals.
As the degrees of freedom (df) increase, the t distribution approaches the standard normal distribution.

Critical Values of t Distribution

Use a t table to find the critical value for the given degrees of freedom.
When the exact df is not listed, round down to the nearest listed df; this gives a slightly larger critical value and therefore a conservative (wider) interval.
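The table lookup with rounding down can be sketched as follows. The dictionary is only a sparse excerpt of the 0.025 column of a standard t table, chosen for illustration (a printed table lists many more rows), and the helper name `t_critical_025` is hypothetical.

```python
import bisect

# Excerpt of upper-tail 0.025 critical values (df -> t), for illustration only.
T_025 = {
    1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228, 13: 2.160,
    20: 2.086, 30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980,
}
Z_025 = 1.960  # limiting value from the standard normal distribution


def t_critical_025(df):
    """Look up t_{0.025}, rounding df down to the nearest listed row."""
    if df < 1:
        raise ValueError("df must be at least 1")
    listed = sorted(T_025)
    # Index of the largest listed df that does not exceed the requested df.
    idx = bisect.bisect_right(listed, df) - 1
    return T_025[listed[idx]]


for df in (13, 45, 1000):
    print(f"df = {df:4d}: t = {t_critical_025(df):.3f} (z = {Z_025})")
```

Rounding down never understates the critical value, and the listed values shrink toward 1.960 as df grows.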

Confidence Interval Calculation for Different Levels

Two-sided 95% CI: use the 0.025 column (upper-tail area). Two-sided 90% CI: use the 0.05 column.
One-sided confidence bounds: a 90% bound uses the 0.10 column and a 95% bound uses the 0.05 column.

Butterfly Wing Area Example Calculation

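Given:
- $n = 14, \bar{y} = 32.8143, s = 2.4757$, so $\text{df} = n - 1 = 13$
95% Confidence Interval Calculation:
- $\bar{y} \pm \left(t_{0.025} \times \frac{s}{\sqrt{n}}\right)$ with $t_{0.025} = 2.160$
  - Result: $32.81 \pm \left(2.16 \times \frac{2.48}{\sqrt{14}}\right) \approx 32.81 \pm 1.43$
- Final interval: Approx $(31.4, 34.2) \, \text{cm}^2$
90% Confidence Interval Calculation:
- $\bar{y} \pm \left(t_{0.05} \times \frac{s}{\sqrt{n}}\right)$ with $t_{0.05} = 1.771$
  - Result: $32.81 \pm \left(1.771 \times \frac{2.48}{\sqrt{14}}\right) \approx 32.81 \pm 1.17$
- Final interval: Approx $(31.6, 34.0) \, \text{cm}^2$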
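The two butterfly intervals can be checked numerically. This sketch hardcodes the df = 13 critical values from a standard t table (2.160 and 1.771) rather than computing them; the function name `t_interval` is an illustrative choice.

```python
import math


def t_interval(ybar, s, n, t_crit):
    """Two-sided interval ybar +/- t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)
    margin = t_crit * se
    return ybar - margin, ybar + margin


# Butterfly wing area summary statistics (cm^2), n = 14, df = 13.
ybar, s, n = 32.8143, 2.4757, 14

ci95 = t_interval(ybar, s, n, t_crit=2.160)  # t_{0.025}, df = 13
ci90 = t_interval(ybar, s, n, t_crit=1.771)  # t_{0.05}, df = 13

print(f"95% CI: ({ci95[0]:.1f}, {ci95[1]:.1f}) cm^2")
print(f"90% CI: ({ci90[0]:.1f}, {ci90[1]:.1f}) cm^2")
```

The 90% interval is narrower because it uses a smaller critical value with the same SE.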

Normality Check and Interpretation

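Check that the data are approximately normal before relying on a t-based confidence interval, especially when $n$ is small.
A 95% confidence interval does not mean that $\mu$ has a 95% probability of lying in the computed interval: $\mu$ is a fixed number, and it is the interval that varies from sample to sample.
If many independent samples are taken and a 95% confidence interval is calculated from each, about 95% of those intervals will capture the true mean.
As the sample size increases, the interval width decreases in proportion to $1/\sqrt{n}$, so each additional observation narrows the interval less than the one before (diminishing returns).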
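The repeated-sampling interpretation can be illustrated by simulation. In this sketch the "true" population values ($\mu = 32.8$, $\sigma = 2.5$) are assumed purely for the demonstration, and the critical value 2.160 is the tabled $t_{0.025}$ for df = 13.

```python
import math
import random
import statistics


def coverage(mu, sigma, n, t_crit, reps, seed=1):
    """Fraction of t intervals, over repeated normal samples, that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # Count the interval as a hit when it captures the true mean.
        if ybar - t_crit * se < mu < ybar + t_crit * se:
            hits += 1
    return hits / reps


# Assumed population values; n = 14 matches the butterfly sample size.
rate = coverage(mu=32.8, sigma=2.5, n=14, t_crit=2.160, reps=4000)
print(f"share of intervals capturing mu: {rate:.3f}")
```

Any single interval either contains $\mu$ or does not; the 95% describes the long-run share of intervals that do.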

One-Sided Confidence Intervals

90% lower bound: $\bar{y} - t_{0.10} \, \text{SE}_{\bar{Y}}$.
95% upper bound: $\bar{y} + t_{0.05} \, \text{SE}_{\bar{Y}}$.
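As an illustration, the two one-sided formulas can be applied to the butterfly summary statistics. Applying them to this data set is an assumption made for the example; the critical values 1.350 and 1.771 are the tabled $t_{0.10}$ and $t_{0.05}$ for df = 13.

```python
import math

# Butterfly wing area summary statistics (cm^2), n = 14, df = 13.
ybar, s, n = 32.8143, 2.4757, 14
se = s / math.sqrt(n)

T_010 = 1.350  # t_{0.10}, df = 13 (from a standard t table)
T_005 = 1.771  # t_{0.05}, df = 13 (from a standard t table)

lower_bound = ybar - T_010 * se  # one-sided bound with 0.10 in the lower tail
upper_bound = ybar + T_005 * se  # one-sided bound with 0.05 in the upper tail

print(f"lower bound (t_0.10): {lower_bound:.2f} cm^2")
print(f"upper bound (t_0.05): {upper_bound:.2f} cm^2")
```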

Sample Size Estimation for Precision

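To estimate the sample size needed for a desired SE:
- E.g., $\text{SE} = \frac{s}{\sqrt{n}}$ must meet a specific precision threshold; solving for $n$ gives $n \geq \left(\frac{s}{\text{SE}_{\text{desired}}}\right)^2$, using a preliminary guess for $s$.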
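Solving $\text{SE} = s/\sqrt{n}$ for $n$ gives a quick planning rule. In the sketch below, the target SE of 0.10 kg is a hypothetical threshold, and the lamb SD of 0.65 kg is used as the preliminary guess for $s$.

```python
import math


def required_n(s_guess, target_se):
    """Smallest n with s_guess / sqrt(n) <= target_se."""
    if s_guess <= 0 or target_se <= 0:
        raise ValueError("s_guess and target_se must be positive")
    # Rearranged from SE = s / sqrt(n): n >= (s / SE)^2, rounded up.
    return math.ceil((s_guess / target_se) ** 2)


# Hypothetical goal: SE of at most 0.10 kg, with s guessed at 0.65 kg.
n_needed = required_n(s_guess=0.65, target_se=0.10)
print(f"required sample size: {n_needed}")
print(f"resulting SE: {0.65 / math.sqrt(n_needed):.4f} kg")
```

The answer is only as good as the guess for $s$, which usually comes from a pilot study or earlier data.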

Diminishing Returns in Data Collection

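Each additional observation improves precision less than the previous one: because SE is proportional to $1/\sqrt{n}$, halving the SE requires four times as many observations.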
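A short table shows the effect. This sketch holds the sample SD fixed at the lamb value of 0.65 kg, which is a simplifying assumption (in practice $s$ changes a little as data accumulate); the sample sizes are arbitrary.

```python
import math

s = 0.65                      # sample SD in kg, held fixed for illustration
sizes = [7, 28, 112, 448]     # each sample size is four times the previous one
ses = [s / math.sqrt(n) for n in sizes]

for n, se in zip(sizes, ses):
    print(f"n = {n:3d}: SE = {se:.4f} kg")

# Gain from a single extra observation at n = 28 versus n = 448.
gain_small = s / math.sqrt(28) - s / math.sqrt(29)
gain_large = s / math.sqrt(448) - s / math.sqrt(449)
print(f"SE drop from one more lamb at n = 28:  {gain_small:.5f} kg")
print(f"SE drop from one more lamb at n = 448: {gain_large:.5f} kg")
```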

Conditions for Validity of Estimation Methods

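The data should be a simple random sample from the population; otherwise the inferences are not valid.
For small samples, the population must be approximately normal. For larger samples (a common rule of thumb is $n \geq 30$), the Central Limit Theorem makes the method approximately valid even if the population is not normal.
Check for outliers, which can distort both $\bar{y}$ and $s$.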
