QBIO 305 Statistics for the Life Sciences
Professor Zhengye Zhou
Central Limit Theorem
Let Y_i be independent and identically distributed random variables, each with mean \mu and standard deviation \sigma.
For large n:
Mean: \bar{Y}_n = \frac{Y_1 + \cdots + Y_n}{n}
Approximation: \bar{Y}_n \approx \mu + \frac{\sigma}{\sqrt{n}} N(0, 1)
Where N(0, 1) is a standard normal random variable.
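The Central Limit Theorem describes the sampling distribution of the average of these random variables: for large n it is approximately normal with mean \mu and standard deviation \sigma/\sqrt{n}, whatever the distribution of the individual Y_i.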
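The approximation can be checked with a small simulation. The sketch below is illustrative only: the Exponential(1) population (mean 1, SD 1), the sample size n = 50, the number of repetitions, and the function name `simulate_sample_means` are all assumptions chosen for the demonstration, not part of the notes.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an Exponential(1) population
    (a skewed distribution with mean 1 and SD 1) and return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

mu, sigma = 1.0, 1.0   # true mean and SD of the assumed population
n = 50
means = simulate_sample_means(n, reps=5000)

# The sample means should centre on mu with spread close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("CLT prediction:      ", round(sigma / math.sqrt(n), 3))
```

Even though the individual observations are strongly skewed, the spread of the sample means should land close to \sigma/\sqrt{n}.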
Application of Central Limit Theorem
Example: Weight of lambs is viewed as a random variable where:
Population mean \mu and population standard deviation \sigma are unknown.
Mean weight can be calculated from the sample, allowing estimation of the population parameters.
Statistical Estimation
Case Study: Researchers captured 14 male Monarch butterflies and measured wing area:
Sample Mean: \bar{y} = 32.8143 \approx 32.81 \, cm^2
Sample Standard Deviation: s = 2.4757 \approx 2.48 \, cm^2
Population mean and standard deviation represented as:
\mu = population mean wing area,
\sigma = population SD of wing area.
Sample mean \bar{y} serves as an estimate for population mean \mu and sample SD s estimates \sigma.
Standard Error of the Mean
Standard deviation of the sampling distribution of \bar{Y} is:
\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}
Using sample SD, standard error (SE) is:
\text{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}
Example Calculation:
For n = 28 lambs:
s = 0.65 \, kg
\bar{y} = 5.17 \, kg
SE Calculation:
\text{SE} = \frac{0.65}{\sqrt{28}} \approx 0.12 \, kg
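The same arithmetic as a short Python sketch, using the lamb values from the example above; the function name `standard_error` is just an illustrative choice.

```python
import math

def standard_error(s, n):
    """Standard error of the mean from sample SD s and sample size n."""
    return s / math.sqrt(n)

se_lambs = standard_error(0.65, 28)   # s = 0.65 kg, n = 28 lambs
print(f"SE = {se_lambs:.2f} kg")      # prints "SE = 0.12 kg"
```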
Standard Error vs Standard Deviation
Standard Deviation (SD): Measures variability among individual observations in the sample.
Standard Error (SE): Measures variability associated with the sample mean as an estimate of the population mean.
Visual Representation of Data
Figures represent data displayed as \bar{y} \pm \text{SE}:
Includes bar graphs and interval plots.
Label every figure clearly as showing SE or SD: the two can differ greatly, so unlabeled error bars are easily misread.
Confidence Interval for Population Mean \mu
Given data set:
Unknown values of population mean \mu and standard deviation \sigma.
Sample mean \bar{y} and sample SD s calculated.
Approximation and distribution:
From the Central Limit Theorem, for large n, \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} has approximately a N(0, 1) distribution.
Use of z-scores for 95% confidence interval:
P(-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96) \approx 0.95
Calculate 95% CI:
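Because \sigma is unknown, use the sample SD s in place of \sigma. When the data are approximately normal, the critical value then comes from Student's t distribution rather than the normal value 1.96.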
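The step from the probability statement to an interval for \mu is a rearrangement of the inequality. A sketch of that algebra, written for the case where \sigma is treated as known:

```latex
-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96
\;\Longleftrightarrow\;
-1.96 \, \frac{\sigma}{\sqrt{n}} < \bar{Y} - \mu < 1.96 \, \frac{\sigma}{\sqrt{n}}
\;\Longleftrightarrow\;
\bar{Y} - 1.96 \, \frac{\sigma}{\sqrt{n}} < \mu < \bar{Y} + 1.96 \, \frac{\sigma}{\sqrt{n}}
```

So the random interval \bar{Y} \pm 1.96 \, \sigma/\sqrt{n} contains \mu with probability approximately 0.95.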
Student's t Distribution
If the data are approximately normally distributed:
Use Student's t distribution with df = n - 1 degrees of freedom to construct confidence intervals.
With increasing degrees of freedom (df), the t-distribution approaches the normal distribution.
Critical Values of t Distribution
Use t-tables for determining critical values based on degrees of freedom.
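When the exact df is not listed in the table, round down to the nearest listed df. This gives a slightly larger critical value and therefore a conservative (slightly wider) interval.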
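Critical values can also be computed rather than looked up. The following is a minimal sketch using only the Python standard library: it integrates the t density numerically (Simpson's rule) and finds the critical value by bisection. The function names, step count, and search range are illustrative assumptions; a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Find t with P(T > t) = tail_area by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid   # tail still too large: critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2

# Upper 0.025 critical values shrink toward the normal value 1.96 as df grows.
for df in (5, 13, 30, 100, 1000):
    print(df, round(t_critical(0.025, df), 3))
```

For df = 13 this should reproduce the table value of about 2.160, and for large df the result should be close to 1.96.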
Confidence Interval Calculation for Different Levels
Two-sided 95% CI: Use 0.025 column.
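One-sided confidence intervals: the whole tail area sits in one tail, so a 90% bound uses the 0.10 column and a 95% bound uses the 0.05 column (for example, a 90% lower bound and a 95% upper bound).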
Butterfly Wing Area Example Calculation
Given:
n = 14, \bar{y} = 32.8143, s = 2.4757
95% Confidence Interval Calculation:
\bar{y} \pm \left(t_{0.025} \times \frac{s}{\sqrt{n}}\right), with df = n - 1 = 13 and t_{0.025} = 2.160
Result: 32.81 \pm \left(2.16 \times \frac{2.48}{\sqrt{14}}\right) \approx 32.81 \pm 1.43
Final interval: approximately (31.4, 34.2) cm^2
90% Confidence Interval Calculation:
\bar{y} \pm \left(t_{0.05} \times \frac{s}{\sqrt{n}}\right), with t_{0.05} = 1.771 for df = 13
Result: 32.81 \pm 1.17, approximately (31.6, 34.0) cm^2
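The butterfly intervals can be reproduced with a few lines of Python. This is a sketch, not a general-purpose routine: the function name `t_interval` is illustrative, and the critical values 2.160 and 1.771 are entered by hand from a standard t-table for df = 13.

```python
import math

def t_interval(ybar, s, n, t_crit):
    """Two-sided interval ybar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return ybar - margin, ybar + margin

# Butterfly wing-area data: n = 14, ybar = 32.8143, s = 2.4757 (cm^2)
ci95 = t_interval(32.8143, 2.4757, 14, t_crit=2.160)  # t_{0.025}, df = 13
ci90 = t_interval(32.8143, 2.4757, 14, t_crit=1.771)  # t_{0.05},  df = 13

print("95%% CI: (%.1f, %.1f)" % ci95)   # prints "95% CI: (31.4, 34.2)"
print("90%% CI: (%.1f, %.1f)" % ci90)   # prints "90% CI: (31.6, 34.0)"
```

The 90% interval is narrower than the 95% interval because it uses a smaller critical value with the same standard error.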
Normality Check and Interpretation
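Check that the data are approximately normal before relying on a t-based confidence interval, especially for small samples.
A 95% confidence interval is not a probability statement about \mu: the population mean is a fixed (unknown) number, and any one computed interval either contains it or does not.
The 95% describes the procedure: if many independent samples are taken and a confidence interval is calculated from each, about 95% of those intervals are expected to capture the true mean.
As sample size increases, interval width decreases roughly in proportion to 1/\sqrt{n} (diminishing returns).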
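The repeated-sampling interpretation can be illustrated by simulation. In the sketch below the population is assumed to be Normal with mean 32.8 and SD 2.5; these values, the repetition count, and the function name `coverage` are hypothetical choices for the demonstration. The critical value 2.160 is the table value of t_{0.025} for df = 13.

```python
import math
import random
import statistics

T_CRIT = 2.160   # t_{0.025} for df = 13, from a standard t-table

def coverage(mu, sigma, n, reps, seed=1):
    """Fraction of t-based 95% intervals that contain mu when samples
    of size n are drawn from a Normal(mu, sigma) population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if ybar - T_CRIT * se < mu < ybar + T_CRIT * se:
            hits += 1
    return hits / reps

rate = coverage(mu=32.8, sigma=2.5, n=14, reps=4000)
print(f"fraction of intervals capturing mu: {rate:.3f}")
```

The printed fraction should be close to 0.95, although any single interval either captures \mu or misses it.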
One-Sided Confidence Intervals
90% lower bound: \bar{y} - t_{0.10} \text{SE}_{\bar{Y}}.
95% upper bound: \bar{y} + t_{0.05} \text{SE}_{\bar{Y}}.
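As a worked illustration with the butterfly data (n = 14, df = 13), using the table values t_{0.10} = 1.350 and t_{0.05} = 1.771. These critical values come from a standard t-table, and the result is a sketch rather than a figure from the original case study:

```latex
\text{SE}_{\bar{Y}} = \frac{2.4757}{\sqrt{14}} \approx 0.6617
\text{Lower bound: } 32.8143 - 1.350 \times 0.6617 \approx 31.92 \, cm^2
\text{Upper bound: } 32.8143 + 1.771 \times 0.6617 \approx 33.99 \, cm^2
```

The first bound uses the 0.10 tail area (90% confidence) and the second uses the 0.05 tail area (95% confidence).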
Sample Size Estimation for Precision
To estimate necessary sample size for desired SE:
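E.g., \text{SE} = \frac{s}{\sqrt{n}} must meet a specific precision threshold; solving for n gives n \geq \left(\frac{s}{\text{SE}}\right)^2, where s is a preliminary estimate of the SD.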
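A minimal sketch of this calculation in Python. The function name `required_n` and the target SE of 0.10 kg are hypothetical; the SD of 0.65 kg is borrowed from the lamb example and treated as a planning guess.

```python
import math

def required_n(s, target_se):
    """Smallest n with s / sqrt(n) <= target_se, where s is a
    preliminary guess for the SD."""
    return math.ceil((s / target_se) ** 2)

n_needed = required_n(s=0.65, target_se=0.10)
print(n_needed)   # prints 43
```

With n = 43 the SE is about 0.099 kg, just under the target, while n = 42 would leave it slightly above 0.10 kg.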
Diminishing Returns in Data Collection
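Because SE is proportional to 1/\sqrt{n}, each additional observation improves precision less than the previous one: halving the SE requires four times as many observations.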
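A short Python sketch makes the pattern concrete. It holds the SD fixed at 0.65 kg (the lamb value, used here only for illustration) and quadruples the sample size at each step; the sample sizes and the name `se_by_n` are arbitrary choices.

```python
import math

s = 0.65   # SD held fixed for illustration (lamb example)

# Standard error at sample sizes that quadruple each time
se_by_n = {n: s / math.sqrt(n) for n in (7, 28, 112, 448)}

for n, se in se_by_n.items():
    print(f"n = {n:3d}   SE = {se:.3f} kg")
```

Each fourfold increase in n only halves the SE, so going from 28 to 112 observations buys the same relative improvement as going from 7 to 28.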
Conditions for Validity of Estimation Methods
Ensure the sample is a simple random sample to uphold the integrity of statistical inferences.
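For small samples, the data must be approximately normal. For larger samples (a common rule of thumb is n \geq 30), the Central Limit Theorem makes the sampling distribution of the mean approximately normal even when the data are not.
Check for outliers, which can strongly distort the sample mean and SD, especially in small samples.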