Chapter 5 Notes

QBIO 305 Statistics for the Life Sciences - Study Notes

Meta-Study

Definition: A meta-study (or meta-analysis) combines results from multiple independent studies.
Components:
- Involves many repetitions or replications of the same study.
- If a study consists of drawing a random sample of size n from a population, the meta-study involves drawing repeated random samples of size n from the same population.
Goals:
- Increase sample size.
- Reduce random noise.
- Obtain a more reliable overall conclusion.

Sampling Variability

Definition: When repeatedly taking random samples from the same population:
- Each sample will yield slightly different results.
- Sample means and sample proportions vary from sample to sample.
Nature of Sampling Variability:
- Exists even when there is no measurement error.
- Present when a study is conducted correctly.
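
Sampling variability can be seen directly in a small simulation. The sketch below is illustrative only: the population (normal, mean 100, SD 15), the sample size of 25, and the helper name `sample_mean` are assumptions made for the example, not values from the course.

```python
import random
import statistics

rng = random.Random(305)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 values from a normal distribution
# (mean 100, SD 15). Any population would do.
population = [rng.gauss(100, 15) for _ in range(10_000)]

def sample_mean(n):
    """Mean of one simple random sample of size n, drawn without replacement."""
    return statistics.mean(rng.sample(population, n))

# Five replications of the same study, each with n = 25.
means = [sample_mean(25) for _ in range(5)]
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

Every sample is drawn correctly and measured without error, yet the five means differ from one another.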

Sampling Distribution

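Definition: The sampling distribution of a statistic is the distribution of the values that statistic takes across repeated random samples of the same size from the same population.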
Procedure:
1. Take a random sample.
2. Compute a statistic (e.g., mean, proportion).
3. Repeat many times.
Examples:
- Distribution of sample means, denoted as $\bar{X}$.
- Distribution of sample proportions, denoted as $\tilde{p}$.
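
The three-step procedure can be sketched in code for the sample proportion. The population proportion 0.3, the sample size 50, and the number of repetitions are assumed values chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)

P_TRUE = 0.3   # assumed population proportion (illustrative)
N = 50         # sample size
REPS = 5_000   # number of repeated samples

def sample_proportion(n, p):
    """Steps 1 and 2: draw one random sample of size n, return the proportion of successes."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n

# Step 3: repeat many times to build up the sampling distribution.
proportions = [sample_proportion(N, P_TRUE) for _ in range(REPS)]

center = statistics.mean(proportions)
spread = statistics.pstdev(proportions)
print(f"mean of sample proportions: {center:.3f}")
print(f"SD of sample proportions:   {spread:.3f}")
```

The list `proportions` is an empirical approximation of the sampling distribution. Its center should sit near the assumed population proportion.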

Key Theorems: LLN and CLT

Focus on two key theorems in statistics:
- Law of Large Numbers (LLN): As the sample size n increases, the sample mean $\bar{X}_n$ converges to the population mean $\mu$.
- Central Limit Theorem (CLT): For large sample sizes, the sum (or average) of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the distribution of the individual variables.

Simple Random Sample

Definition: Each member of the population has an equal chance of being included in the sample.
Characteristics:
- Sample members are chosen independently of one another.

i.i.d Distribution

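Definition: In a simple random sample, elements are said to be independent and identically distributed (i.i.d), meaning:
- Independent: the value of one element gives no information about the values of the others.
- Identically distributed: they all come from the same probability distribution.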

Law of Large Numbers (LLN)

Notation: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables, each with mean $\mu$.
Statement: As the sample size $n$ approaches infinity:
- $\bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n} \rightarrow \mu$.
Interpretation: The sample mean converges to the population mean.

Intuitive Understanding

Conceptual Understanding: As sample size becomes arbitrarily large, the sample mean becomes arbitrarily close to the population mean.

Coin Flipping Example

Experiment: Flipping a fair quarter 100 times to determine the expected number of heads.
Distribution of Results:
- Let: $X_i = 1$ with probability $\frac{1}{2}$ (Heads).
- Let: $X_i = 0$ with probability $\frac{1}{2}$ (Tails).
Calculated Mean: $\mu = E[X_i] = 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = 0.5$.
Conclusion: The Law of Large Numbers indicates that the sample mean will converge to $0.5$ .
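
A short simulation makes the convergence concrete. This is a minimal sketch: the function name `running_proportion_heads` and the checkpoints printed are choices made for the example.

```python
import random

rng = random.Random(42)

def running_proportion_heads(n_flips):
    """Flip a fair coin n_flips times; return the sample mean after each flip."""
    heads = 0
    running = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5   # X_i = 1 for heads, 0 for tails
        running.append(heads / i)
    return running

running = running_proportion_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: sample mean = {running[n - 1]:.4f}")
```

Early values can be far from 0.5. The later ones settle close to it, as the LLN describes.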

Bernoulli Trials Example

Setup: Each trial (such as a coin flip) is a random variable $X_i$ that equals 1 (success) with probability $p$ and 0 (failure) with probability $1 - p$.
Mean: $E[X_i] = p$ .
LLN: Sample means will converge to the probability of success p.

Expected Value Interpretation

Frequentist Interpretation: The probability $p$ of an event is interpreted as the long-run relative frequency of that event over many independent trials.

Rolling a Die Example

Random Variable: $X_i$ is the result of the $i$-th roll of a fair six-sided die.
Mean Calculation:
- $E[X_i] = \frac{1+2+3+4+5+6}{6} = 3.5$ .
LLN Application: Sample means will converge to 3.5 as sample size increases.
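
The exact mean and a simulated sample mean can be compared side by side. The sketch assumes a fair die. The number of rolls (60,000) is arbitrary.

```python
import random
from fractions import Fraction

faces = range(1, 7)

# Exact expected value: each face has probability 1/6.
exact_mean = sum(Fraction(face, 6) for face in faces)
print(exact_mean, float(exact_mean))

# Simulated rolls of a fair die.
rng = random.Random(6)
rolls = [rng.randint(1, 6) for _ in range(60_000)]
sample_mean = sum(rolls) / len(rolls)
print(f"sample mean of {len(rolls)} rolls: {sample_mean:.3f}")
```

Note that 3.5 is not a value the die can show. It is the long-run average of the rolls.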

Central Limit Theorem (CLT)

Definition: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and standard deviation $\sigma$.
Asymptotic Behavior: For large n:
- $\frac{\sum_{i=1}^{n}(X_i - \mu)}{\sigma \sqrt{n}} \rightarrow N(0, 1)$ (standard normal distribution).
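
The statement can be checked numerically with a population that is clearly not normal. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1) and a sample size of 100. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(7)

MU, SIGMA = 1.0, 1.0   # mean and SD of an exponential(rate=1) population

def standardized_sum(n):
    """Return sum(X_i - mu) / (sigma * sqrt(n)) for one i.i.d. sample of size n."""
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total - n * MU) / (SIGMA * math.sqrt(n))

def empirical_cdf(values, x):
    """Fraction of values less than or equal to x."""
    return sum(v <= x for v in values) / len(values)

z_values = [standardized_sum(100) for _ in range(10_000)]
std_normal = NormalDist()  # N(0, 1)

for x in (-1.0, 0.0, 1.0, 1.96):
    print(f"x = {x:>5}: simulated {empirical_cdf(z_values, x):.3f}, "
          f"normal {std_normal.cdf(x):.3f}")
```

The simulated and normal columns should agree to roughly two decimal places, even though the individual observations are strongly right-skewed.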

Convergence of Random Variables

Interpretation: The convergence is in distribution: as n grows, the distribution function of the standardized sum approaches the distribution function of the standard normal. This links sampling distributions to the normal distribution.

Graphical Representation of Sampling Distribution

Illustration: Sampling distributions of the mean for sample sizes n = 4, n = 7, and n = 10, with the shape moving closer to normal as n increases.

Galton Board Example

Summary: The final position of a ball in a Galton board is the sum of many independent left-or-right bounces, so by the Central Limit Theorem the distribution of final positions is approximately normal.
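
A Galton board is easy to imitate in code. The sketch below assumes 12 rows of pegs, 5,000 balls, and an equal chance of bouncing left or right at each peg. These are illustrative choices.

```python
import random
from collections import Counter

rng = random.Random(2024)

ROWS = 12      # rows of pegs (illustrative choice)
BALLS = 5_000

def drop_ball(rows):
    """Final bin = number of rightward bounces, a sum of independent 0/1 steps."""
    return sum(rng.random() < 0.5 for _ in range(rows))

bins = Counter(drop_ball(ROWS) for _ in range(BALLS))

# Text histogram: one '#' per 25 balls.
for position in range(ROWS + 1):
    print(f"{position:>2} | {'#' * (bins[position] // 25)}")
```

The printed histogram is tallest in the middle bins and tapers symmetrically toward the edges, which is the bell shape the CLT predicts.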

Normal Distribution Properties

Property: For a normal random variable $Z$ with mean $µ$ and standard deviation $σ$ , the transformed random variable $aZ + b$ is also normal with:
- Mean: $aµ + b$ .
- Standard Deviation: $|a|σ$ .
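
Python's standard library can illustrate this property, since `statistics.NormalDist` supports multiplying by and adding a constant. The numbers below (mean 10, standard deviation 2, a = -3, b = 5) are arbitrary illustrative values.

```python
from statistics import NormalDist

Z = NormalDist(mu=10, sigma=2)   # illustrative values for the mean and SD
a, b = -3, 5

# NormalDist supports scaling and shifting by constants.
W = Z * a + b

print(W.mean)    # a * mu + b
print(W.stdev)   # |a| * sigma
```

The negative value of a flips the sign of the mean term but not of the standard deviation, which is why the absolute value appears in the formula.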

Behavior of Summed Random Variables

For large sample sizes, with $N(\cdot, \cdot)$ giving the mean and standard deviation:
- Sum: $\sum_{i=1}^{n}X_i \to N(n\mu, \sigma\sqrt{n})$.
- Mean: $\frac{\sum_{i=1}^{n}X_i}{n} \to N(\mu, \frac{\sigma}{\sqrt{n}})$.
Conclusion: The sum of many independent variables is approximately normally distributed, which helps explain phenomena such as height variation in populations.
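
The large-sample behavior of the sum and of the mean follows from combining the CLT with the scaling property of normal variables. The lines below sketch that reasoning. $S_n$ and $Z$ are shorthand introduced here, not notation from the notes, and the second entry of $N(\cdot, \cdot)$ is a standard deviation.

$$
S_n = \sum_{i=1}^{n} X_i, \qquad \frac{S_n - n\mu}{\sigma\sqrt{n}} \approx Z \sim N(0, 1)
$$

$$
S_n \approx \sigma\sqrt{n}\, Z + n\mu \sim N(n\mu,\ \sigma\sqrt{n})
$$

$$
\bar{X}_n = \frac{S_n}{n} \approx \frac{\sigma}{\sqrt{n}}\, Z + \mu \sim N\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)
$$

The second line applies the property with $a = \sigma\sqrt{n}$ and $b = n\mu$. The third applies it with $a = \sigma/\sqrt{n}$ and $b = \mu$.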

Population Height Distribution

Graphical Data: U.S. Height Distribution for men and women illustrating normal distribution characteristics among large populations.

Blood Pressure Example

Frequency Distribution: A histogram of diastolic blood pressure data, with a shape that is approximately normal across the population.

Total Cholesterol Distribution Example

Graphical Data: Distribution of total cholesterol for individuals of age ≤ 20, reported together with summary statistics.

Application of the Central Limit Theorem in Real-World Scenarios

Baseball Batting Average: A batting average is the mean of many at-bats, each treated as an independent success-or-failure trial (hit or no hit), so by the CLT batting averages are approximately normally distributed.

Implications of Sample Size on Distribution

Clarification: A larger sample size brings the distribution of the sample mean closer to normal, but it does not alter the underlying population distribution.
Common Misconception: Increasing the sample size does not make non-normal data normal. Only the sampling distribution of the sample mean becomes approximately normal, by the CLT.
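
The distinction can be checked by measuring skewness, which is 0 for a symmetric distribution. The sketch below assumes an exponential population (right-skewed, with skewness 2). The helper names `skewness` and `draw` are hypothetical.

```python
import random
import statistics

rng = random.Random(11)

def skewness(values):
    """Mean cubed z-score of the values (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def draw(n):
    """One i.i.d. sample of size n from a right-skewed exponential population."""
    return [rng.expovariate(1.0) for _ in range(n)]

# Raw observations: a bigger sample stays as skewed as the population.
raw_small = skewness(draw(1_000))
raw_large = skewness(draw(100_000))

# Sample means (n = 50 each): their distribution is far more symmetric.
means = [statistics.fmean(draw(50)) for _ in range(5_000)]
means_skew = skewness(means)

print(f"skewness of 1,000 raw observations:   {raw_small:.2f}")
print(f"skewness of 100,000 raw observations: {raw_large:.2f}")
print(f"skewness of 5,000 sample means:       {means_skew:.2f}")
```

Collecting more raw data only gives a clearer picture of the skewed population. The near-symmetry appears only in the distribution of the means.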

Roulette Game Example

Game Structure: Standard American roulette, in which the casino's advantage gives the player a negative expected gain on each bet.
Statistical Expectations: In the long run, a player's average profit per bet converges to this negative expected value (by the Law of Large Numbers), so sustained play loses money.
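
As a worked illustration, consider one specific bet. The notes do not say which bet was used, so the sketch assumes a $1 bet on red paying 1:1 on an American wheel with 38 pockets (18 red, 18 black, 2 green).

```python
import random
from fractions import Fraction

# Assumed bet (not specified in the notes): $1 on red, paying 1:1.
# An American wheel has 38 pockets: 18 red, 18 black, 2 green (0 and 00).
P_WIN = Fraction(18, 38)
expected_profit = P_WIN * 1 + (1 - P_WIN) * (-1)
print(expected_profit, float(expected_profit))

rng = random.Random(38)

def average_profit(spins):
    """Average profit per $1 bet on red over the given number of spins."""
    total = sum(1 if rng.randrange(38) < 18 else -1 for _ in range(spins))
    return total / spins

for spins in (100, 10_000, 500_000):
    print(f"{spins:>7} spins: average profit = {average_profit(spins):+.4f}")
```

Over a short session the average can be positive. Over many spins it settles near the expected value of about -5.3 cents per dollar bet.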

Binomial Distributions and Normal Approximation

Binomial Definition: Total successes in $n$ independent trials, each with probability $p$ .
Normal Approximation: For large n:
- $\sum_{i=1}^{n}X_i \rightarrow N(np, \sqrt{np(1-p)})$.
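
The parameters of the approximating normal distribution can be checked against the exact binomial distribution. The values n = 100 and p = 0.25 below are illustrative, and `binom_pmf` is a helper written for this sketch.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.25   # illustrative values
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# Mean and variance computed directly from the exact pmf.
exact_mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
exact_var = sum((k - exact_mean) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))

print(exact_mean, approx.mean)
print(math.sqrt(exact_var), approx.stdev)

# The normal density tracks the pmf near the center of the distribution.
for k in (15, 20, 25, 30, 35):
    print(k, round(binom_pmf(k, n, p), 4), round(approx.pdf(k), 4))
```

The mean and standard deviation match np and the square root of np(1-p) up to floating-point error. The pmf and the normal density agree closely near the center.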

Albino Example Summary

Random Variables: The number of albino children born to carriers of the albino gene is modeled with the binomial distribution.
- For large sample sizes, the normal approximation simplifies the probability computations.

Continuity Correction in Approximations

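Necessity: A continuity correction is applied when approximating a discrete binomial distribution with a continuous normal distribution. Each integer value $k$ is treated as the interval from $k - 0.5$ to $k + 0.5$, so that, for example, $P(X \le k)$ is approximated by the normal probability below $k + 0.5$. This gives more accurate probabilities.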
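
The effect of the correction can be seen by comparing both approximations with the exact binomial probability. The values n = 40, p = 0.25, and k = 8 are illustrative and not taken from the notes.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

n, p, k = 40, 0.25, 8   # illustrative values, not taken from the notes
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

exact = binom_cdf(k, n, p)
uncorrected = normal.cdf(k)        # normal probability below 8
corrected = normal.cdf(k + 0.5)    # normal probability below 8.5

print(f"exact binomial:       {exact:.4f}")
print(f"normal, uncorrected:  {uncorrected:.4f}")
print(f"normal, corrected:    {corrected:.4f}")
```

For these values the corrected approximation lands much closer to the exact probability than the uncorrected one.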

Summary of Findings

Sampling Distribution Properties:
- Mean of the sampling distribution of the sample mean $\bar{Y}$: $\mu_{\bar{Y}} = \mu$.
- Standard deviation relationship: $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$.
- Distribution shape:
  - If the population distribution is normal, the sampling distribution of $\bar{Y}$ is normal (any sample size).
  - If $n$ is large, the sampling distribution of $\bar{Y}$ is approximately normal regardless of the underlying population distribution.
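
The mean and standard deviation relationships can be checked by simulation. The sketch below assumes a Uniform(0, 1) population, whose mean is 0.5 and whose standard deviation is 1/sqrt(12). The sample sizes and the helper name `sampling_distribution` are illustrative.

```python
import math
import random
import statistics

rng = random.Random(168)

MU = 0.5                    # mean of a Uniform(0, 1) population
SIGMA = 1 / math.sqrt(12)   # its standard deviation

def sampling_distribution(n, reps=5_000):
    """Sample means from `reps` independent random samples of size n."""
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]

results = {}
for n in (4, 16, 64):
    means = sampling_distribution(n)
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(f"n = {n:>2}: mean of means = {results[n][0]:.3f}, "
          f"SD of means = {results[n][1]:.4f}, "
          f"sigma/sqrt(n) = {SIGMA / math.sqrt(n):.4f}")
```

Quadrupling the sample size roughly halves the standard deviation of the sample mean, while the center stays at the population mean.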