Chapter 5 Notes

QBIO 305 Statistics for the Life Sciences - Study Notes

Meta-Study

  • Definition: A meta-study (or meta-analysis) combines results from multiple independent studies.
  • Components:
    • Involves many repetitions or replications of the same study.
    • If a study consists of drawing a random sample of size n from a population, the meta-study consists of drawing repeated random samples of size n from the same population.
  • Goals:
    • Increase sample size.
    • Reduce random noise.
    • Obtain a more reliable overall conclusion.

Sampling Variability

  • Definition: When repeatedly taking random samples from the same population:
    • Each sample will yield slightly different results.
    • Sample means and sample proportions vary from sample to sample.
  • Nature of Sampling Variability:
    • Exists even when there is no measurement error.
    • Present when a study is conducted correctly.

Sampling Distribution

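  • Definition: The probability distribution of a statistic (e.g., a sample mean) across repeated random samples of the same size from the same population; it describes how the statistic behaves from sample to sample.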
  • Procedure:
    1. Take a random sample.
    2. Compute a statistic (e.g., mean, proportion).
    3. Repeat many times.
  • Examples:
    • Distribution of sample means, denoted as \bar{X}.
    • Distribution of sample proportions, denoted as \tilde{p}.
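The three-step procedure above can be sketched as a small simulation. This is only an illustration under stated assumptions: the exponential population, the sample size of 25, the 2,000 repetitions, and the helper name `sampling_distribution` are all hypothetical choices, not part of the course material.

```python
import random
import statistics


def sampling_distribution(population, n, statistic, reps, seed=0):
    """Repeat: draw a simple random sample of size n, then compute the statistic."""
    rng = random.Random(seed)
    return [statistic(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical population: 10,000 values from a skewed (exponential) distribution.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

# Approximate sampling distribution of the sample mean for n = 25.
means = sampling_distribution(population, n=25, statistic=statistics.mean, reps=2_000)

print("population mean:     ", round(statistics.mean(population), 3))
print("mean of sample means:", round(statistics.mean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
```

The sample means differ from one repetition to the next (sampling variability), yet they cluster around the population mean and vary much less than individual observations do.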

Key Theorems: LLN and CLT

  • Focus on two key theorems in statistics:
    • Law of Large Numbers (LLN): As the sample size n increases, the sample mean \bar{X}_n converges to the population mean µ.
    • Central Limit Theorem (CLT): For large sample sizes, the sum (or average) of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the shape of the original distribution.
Simple Random Sample
  • Definition: Each member of the population has an equal chance of being included in the sample.
  • Characteristics:
    • Sample members are chosen independently of one another.
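i.i.d. Distribution
  • Definition: In a simple random sample, elements are said to be independent and identically distributed (i.i.d.), meaning:
    • The value of one element does not affect the value of any other (independent).
    • They come from the same probability distribution (identically distributed).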
Law of Large Numbers (LLN)
  • Notation: Let X_1, X_2, …, X_n be independent and identically distributed random variables, each with mean µ.
  • Statement: As the sample size n approaches infinity:
    • \bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n} \rightarrow µ.
  • Interpretation: The sample mean converges to the population mean.
Intuitive Understanding
  • Conceptual Understanding: As sample size becomes arbitrarily large, the sample mean becomes arbitrarily close to the population mean.
Coin Flipping Example
  • Experiment: Flipping a fair quarter 100 times to determine the expected number of heads.
  • Distribution of Results:
    • Let: X_i = 1 with probability \frac{1}{2} (Heads).
    • Let: X_i = 0 with probability \frac{1}{2} (Tails).
  • Calculated Mean: µ = E[X_i] = 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = 0.5.
  • Conclusion: The Law of Large Numbers indicates that the sample mean (the proportion of heads) converges to 0.5 as the number of flips grows.
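A quick simulation can make the coin-flipping conclusion concrete. The sketch below tracks the running proportion of heads; the function name, the seed, and the number of flips are illustrative assumptions.

```python
import random


def running_proportion_of_heads(n_flips, seed=0):
    """Return the proportion of heads after each of n_flips fair coin flips."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # X_i = 1 (heads) with probability 1/2
        proportions.append(heads / i)
    return proportions


props = running_proportion_of_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: proportion of heads = {props[n - 1]:.4f}")
```

For small n the proportion can sit far from 0.5; as n grows, it settles close to 0.5, which is the behavior the LLN describes.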
Bernoulli Trials Example
  • Consideration: Each trial (e.g., a coin flip) is a random variable X_i that equals 1 (success) with probability p or 0 (failure) with probability 1 - p.
  • Mean: E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p.
  • LLN: Sample means will converge to the probability of success p.
Expected Value Interpretation
  • Frequentist Interpretation: The probability p of an event is interpreted as the long-run relative frequency of that event over many independent trials.
Rolling a Die Example
  • Random Variable: X_i is the result of the i-th roll of a fair six-sided die.
  • Mean Calculation:
    • E[X_i] = \frac{1+2+3+4+5+6}{6} = 3.5.
  • LLN Application: Sample means will converge to 3.5 as sample size increases.
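The mean calculations in the examples above all follow the same rule: multiply each outcome by its probability and add. A minimal sketch using exact fractions; the helper name `expected_value` is a hypothetical choice.

```python
from fractions import Fraction


def expected_value(distribution):
    """E[X] for a dict that maps each outcome to its probability."""
    return sum(outcome * prob for outcome, prob in distribution.items())


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
fair_coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}

mu_die = expected_value(fair_die)
mu_coin = expected_value(fair_coin)

print(mu_die, float(mu_die))    # 7/2 3.5
print(mu_coin, float(mu_coin))  # 1/2 0.5
```

Note that 3.5 is not a value a die can show; it is the value the sample mean approaches over many rolls.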

Central Limit Theorem (CLT)

  • Definition: Let X_1, X_2, …, X_n be independent and identically distributed random variables with mean µ and standard deviation σ.
  • Asymptotic Behavior: For large n:
    • \frac{\sum_{i=1}^{n}(X_i - µ)}{σ \sqrt{n}} \rightarrow N(0, 1) (standard normal distribution).
Convergence of Random Variables
  • Interpretation: The convergence is in distribution: as n grows, the cumulative distribution function of the standardized sum approaches that of the standard normal, which links sampling distributions to the normal distribution.
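One way to check the CLT statement numerically is to standardize many simulated sums and see how often they land within ±1.96, which a standard normal does about 95% of the time. The sketch below assumes an exponential population with mean 1 and standard deviation 1, n = 50, and 5,000 repetitions; all of these are illustrative choices.

```python
import math
import random


def standardized_sum(draw, mu, sigma, n, rng):
    """Compute (sum of X_i - n*mu) / (sigma * sqrt(n)) for one sample of size n."""
    total = sum(draw(rng) for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))


def draw_exponential(rng):
    """One draw from an exponential distribution with mean 1 and SD 1 (strongly skewed)."""
    return rng.expovariate(1.0)


rng = random.Random(0)
z_values = [standardized_sum(draw_exponential, 1.0, 1.0, 50, rng) for _ in range(5_000)]

inside = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print("fraction of standardized sums within +/-1.96:", round(inside, 3))
```

Although individual exponential values are far from normal, the fraction should come out near 0.95, as it would for a standard normal variable.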

Graphical Representation of Sampling Distribution

  • Illustration: Sampling distributions of the mean for sample sizes n = 4, n = 7, and n = 10, showing a consistent trend toward normality as n increases.

Galton Board Example

  • Summary: The final position of a ball on a Galton board is the sum of many independent left-or-right movements, so the distribution of final positions is approximately normal by the Central Limit Theorem.
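A Galton board is easy to simulate. The sketch below assumes 12 rows of pegs, 5,000 balls, and an equal chance of bouncing left or right at every peg; these numbers are hypothetical and chosen only for illustration.

```python
import random
import statistics


def galton_board(n_rows, n_balls, seed=0):
    """Final bin of each ball = number of rightward bounces over n_rows pegs."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_rows)) for _ in range(n_balls)]


positions = galton_board(n_rows=12, n_balls=5_000)

# Text histogram: one '#' per 25 balls in each bin.
for k in range(13):
    count = positions.count(k)
    print(f"bin {k:2d} | {'#' * (count // 25)}")

print("mean position:", round(statistics.mean(positions), 2))
print("SD of position:", round(statistics.stdev(positions), 2))
```

The histogram is bell-shaped and centered near bin 6, even though each individual bounce has only two possible outcomes.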

Normal Distribution Properties

  • Property: For a normal random variable Z with mean µ and standard deviation σ, the transformed random variable aZ + b is also normal with:
    • Mean: aµ + b.
    • Standard Deviation: |a|σ.
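The linear-transformation rule can be written as a two-line function. The example values (a temperature with mean 37.0 and standard deviation 0.4 in Celsius, converted to Fahrenheit with a = 9/5 and b = 32) are hypothetical. Python's `statistics.NormalDist` applies the same rule when a normal distribution is multiplied or shifted by a constant, which gives an independent check.

```python
from statistics import NormalDist


def linear_transform(mu, sigma, a, b):
    """Mean and standard deviation of aZ + b when Z is normal(mu, sigma)."""
    return a * mu + b, abs(a) * sigma


# Hypothetical example: Z = temperature in Celsius, aZ + b = temperature in Fahrenheit.
mean_f, sd_f = linear_transform(37.0, 0.4, a=9 / 5, b=32)
print("mean:", round(mean_f, 2), "SD:", round(sd_f, 2))

# Cross-check with the standard library's NormalDist arithmetic.
via_stdlib = NormalDist(37.0, 0.4) * (9 / 5) + 32
print("NormalDist mean:", round(via_stdlib.mean, 2), "SD:", round(via_stdlib.stdev, 2))

# A negative multiplier flips the sign of the mean term but not of the SD.
print(linear_transform(10, 2, a=-3, b=0))
```

The absolute value matters: a standard deviation is never negative, so multiplying by a = -3 scales it by 3.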
Behavior of Summed Random Variables
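  • For large sample sizes, the sum of independent, identically distributed random variables satisfies \sum_{i=1}^{n} X_i \rightarrow N(nµ, σ\sqrt{n}), written as N(mean, standard deviation); equivalently, the average satisfies \frac{\sum_{i=1}^{n} X_i}{n} \rightarrow N(µ, \frac{σ}{\sqrt{n}}).
  • Conclusion: The sum of many independent variables is approximately normally distributed, which helps explain phenomena such as height variation in populations.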

Population Height Distribution

  • Graphical Data: U.S. Height Distribution for men and women illustrating normal distribution characteristics among large populations.

Blood Pressure Example

  • Frequency Distribution: Diastolic blood pressure data shown as a histogram, indicating an approximately normal overall shape for the population.

Total Cholesterol Distribution Example

  • Summarizes total cholesterol for ages ≤ 20: the distribution is stable, and various summary statistics are reported.

Application of the Central Limit Theorem in Real-World Scenarios

  • Baseball Batting Average: A batting average is the mean of many at-bats, each treated as an independent Bernoulli trial (hit or no hit), so the number of hits is binomial and the average is approximately normally distributed for a large number of at-bats.

Implications of Sample Size on Distribution

  • Clarification: A larger sample size brings the distribution of the sample average closer to normal but does not alter the underlying population distribution.
  • Common Misconception: Increasing the sample size does not convert a non-normal population into a normal one. Only the sampling distribution of the sample mean becomes approximately normal, as described by the CLT.
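The misconception can be checked with a simulation that measures skewness, which is about 0 for a symmetric (e.g., normal) distribution. The sketch assumes an exponential population (theoretical skewness 2), a large raw sample of 20,000 observations, and 5,000 sample means with n = 40; these settings and the helper `skewness` are illustrative choices.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score; close to 0 for symmetric data."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(0)

# A large sample of raw observations from a skewed population.
raw = [rng.expovariate(1.0) for _ in range(20_000)]

# Many sample means, each computed from a sample of size 40.
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(40)])
    for _ in range(5_000)
]

print("skewness of raw observations:", round(skewness(raw), 2))
print("skewness of sample means:    ", round(skewness(means), 2))
```

Collecting 20,000 raw observations leaves the data as skewed as the population. Only the distribution of sample means moves toward symmetry.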

Roulette Game Example

  • Game Structure: In standard American roulette, every bet has a negative expected gain for the player; this expected loss is the casino's advantage (house edge).
  • Statistical Expectations: In the long run, a player's average profit per bet converges to this negative expected value (Law of Large Numbers), so sustained play loses money.
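The expected loss can be worked out exactly. The notes do not list the rules, so the sketch below assumes the usual American layout and payouts: 38 pockets (1 to 36, 0, and 00), a straight-up bet on one number paying 35 to 1, and a bet on red (18 pockets) paying 1 to 1.

```python
from fractions import Fraction

POCKETS = 38  # assumed American wheel: 1-36 plus 0 and 00


def expected_profit(winning_pockets, payout_to_one, stake=1):
    """Expected profit of one bet: win payout*stake with prob p, lose stake otherwise."""
    p_win = Fraction(winning_pockets, POCKETS)
    return stake * (payout_to_one * p_win - (1 - p_win))


straight_up = expected_profit(winning_pockets=1, payout_to_one=35)
red = expected_profit(winning_pockets=18, payout_to_one=1)

print(straight_up, red)    # -1/19 -1/19
print(float(straight_up))  # about -0.0526 per unit staked
```

Under these assumed rules both bets lose about 5.3 cents per dollar staked on average, and by the LLN the average profit per bet over many plays approaches that value.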

Binomial Distributions and Normal Approximation

  • Binomial Definition: Total successes in n independent trials, each with probability p.
  • Normal Approximation: For large n:
    • \sum_{i=1}^{n} X_i \rightarrow N(np, \sqrt{np(1-p)}).
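The mean np and standard deviation \sqrt{np(1-p)} used in the normal approximation can be confirmed directly from the binomial probabilities. The values n = 50 and p = 0.3 below are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def moments_from_pmf(n, p):
    """Mean and standard deviation computed by summing over the pmf."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, math.sqrt(var)


n, p = 50, 0.3  # hypothetical values
mean, sd = moments_from_pmf(n, p)

print("mean from pmf:", round(mean, 6), " formula np:", n * p)
print("SD from pmf:  ", round(sd, 6), " formula sqrt(np(1-p)):", round(math.sqrt(n * p * (1 - p)), 6))
```

These are the two parameters plugged into the approximating normal distribution.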

Albino Example Summary

  • Random Variables: The number of albino children among the children of carrier parents is modeled as a binomial random variable, and probabilities are calculated from the binomial distribution.
    • E.g., for a large number of children of carriers of the albino gene, the normal approximation simplifies the computation.

Continuity Correction in Approximations

  • Necessity: A continuity correction is used when approximating a discrete binomial distribution with a continuous normal distribution; it treats each integer value k as the interval from k - 0.5 to k + 0.5, which gives more accurate probabilities.
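The effect of the correction can be seen by comparing the exact binomial probability P(X ≤ k) with the normal approximation evaluated at k and at k + 0.5. The values n = 40, p = 0.25, and k = 8 are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial random variable."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, p, continuity_correction=True):
    """Normal approximation to P(X <= k), optionally evaluated at k + 0.5."""
    approx = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5 if continuity_correction else k)


n, p, k = 40, 0.25, 8  # hypothetical values
exact = binomial_cdf(k, n, p)
without = normal_approx_cdf(k, n, p, continuity_correction=False)
with_cc = normal_approx_cdf(k, n, p, continuity_correction=True)

print("exact binomial:          ", round(exact, 4))
print("normal, no correction:   ", round(without, 4))
print("normal, with correction: ", round(with_cc, 4))
```

In this example the corrected approximation is noticeably closer to the exact value than the uncorrected one.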

Summary of Findings

  1. Sampling Distribution Properties:
    • Mean of the sampling distribution of the sample mean \bar{Y}: µ_{\bar{Y}} = µ.
    • Standard deviation relationship: σ_{\bar{Y}} = \frac{σ}{\sqrt{n}}.
    • Distribution shape:
      • If population distribution is normal, sampling distribution is normal (any sample size).
      • If n is large, sampling distribution of \bar{Y} is approximately normal regardless of underlying population distribution.
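The standard deviation relationship implies that quadrupling the sample size halves the spread of the sample mean. A small sketch, assuming a hypothetical population standard deviation of 12:

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 12.0  # hypothetical population standard deviation
for n in (4, 16, 64, 256):
    print(f"n = {n:>3}: SD of sample mean = {standard_error(sigma, n)}")
# n =   4: 6.0, n =  16: 3.0, n =  64: 1.5, n = 256: 0.75
```

Because of the square root, gaining one more halving of the spread always costs four times as many observations.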