QBIO 305 Statistics for the Life Sciences - Study Notes
Meta-Study
Definition: A meta-study (or meta-analysis) combines results from multiple independent studies.
Components:
Involves many repetitions or replications of the same study.
If a study consists of drawing a random sample of size n from a population:
The meta-study involves drawing repeated random samples of size n from the same population.
Goals:
Increase sample size.
Reduce random noise.
Obtain a more reliable overall conclusion.
Sampling Variability
Definition: When repeatedly taking random samples from the same population:
Each sample will yield slightly different results.
Sample means and sample proportions vary from sample to sample.
Nature of Sampling Variability:
Exists even when there is no measurement error.
Present even when a study is conducted correctly.
Sampling Distribution
Definition: Describes how a statistic (such as a sample mean) varies across repeated random samples of the same size n from the same population.
Procedure:
Take a random sample.
Compute a statistic (e.g., mean, proportion).
Repeat many times.
Examples:
Distribution of sample means, denoted $\bar{X}$.
Distribution of sample proportions, denoted $\tilde{p}$.
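The procedure above can be sketched in Python. This is a minimal illustration under stated assumptions: the population (the values 1 to 100), the sample size n = 25, the number of repetitions, and the helper names `sampling_distribution` and `prop_above_70` are all hypothetical, and samples are drawn with replacement so that the draws are i.i.d.

```python
import random
import statistics

def sampling_distribution(population, statistic, n, reps, seed=0):
    """Take a random sample of size n (with replacement), compute the
    statistic, and repeat reps times; return all computed values."""
    rng = random.Random(seed)
    return [statistic(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical population: the measurements 1, 2, ..., 100 (mean 50.5).
population = list(range(1, 101))

def prop_above_70(sample):
    """Sample proportion of values above 70 (the population proportion is 0.30)."""
    return sum(x > 70 for x in sample) / len(sample)

xbar_values = sampling_distribution(population, statistics.mean, n=25, reps=4000)
p_tilde_values = sampling_distribution(population, prop_above_70, n=25, reps=4000, seed=1)

print(f"average of the sample means:       {statistics.mean(xbar_values):.2f}")
print(f"average of the sample proportions: {statistics.mean(p_tilde_values):.3f}")
```

A histogram of `xbar_values` or `p_tilde_values` is the (simulated) sampling distribution itself.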
Key Theorems: LLN and CLT
Focus on two key theorems in statistics:
Law of Large Numbers (LLN): As the sample size n increases, the sample mean $\bar{X}_n$ converges to the population mean $\mu$.
Central Limit Theorem (CLT): For large sample sizes, the suitably standardized sum (or average) of independent, identically distributed random variables with finite variance approximately follows a normal distribution, regardless of the original distribution of the variables.
Simple Random Sample
Definition: Each member of the population has an equal chance of being included in the sample.
Characteristics:
Sample members are chosen independently of one another.
i.i.d. Distribution
Definition: In a simple random sample, the elements are modeled as independent and identically distributed (i.i.d.), meaning:
They are independent of one another.
They come from the same probability distribution.
Law of Large Numbers (LLN)
Notation: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables, each with mean $\mu$.
Statement: As the sample size n approaches infinity:
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to \mu$
Interpretation: The sample mean converges to the population mean.
Intuitive Understanding
Conceptual Understanding: As sample size becomes arbitrarily large, the sample mean becomes arbitrarily close to the population mean.
Coin Flipping Example
Experiment: Flipping a fair quarter 100 times to determine the expected number of heads.
Distribution of Results:
Let: $X_i = 1$ with probability $\frac{1}{2}$ (Heads).
Let: $X_i = 0$ with probability $\frac{1}{2}$ (Tails).
Calculated Mean: $\mu = E[X_i] = 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = 0.5$.
Conclusion: The Law of Large Numbers indicates that the sample mean will converge to 0.5.
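A short simulation makes the coin-flipping conclusion concrete. The sketch assumes Python's `random` module as the source of fair flips and uses a hypothetical helper `running_mean_of_flips`; it tracks the sample mean (the proportion of heads) as the number of flips grows. For 100 flips, the expected number of heads is $100 \cdot 0.5 = 50$.

```python
import random

def running_mean_of_flips(n_flips, seed=42):
    """Flip a fair coin n_flips times (1 = heads, 0 = tails) and return the
    sample mean after each flip."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_flips + 1):
        total += rng.randint(0, 1)  # X_i = 1 or 0, each with probability 1/2
        means.append(total / i)
    return means

means = running_mean_of_flips(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: sample mean = {means[n - 1]:.4f}")
```

Early sample means can sit well away from 0.5; the later ones cluster tightly around it, which is the LLN at work.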
Bernoulli Trials Example
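Consideration: In a Bernoulli trial such as a coin flip, the random variable $X_i$ equals 1 (success) with probability p and 0 (failure) with probability 1 − p.
Mean: $E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p$.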
LLN: Sample means will converge to the probability of success p.
Expected Value Interpretation
Frequentist Interpretation: The probability p of an event is interpreted as the long-run relative frequency with which the event occurs over many independent trials.
Rolling a Die Example
Random Variable: $X_i$, the result of rolling a fair six-sided die.
Mean Calculation:
$E[X_i] = \frac{1+2+3+4+5+6}{6} = 3.5$.
LLN Application: Sample means will converge to 3.5 as sample size increases.
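The die expectation can be checked exactly with fractions, and the LLN claim by simulation. This is a minimal sketch; the seed and the sample sizes are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact expected value: each face 1..6 has probability 1/6.
expected = sum(Fraction(face, 6) for face in range(1, 7))
print(expected, float(expected))  # 7/2 3.5

# LLN check: sample means of simulated rolls for increasing n.
rng = random.Random(7)
sample_means = {}
for n in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    sample_means[n] = sum(rolls) / n
    print(f"n = {n:>6}: sample mean = {sample_means[n]:.3f}")
```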
Central Limit Theorem (CLT)
Definition: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and standard deviation $\sigma$.
Asymptotic Behavior: For large n:
$\frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu) \to N(0,1)$ (standard normal distribution).
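To see the standardization in action, the sketch below draws from a deliberately skewed population, an Exponential distribution with rate 1 (an assumption, chosen because its mean and standard deviation are both 1), standardizes each sum of n = 50 draws as in the CLT formula, and compares the results with N(0,1). The helper name `standardized_sum`, n, and the number of repetitions are hypothetical choices.

```python
import math
import random
import statistics

def standardized_sum(n, rng, mu=1.0, sigma=1.0):
    """Return sum(X_i - mu) / (sigma * sqrt(n)) for n i.i.d. draws from an
    Exponential(rate=1) population, which has mu = 1 and sigma = 1."""
    total = sum(rng.expovariate(1.0) - mu for _ in range(n))
    return total / (sigma * math.sqrt(n))

rng = random.Random(2024)
z_values = [standardized_sum(50, rng) for _ in range(20_000)]

std_normal = statistics.NormalDist()  # N(0, 1)
within = sum(abs(z) < 1.96 for z in z_values) / len(z_values)
normal_within = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(f"mean = {statistics.mean(z_values):.3f}, sd = {statistics.stdev(z_values):.3f}")
print(f"P(|Z| < 1.96): simulated {within:.3f}, standard normal {normal_within:.3f}")
```

Even though each draw comes from a strongly right-skewed population, the standardized sums have mean near 0, standard deviation near 1, and about 95% of them fall within ±1.96.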
Convergence of Random Variables
Interpretation: The convergence is convergence in distribution: the cumulative distribution function of the standardized sum approaches the standard normal CDF as n grows. This links the sampling distribution of the mean to the normal distribution.
Graphical Representation of Sampling Distribution
Illustration: Histograms of sample means for sample sizes n = 4, n = 7, and n = 10 look increasingly bell-shaped as n increases, showing convergence to normality.
Galton Board Example
Summary: The final position of a ball on a Galton board is the sum of many independent left/right deflections, so the pile of balls approximates a normal distribution, as the Central Limit Theorem predicts.
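A Galton board can be simulated as a sum of independent 0/1 steps. In this sketch the number of rows (12), the number of balls, and the helper name `galton_board` are hypothetical; each step goes right with probability 1/2, so a ball's final bin follows a Binomial(12, 1/2) distribution, whose histogram is roughly bell-shaped.

```python
import random
from collections import Counter

def galton_board(n_balls, n_rows, seed=3):
    """Drop n_balls through n_rows of pegs. At each peg a ball goes right (1)
    or left (0) with probability 1/2; its final bin is the sum of those steps."""
    rng = random.Random(seed)
    return Counter(sum(rng.randint(0, 1) for _ in range(n_rows)) for _ in range(n_balls))

n_rows = 12
bins = galton_board(n_balls=10_000, n_rows=n_rows)
for position in range(n_rows + 1):
    # Text histogram: one '#' per 50 balls in the bin.
    print(f"{position:>2} {'#' * (bins[position] // 50)}")

average_bin = sum(pos * count for pos, count in bins.items()) / sum(bins.values())
print(f"average bin: {average_bin:.2f} (theory: {n_rows / 2})")
```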
Normal Distribution Properties
Property: For a normal random variable Z with mean µ and standard deviation σ, the transformed random variable aZ+b is also normal with:
Mean: aµ+b.
Standard Deviation: ∣a∣σ.
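Python's standard-library `statistics.NormalDist` applies this rule when a normal distribution is multiplied by or shifted by a constant, so it offers a quick check. The values µ = 10, σ = 2, a = −3, b = 5 are hypothetical; a negative a is chosen to show why the standard deviation uses |a|. Transforming simulated draws gives a second, empirical check.

```python
import statistics
from statistics import NormalDist

Z = NormalDist(mu=10, sigma=2)  # hypothetical: mean 10, standard deviation 2
a, b = -3, 5

W = Z * a + b  # NormalDist scales mu by a and sigma by |a|, then shifts mu by b
print(W.mean, W.stdev)  # -25.0 6.0

# Empirical check: transform random draws from Z and summarize them.
draws = [a * z + b for z in Z.samples(100_000, seed=11)]
print(round(statistics.mean(draws), 2), round(statistics.stdev(draws), 2))
```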
Behavior of Summed Random Variables
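For large sample sizes, the sum of independent, identically distributed random variables is approximately normal with mean $n\mu$ and standard deviation $\sigma\sqrt{n}$: $\sum_{i=1}^{n} X_i \approx N(n\mu, \sigma\sqrt{n})$.
Conclusion: The sum of many independent variables is approximately normally distributed, which helps explain why traits such as height, shaped by many small independent factors, vary approximately normally in populations.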
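The mean and standard deviation of the sum follow from linearity of expectation and, because the $X_i$ are independent, additivity of variances. Dividing the centered sum by its standard deviation recovers the standardized form in the CLT statement.

```latex
\begin{aligned}
E\Big[\sum_{i=1}^{n} X_i\Big] &= \sum_{i=1}^{n} E[X_i] = n\mu, \\
\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^2
  \quad\Rightarrow\quad \operatorname{SD}\Big(\sum_{i=1}^{n} X_i\Big) = \sigma\sqrt{n}, \\
\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} &= \frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu).
\end{aligned}
```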
Population Height Distribution
Graphical Data: U.S. height distributions for men and for women are each approximately normal (bell-shaped) in these large populations.
Blood Pressure Example
Frequency Distribution: A histogram of diastolic blood pressure shows an approximately normal shape across the population.
Total Cholesterol Distribution Example
Shows a stable distribution of total cholesterol in the age ≤ 20 group, with several summary statistics reported.
Application of the Central Limit Theorem in Real-World Scenarios
Baseball Batting Average: A batting average is the mean of many at-bats, each treated as an independent Bernoulli trial (hit = 1, no hit = 0), so the number of hits is binomial; by the CLT, batting averages computed over many at-bats are approximately normally distributed.
Implications of Sample Size on Distribution
Clarification: A larger sample size brings the sampling distribution of the mean closer to normality, but it does not change the underlying population distribution.
Common Misconception: Increasing the sample size does not make a non-normal population distribution normal; a large sample from a skewed population still looks skewed. Only the distribution of sample means becomes approximately normal, as the CLT states.
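A simulation makes the distinction concrete. The sketch assumes a right-skewed Exponential (rate 1) population, whose skewness is 2, and a hypothetical `skewness` helper (mean cubed deviation divided by the cubed standard deviation). One very large sample keeps the population's skew, while the skewness of the sample means shrinks roughly like $2/\sqrt{n}$.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed (population-style)
    standard deviation; near 0 for symmetric data such as a normal sample."""
    values = list(values)
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(5)

# Assumed population: Exponential with rate 1 (right-skewed, skewness 2).
# One very large sample still has the population's shape.
big_sample = [rng.expovariate(1.0) for _ in range(100_000)]
big_skew = skewness(big_sample)
print(f"one sample of size 100000: skewness = {big_skew:.2f}")

# Sampling distribution of the mean: skewness shrinks toward 0 (theory: 2 / sqrt(n)).
mean_skew = {}
for n in (4, 10, 50):
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(20_000)]
    mean_skew[n] = skewness(means)
    print(f"means of samples of size {n:>2}: skewness = {mean_skew[n]:.2f}")
```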
Roulette Game Example
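Game Structure: Standard American roulette has 38 pockets (1 through 36, 0, and 00). Every bet has a negative expected return for the player; this negative expectation is the casino's advantage.
Statistical Expectations: By the LLN, a player's average profit per bet converges to this negative expected value, so in the long run total profit is negative. The cause is the negative expectation, not normality; the CLT describes how total profit is spread around its mean.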
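For a concrete case, the sketch assumes a 1-dollar even-money bet on red, where 18 of the 38 pockets are red. It computes the exact expected profit per bet and then simulates many independent spins; by the LLN the average profit per bet settles near that negative value. The seed and the number of bets are arbitrary.

```python
import random
from fractions import Fraction

# Assumed bet: 1 dollar on red. Win +1 with probability 18/38, lose -1 otherwise.
p_win = Fraction(18, 38)
expected_profit = 1 * p_win + (-1) * (1 - p_win)
print(expected_profit, float(expected_profit))  # prints -1/19 and its decimal value (about -0.0526)

# Simulate many independent spins; pockets 0..17 stand in for the 18 red pockets.
rng = random.Random(99)
n_bets = 200_000
total = sum(1 if rng.randrange(38) < 18 else -1 for _ in range(n_bets))
print(f"total profit: {total}, average per bet: {total / n_bets:.4f}")
```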
Binomial Distributions and Normal Approximation
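Binomial Definition: Total number of successes in n independent trials, each with success probability p.
Normal Approximation: For large n:
$\sum_{i=1}^{n} X_i \approx N\big(np, \sqrt{np(1-p)}\big)$, i.e. approximately normal with mean $np$, variance $np(1-p)$, and standard deviation $\sqrt{np(1-p)}$.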
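The mean and variance in the approximation come from writing the binomial count $S_n$ as a sum of Bernoulli variables, as in the Bernoulli trials example, and applying the rules for sums of independent variables (note $X_i^2 = X_i$ for a 0/1 variable).

```latex
\begin{aligned}
S_n &= \sum_{i=1}^{n} X_i, \qquad X_i \in \{0, 1\},\ P(X_i = 1) = p, \\
E[X_i] &= p, \qquad \operatorname{Var}(X_i) = E[X_i^2] - (E[X_i])^2 = p - p^2 = p(1-p), \\
E[S_n] &= np, \qquad \operatorname{Var}(S_n) = np(1-p), \qquad \operatorname{SD}(S_n) = \sqrt{np(1-p)}.
\end{aligned}
```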
Albino Example Summary
Random Variables: Probabilities are calculated with the binomial distribution.
E.g., the number of albino children born to carriers of the albino gene follows a binomial distribution; for a large number of children, the normal approximation simplifies the computation.
Continuity Correction in Approximations
Necessity: When a discrete binomial distribution is approximated by a continuous normal distribution, each integer k is treated as the interval [k − 0.5, k + 0.5]; this correction gives more accurate probabilities.
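The effect of the correction can be checked against the exact binomial probability. The numbers are hypothetical: 100 children of two carrier parents, each child albino with probability 1/4 (the Mendelian ratio for a recessive trait when both parents are carriers), and the question is P(at most 20 albino children). The helper `binom_cdf` is a hypothetical name; `statistics.NormalDist` supplies the normal CDF.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(S <= k) for S ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

n, p, k = 100, 0.25, 20  # hypothetical numbers for the albino example
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))  # mean np, SD sqrt(np(1-p))

exact = binom_cdf(k, n, p)
no_correction = approx.cdf(k)          # treats the integer k as a single point
with_correction = approx.cdf(k + 0.5)  # treats k as the interval [k - 0.5, k + 0.5]
print(f"exact {exact:.4f}, normal {no_correction:.4f}, corrected {with_correction:.4f}")
```

Here the uncorrected normal value noticeably understates the exact probability, while the corrected value lands very close to it.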
Summary of Findings
Sampling Distribution Properties:
Mean of sampling distribution: $\mu_{\bar{Y}} = \mu$.
Standard deviation relationship: $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$.
Distribution shape:
If the population distribution is normal, the sampling distribution of $\bar{Y}$ is normal (for any sample size).
If n is large, the sampling distribution of $\bar{Y}$ is approximately normal regardless of the underlying population distribution.
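A simulation can check both formulas. The sketch reuses the fair die from the die example as the population (so $\mu = 3.5$ and $\sigma = \sqrt{35/12} \approx 1.708$); the sample size n = 25, the number of repetitions, and the seed are arbitrary choices.

```python
import math
import random
import statistics

# Population: one roll of a fair die, as in the die example.
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.fmean(faces)      # 3.5
sigma = statistics.pstdev(faces)  # population SD, sqrt(35/12), about 1.708

rng = random.Random(123)
n, reps = 25, 20_000
ybars = [statistics.fmean(rng.choices(faces, k=n)) for _ in range(reps)]

print(f"mean of Y-bar: {statistics.fmean(ybars):.3f} (theory: {mu})")
print(f"SD of Y-bar:   {statistics.stdev(ybars):.3f} (theory: {sigma / math.sqrt(n):.3f})")
```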