The Binomial Distribution
Course Information
Course Title: Biostatistics 521: Applied Biostatistics
Instructor: Mousumi Banerjee
Introduction to Distributions
Normal Distribution:
Introduced in the prior session (Tuesday 09/23).
A probability model for continuous numerical variables that are unimodal and symmetric.
Key Takeaways from Normal Distribution
The Normal distribution provides a good approximation for the distribution of unimodal, symmetric variables.
Normal probabilities can be obtained from Normal tables or from software such as R.
Z-score:
Definition: The Z-score of a measurement x is the number of standard deviations the measurement lies above or below the mean of its distribution: $z = (x - \mu)/\sigma$.
Purpose: Z-scores allow for (1) easy computation and (2) comparison between measurements from different Normal distributions.
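A minimal R sketch of the Z-score computation. The measurement (130) and the Normal parameters (mean 120, SD 10) are hypothetical values chosen for illustration; `pnorm()` is base R's Normal cumulative distribution function.

```r
# Hypothetical measurement and Normal parameters (illustration only)
x     <- 130
mu    <- 120
sigma <- 10

# Z-score: number of SDs the measurement lies above (+) or below (-) the mean
z <- (x - mu) / sigma
z   # 1

# P(Z <= z) on the standard Normal scale equals P(X <= x) on the original scale
pnorm(z)
pnorm(x, mean = mu, sd = sigma)
```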
Transition to Binomial Distribution
Importance of Binary Categorical Variables:
Common in research datasets (case/control, exposed/unexposed, male/female).
Binary variables can take only two possible values, so they are not well described by a Normal distribution.
Introduction: The Binomial distribution is a probability model specifically for binary data.
Example: COVID-19 Vaccine Breakthrough Infections
Assessment of vaccine effectiveness (e.g., the Pfizer and Moderna vaccines).
Hypothesis: Among non-immuno-compromised vaccinated individuals exposed to COVID, it is expected that 1% will become infected.
Scenario:
Among 1,000 vaccinated individuals exposed to COVID, we count the number who become infected.
If 8 infections are observed, should this be a cause for concern?
What if 20 or 100 infections are observed?
Bernoulli Trial
Definition: A Bernoulli trial is an experiment with only two possible outcomes (e.g. failure/success, heads/tails).
Parameter of a Bernoulli Trial: Denoted as p, it represents the probability of success.
The complementary probability is denoted q:
q = 1 - p (the probability of failure).
Any variable that only takes on two possible outcomes can be modeled as a Bernoulli variable.
Examples of Bernoulli Trials
Coin Toss: heads (0) or tails (1).
Sex of a newborn child: male (0) or female (1).
Development of disease: no (0) or yes (1).
Clinical trial outcomes: died (0) or lived (1).
Note: The determination of which outcome is categorized as success (1) and failure (0) is arbitrary but must remain consistent throughout calculations.
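A Bernoulli variable is a Binomial variable with a single trial, so base R's `dbinom(x, size = 1, prob = p)` gives its two probabilities and `rbinom(..., size = 1, ...)` simulates trials. The value p = 0.3 below is hypothetical.

```r
p <- 0.3          # hypothetical probability of success
q <- 1 - p        # probability of failure

# P(X = 0) and P(X = 1) for one Bernoulli trial (0 = failure, 1 = success)
probs <- dbinom(c(0, 1), size = 1, prob = p)
probs             # 0.7 0.3, i.e. c(q, p)

# Simulate 10 independent Bernoulli trials; each entry is 0 or 1
set.seed(1)
trials <- rbinom(10, size = 1, prob = p)
trials
```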
Binomial Model and Distribution
Properties:
Consists of n independent Bernoulli trials.
The number of trials, n, is predetermined (fixed in advance).
Each trial has two outcomes: success (1) with probability p, or failure (0) with probability 1-p.
Definition: The number of successes in the n independent trials follows a Binomial distribution.
Illustration: Example realization of a binomial experiment with 2 successes and 4 failures in 6 trials.
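To connect the Bernoulli and Binomial definitions, here is a sketch that simulates one binomial experiment with n = 6 trials as six independent Bernoulli draws and counts the successes. The success probability p = 0.3 is a hypothetical value, and a simulated realization will generally differ from the 2-success example in the illustration.

```r
n <- 6            # number of trials, fixed in advance
p <- 0.3          # hypothetical success probability, the same for every trial

set.seed(42)
outcomes <- rbinom(n, size = 1, prob = p)   # n independent Bernoulli trials
outcomes
X <- sum(outcomes)                          # number of successes
X

# Equivalently, draw the count directly from Binomial(n, p)
rbinom(1, size = n, prob = p)
```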
Conditions for Binomial Distribution
The trials are independent.
The number of trials, n, is fixed.
Each trial outcome can be classified as a success or failure.
The probability of success, p, is constant across all trials.
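Evaluating Whether a Variable Is Binomial: Examples
For each scenario, decide whether X follows a Binomial distribution:
Tossing a fair coin 10 times; X = number of heads. Binomial(10, 0.5).
Tossing a biased coin (probability of heads 0.7) 10 times; X = number of heads. Binomial(10, 0.7).
Choosing 13 cards from a deck; X = number of spades drawn. Not Binomial: the cards are drawn without replacement, so the draws are not independent and the probability of a spade changes from draw to draw.
Number of girls among the first 100 babies born at UM hospital this year. Not exactly Binomial: the births could include identical twins, who share the same sex, so the trials are not independent.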
Binomial Distribution Representation
Let X denote the number of successes in n trials, with success probability p:
$X \sim \text{Binomial}(n, p)$.
Probability Mass Function (PMF)
The probability of obtaining k successes in n trials:
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$
Here,
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient: the number of ways to arrange k successes and n - k failures among the n trials.
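The PMF translates directly into R. The function `binom_pmf()` below is a hypothetical helper written only to mirror the formula; base R's `dbinom()` computes the same quantity and is what you would normally use.

```r
# Hypothetical helper: P(X = k) for X ~ Binomial(n, p), straight from the formula
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 5
p <- 0.5
k <- 0:n
# Side-by-side comparison of the hand-coded formula and dbinom()
cbind(k, formula = binom_pmf(k, n, p), dbinom = dbinom(k, n, p))
```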
Binomial Coefficients Explained
The notation $\binom{n}{k}$ is read "n choose k" and gives the number of ways to obtain k successes out of n trials without regard to order (combinations).
Factorials: $n! = n \times (n-1) \times \cdots \times 2 \times 1$. For example:
$1! = 1$
$2! = 2 \times 1 = 2$
$3! = 3 \times 2 \times 1 = 6$
By convention, 0! = 1.
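Base R's `factorial()` reproduces these values, including the 0! = 1 convention:

```r
f <- factorial(0:5)   # 0!, 1!, 2!, 3!, 4!, 5!
f                     # 1 1 2 6 24 120
```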
Binomial Coefficient Example Calculations
One Success in Five Trials:
Success can occur in any of the 5 trials, thus yielding 5 arrangements: (10000, 01000, 00100, 00010, 00001).
Calculation:
$\binom{5}{1} = \frac{5!}{1!\,(5-1)!} = \frac{5!}{1! \times 4!} = \frac{120}{1 \times 24} = 5$.
Two Successes in Five Trials:
Each arrangement contains 2 successes and 3 failures.
There are 10 distinct arrangements:
$\binom{5}{2} = \frac{5!}{2!\,(5-2)!} = \frac{120}{2 \times 6} = 10$.
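To see the 10 arrangements explicitly, base R's `combn(5, 2)` lists every pair of trial positions that could hold the 2 successes, while `choose()` simply counts them.

```r
pos <- combn(5, 2)     # each column = positions of the 2 successes among 5 trials
ncol(pos)              # 10

# Turn each column into a 0/1 string such as "11000"
arrangements <- apply(pos, 2, function(idx) {
  x <- integer(5)
  x[idx] <- 1L
  paste(x, collapse = "")
})
arrangements

choose(5, 1)           # 5
choose(5, 2)           # 10
factorial(5) / (factorial(2) * factorial(3))   # 10, from the formula
```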
Binomial Probabilities
For a fair coin (p = 0.5), the probability of 2 successes (heads) and 3 failures (tails) in 5 tosses is
$\binom{5}{2} p^2 (1-p)^3 = 10 \times 0.5^2 \times 0.5^3 = 0.3125$ (31.25%).
For a biased coin (p = 0.7), the same outcome has probability
$\binom{5}{2} p^2 (1-p)^3 = 10 \times 0.7^2 \times 0.3^3 = 0.1323$ (13.23%).
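The same two probabilities come straight from `dbinom()`, with X = number of heads in 5 tosses, so X ~ Binomial(5, p):

```r
dbinom(2, size = 5, prob = 0.5)   # fair coin:   0.3125
dbinom(2, size = 5, prob = 0.7)   # biased coin: 0.1323
```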
Other Binomial Distributions
Figure: Binomial probabilities P(X = k) plotted against the number of successes k, for several values of n and p.
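A sketch for tabulating the probabilities behind this kind of figure. The (n, p) pairs are hypothetical choices; passing any one of the resulting vectors to `barplot()` would draw the corresponding panel.

```r
# Hypothetical (n, p) combinations to compare
settings <- data.frame(n = c(10, 10, 20), p = c(0.5, 0.2, 0.2))

pmfs <- lapply(seq_len(nrow(settings)), function(i) {
  n <- settings$n[i]
  p <- settings$p[i]
  setNames(dbinom(0:n, size = n, prob = p), 0:n)  # P(X = k), k = 0..n
})

# Each PMF sums to 1; the most likely count sits near n * p
sapply(pmfs, sum)
sapply(pmfs, function(pr) as.integer(names(which.max(pr))))
```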
Mean, Variance, and Standard Deviation of Binomial Distribution
Mean ($\mu$):
Expected number of successes: $\mu = np$.
Variance ($\sigma^2$):
Variability in the number of successes: $\sigma^2 = np(1-p)$.
Standard deviation ($\sigma$):
Square root of the variance: $\sigma = \sqrt{np(1-p)}$.
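A quick numerical check that these formulas agree with the general definitions $E[X] = \sum_k k\,P(X = k)$ and $\text{Var}(X) = \sum_k (k - \mu)^2 P(X = k)$, using n = 10 and p = 0.1:

```r
n <- 10
p <- 0.1
k <- 0:n
pk <- dbinom(k, size = n, prob = p)

mu_sum  <- sum(k * pk)                 # mean computed from the PMF
var_sum <- sum((k - mu_sum)^2 * pk)    # variance computed from the PMF

c(mean_formula = n * p,           mean_pmf = mu_sum)    # both 1
c(var_formula  = n * p * (1 - p), var_pmf  = var_sum)   # both 0.9
sqrt(n * p * (1 - p))                                   # standard deviation
```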
Utilizing R for Binomial Probabilities
Basic Structure:
pbinom(q, size, prob, lower.tail = TRUE) computes cumulative binomial probabilities, P(X ≤ q) by default (P(X > q) when lower.tail = FALSE), where:
q: number of successes k.
size: number of trials n.
prob: probability of success p.
Example: Let X ~ Binom(10,0.1).
To compute P(X ≤ 2):
pbinom(q = 2, size = 10, prob = 0.1) returns 0.9298092.
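Because `pbinom()` returns a cumulative probability, it should match summing `dbinom()` over 0, 1, 2; setting `lower.tail = FALSE` gives the upper tail P(X > q) instead.

```r
# P(X <= 2) for X ~ Binomial(10, 0.1), computed two ways
pbinom(q = 2, size = 10, prob = 0.1)          # 0.9298092
sum(dbinom(0:2, size = 10, prob = 0.1))       # same value

# Upper tail: P(X > 2) = 1 - P(X <= 2)
pbinom(q = 2, size = 10, prob = 0.1, lower.tail = FALSE)
```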
Point Probabilities in R
For specific probabilities, such as P(X = 2):
Uses: P(X = 2) = P(X ≤ 2) - P(X ≤ 1)
Implementation in R:
pbinom(2, 10, 0.1) - pbinom(1, 10, 0.1)
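A runnable version of the subtraction, alongside `dbinom()`, which returns the point probability P(X = 2) directly:

```r
# P(X = 2) = P(X <= 2) - P(X <= 1) for X ~ Binomial(10, 0.1)
p_diff  <- pbinom(2, size = 10, prob = 0.1) - pbinom(1, size = 10, prob = 0.1)
# Direct point probability
p_exact <- dbinom(2, size = 10, prob = 0.1)
c(p_diff, p_exact)   # both about 0.1937
```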
Example: Cystic Fibrosis Risk Estimation
Disease characterization: cystic fibrosis is an autosomal recessive condition caused by mutations in the CFTR gene on chromosome 7 (an individual with two mutated copies has CF; an individual with one mutated copy is a carrier).
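Family analysis: two carrier parents have 4 children. What is the expected number of children with CF, and what is the probability that exactly a given number of them have CF?
By Mendel's laws, each child of two carriers independently has probability 1/4 of inheriting two mutated copies (and thus having CF):
For n = 4 and $p = \frac{1}{4}$,
the expected number of children with CF is $\mu = np = 4 \times \frac{1}{4} = 1$, and the probability that exactly one child has CF is $P(X = 1) = \binom{4}{1} \left(\frac{1}{4}\right)^1 \left(\frac{3}{4}\right)^3 = \frac{108}{256} \approx 0.422$.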
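A sketch of the CF calculation in R, with X ~ Binomial(4, 1/4) counting how many of the 4 children have CF:

```r
n <- 4
p <- 1/4                          # Mendelian probability a child of two carriers has CF

n * p                             # expected number of children with CF: 1
dbinom(1, size = n, prob = p)     # P(exactly one child with CF) = 27/64, about 0.4219

# Full distribution of X = number of affected children
setNames(dbinom(0:n, size = n, prob = p), 0:n)

# P(at least one child with CF) = 1 - P(X = 0)
1 - dbinom(0, size = n, prob = p)
```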
Example: COVID Infection Probability
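Under the hypothesis that 1% of exposed vaccinated individuals become infected, the number of infections X among 1,000 exposed vaccinated individuals follows Binomial(1000, 0.01), so the expected number of infections is np = 10.
Tail probabilities such as P(X ≥ 20) or P(X ≥ 100) measure how unusual an observed count above this expected number would be if the 1% hypothesis were true; a very small probability suggests the hypothesis should be questioned.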
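A sketch of these calculations, assuming (as in the 1% hypothesis) that X ~ Binomial(1000, 0.01). The helper `upper_tail()` is hypothetical, written for this illustration; it computes P(X ≥ k) as `pbinom(k - 1, ..., lower.tail = FALSE)`, i.e. P(X > k - 1).

```r
n <- 1000
p <- 0.01                    # hypothesized infection probability

n * p                        # expected number of infections: 10

# Hypothetical helper: P(X >= k) = P(X > k - 1)
upper_tail <- function(k) pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)

# Observed counts from the scenario
observed <- c(8, 20, 100)
data.frame(observed, p_at_least = upper_tail(observed))
```

Larger observed counts give smaller tail probabilities, which is what makes 20 or 100 infections far more alarming than 8.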
Normal Approximation to the Binomial Distribution
When $np$ and $n(1-p)$ are both $\ge 10$,
the Normal distribution with mean $np$ and variance $np(1-p)$ closely approximates the Binomial$(n, p)$ distribution.
Central Limit Theorem: the approximation works because a Binomial count is the sum of n independent Bernoulli variables, and by the Central Limit Theorem such a sum is approximately Normal when n is large.
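A sketch comparing an exact Binomial probability with its Normal approximation, for hypothetical values n = 100 and p = 0.5 (so np = n(1 - p) = 50, well above 10). The third value adds 0.5 to the cutoff, a standard continuity correction that goes slightly beyond these notes.

```r
n <- 100
p <- 0.5
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

exact    <- pbinom(55, size = n, prob = p)       # exact P(X <= 55)
approx   <- pnorm(55, mean = mu, sd = sigma)     # Normal approximation
approx_c <- pnorm(55.5, mean = mu, sd = sigma)   # with continuity correction

c(exact = exact, normal = approx, normal_cc = approx_c)
```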
Conclusion and Key Ideas
Binary variables are common and important in public health research.
The Binomial distribution models the number of successes in n independent Bernoulli trials with a common success probability p.
Central parameters: n (number of trials) and p (probability of success).
The mean (np) and variance (np(1-p)) describe the expected number of successes and its variability.
When np and n(1-p) are large enough, the Binomial distribution is well approximated by a Normal distribution, a result used in statistical inference methods covered in future courses (logistic regression, etc.).