Binomial Distribution

Course Information

  • Course Title: Biostatistics 521: Applied Biostatistics

  • Instructor: Mousumi Banerjee

Introduction to Distributions

  • Normal Distribution:

    • Introduced in the prior session (Tuesday 09/23).

    • A probability model for continuous numerical variables that are unimodal and symmetric.

Key Takeaways from Normal Distribution

  • The Normal distribution provides a good approximation for the distribution of unimodal, symmetric variables.

  • Normal probabilities can be obtained using Normal tables or software such as R.

  • Z-score:

    • Definition: The Z-score for a measurement is the number of standard deviations the measurement lies above or below the mean of its distribution: $Z = (x - \mu)/\sigma$.

    • Purpose: Z-scores allow for (1) easy computation and (2) comparison between measurements from different Normal distributions.
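
As a minimal sketch of the Z-score calculation in R, assume a hypothetical measurement of 130 from a Normal distribution with mean 100 and standard deviation 15. These numbers are chosen only for illustration and do not come from the lecture.

```r
# Hypothetical values, used only to illustrate the calculation.
x     <- 130   # a measurement
mu    <- 100   # mean of its Normal distribution
sigma <- 15    # standard deviation of its Normal distribution

z <- (x - mu) / sigma          # number of SDs above the mean: 2

# P(X <= x), computed on the standard Normal scale and on the original scale.
p_from_z <- pnorm(z)
p_direct <- pnorm(x, mean = mu, sd = sigma)
```

Both pnorm calls return the same probability, which is why standardizing lets a single Normal table serve every Normal distribution.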

Transition to Binomial Distribution

  • Importance of Binary Categorical Variables:

    • Common in research datasets (case/control, exposed/unexposed, male/female).

    • Binary variables can assume only two possible values, so they are not well described by a Normal distribution, which models continuous variables.

  • Introduction: The Binomial distribution is a probability model specifically for binary data.

Example: COVID-19 Vaccine Breakthrough Infections

  • Assessment of vaccine effectiveness (e.g. Pfizer and Moderna).

  • Hypothesis: Among non-immuno-compromised vaccinated individuals exposed to COVID, it is expected that 1% will become infected.

  • Scenario:

    • Among 1,000 vaccinated individuals exposed to COVID, count the number of infections.

    • If 8 infections are observed, should this be a cause for concern?

    • What if 20 or 100 infections are observed?

Bernoulli Trial

  • Definition: A Bernoulli trial is an experiment with only two possible outcomes (e.g. failure/success, heads/tails).

  • Parameter of a Bernoulli Trial: Denoted as p, it represents the probability of success.

  • The complementary probability is denoted as q:

    • q = 1 - p (Probability of failure).

  • Any variable that only takes on two possible outcomes can be modeled as a Bernoulli variable.

Examples of Bernoulli Trials

  • Coin Toss: heads (0) or tails (1).

  • Sex of a newborn child: male (0) or female (1).

  • Development of disease: no (0) or yes (1).

  • Clinical trial outcomes: died (0) or lived (1).

  • Note: The determination of which outcome is categorized as success (1) and failure (0) is arbitrary but must remain consistent throughout calculations.
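
A Bernoulli trial can be simulated in R as a Binomial draw with a single trial. The sketch below assumes a hypothetical success probability of 0.3 and an arbitrary seed; neither comes from the lecture.

```r
set.seed(521)   # arbitrary seed so the simulated outcomes are reproducible

p <- 0.3        # hypothetical probability of success (coded 1)
q <- 1 - p      # probability of failure (coded 0)

# Ten independent Bernoulli(p) trials: each outcome is 0 or 1.
trials <- rbinom(n = 10, size = 1, prob = p)
trials

mean(trials)    # observed proportion of successes among the ten trials
```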

Binomial Model and Distribution

  • Properties:

    • Consists of n independent Bernoulli trials.

    • The number of trials, n, is predetermined (fixed in advance).

    • Each trial has two outcomes: success (1) with probability p, or failure (0) with probability 1-p.

  • Definition: The number of successes in the n independent trials follows a Binomial distribution.

  • Illustration: Example realization of a binomial experiment with 2 successes and 4 failures in 6 trials.
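
The link between Bernoulli trials and the Binomial count can be sketched by simulation. The 6 trials match the illustration above; the success probability of 0.5 and the seed are assumptions made only for this example.

```r
set.seed(521)   # arbitrary seed for reproducibility

n <- 6          # fixed number of trials, as in the illustration
p <- 0.5        # hypothetical success probability (not given in the notes)

outcomes <- rbinom(n, size = 1, prob = p)   # n independent Bernoulli(p) trials
x <- sum(outcomes)                          # number of successes among the n trials

# Drawing the count directly from Binomial(n, p) is equivalent in distribution.
x_direct <- rbinom(1, size = n, prob = p)
```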

Conditions for Binomial Distribution

  1. The trials are independent.

  2. The number of trials, n, is fixed.

  3. Each trial outcome can be classified as a success or failure.

  4. The probability of success, p, is constant across all trials.

Evaluating Binomialness: Examples

  • Trial Outcomes:

    • Tossing a fair coin 10 times; recording heads: X = number of heads.

    • Tossing a biased coin with a probability of heads at 0.7 for 10 tosses; X = number of heads.

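    • Choosing 13 cards from a deck; X = number of spades drawn. (If the cards are drawn without replacement, the draws are not independent.)

    • Number of girls among the first 100 babies born at UM hospital this year; X = number of girls. (The births could include identical twins, whose sexes are not independent.)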

Binomial Distribution Representation

  • Let X denote the number of successes in n trials, with success probability p:

    • $X \sim \text{Binomial}(n, p)$.

Probability Mass Function (PMF)

  • The probability of obtaining k successes in n trials:

    • $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

    • Here, $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is the binomial coefficient: the number of ways to arrange k successes and n-k failures among n trials.
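
The PMF can be written out term by term in R and checked against the built-in dbinom. In this sketch, binom_pmf is a hypothetical helper name, and the values n = 10, p = 0.2, k = 3 are illustrative only.

```r
# binom_pmf is a hypothetical helper, written to mirror the formula above.
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

by_formula <- binom_pmf(k = 3, n = 10, p = 0.2)   # illustrative values
built_in   <- dbinom(3, size = 10, prob = 0.2)    # R's built-in Binomial PMF
all.equal(by_formula, built_in)                   # TRUE

# The probabilities over k = 0, 1, ..., n sum to 1.
total <- sum(binom_pmf(0:10, n = 10, p = 0.2))
```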

Binomial Coefficients Explained

  • The notation $\binom{n}{k}$ is read "n choose k": the number of ways to achieve k successes out of n trials, without regard to order (combinations).

  • Explanations of factorials:

    • 1! = 1

    • $2! = 2 \times 1 = 2$

    • $3! = 3 \times 2 \times 1 = 6$

    • By convention, 0! = 1.

Binomial Coefficient Example Calculations

  • One Success in Five Trials:

    • Success can occur in any of the 5 trials, thus yielding 5 arrangements: (10000, 01000, 00100, 00010, 00001).

    • Calculation:

    • $\binom{5}{1} = \frac{5!}{1!(5-1)!} = \frac{5!}{1! \times 4!} = \frac{120}{1 \times 24} = 5$.

  • Two Successes in Five Trials:

    • Each arrangement contains 2 successes and 3 failures.

    • Calculation yields 10 distinct ways:

    • $\binom{5}{2} = \frac{5!}{2!(5-2)!} = 10$.
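
These counts can be reproduced with base R's factorial, choose, and combn functions. The sketch below is one way to check the arithmetic, not part of the original notes.

```r
factorial(0)                                   # 1, by convention
factorial(5)                                   # 120
factorial(5) / (factorial(1) * factorial(4))   # 5, from the factorial formula
choose(5, 1)                                   # 5
choose(5, 2)                                   # 10

# Each column of combn(5, 2) lists the two trial positions holding the successes.
arrangements <- combn(5, 2)
ncol(arrangements)                             # 10 distinct arrangements
```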

Binomial Probabilities

  • Probability of 2 successes (heads) and 3 failures (tails) in 5 tosses of a fair coin (p = 0.5):

    • $10 \times p^2 \times (1-p)^3 = 10 \times 0.5^2 \times 0.5^3 = 0.3125$ (31.25%).

  • The same probability for a biased coin (p = 0.7):

    • $10 \times p^2 \times (1-p)^3 = 10 \times 0.7^2 \times 0.3^3 = 0.1323$ (13.23%).
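
The two coin calculations can be checked in R, either by hand or with the built-in dbinom. This is a sketch for verification only.

```r
n_ways <- choose(5, 2)                      # 10 arrangements of 2 heads in 5 tosses

fair_by_hand   <- n_ways * 0.5^2 * 0.5^3    # 0.3125
biased_by_hand <- n_ways * 0.7^2 * 0.3^3    # 0.1323

# The built-in Binomial PMF gives the same probabilities.
fair_dbinom   <- dbinom(2, size = 5, prob = 0.5)
biased_dbinom <- dbinom(2, size = 5, prob = 0.7)
```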

Other Binomial Distributions

  • Figure: Binomial probabilities P(X) plotted against the number of successes, for various values of n and p.
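
A plot of this kind can be sketched with base R graphics. The (n, p) pairs below are hypothetical, chosen only to show how the shape changes; they are not the values used in the original figure.

```r
# Hypothetical (n, p) settings; the original figure's values are not known.
settings <- list(c(10, 0.1), c(10, 0.5), c(50, 0.1), c(50, 0.5))

old_par <- par(mfrow = c(2, 2))   # arrange the four panels in a 2 x 2 grid
for (s in settings) {
  n <- s[1]
  p <- s[2]
  k <- 0:n                        # all possible numbers of successes
  barplot(dbinom(k, size = n, prob = p), names.arg = k,
          xlab = "Number of Successes", ylab = "P(X)",
          main = paste0("n = ", n, ", p = ", p))
}
par(old_par)                      # restore the previous graphics settings
```

With small p the distribution is right-skewed; with p = 0.5, or with larger n, it looks more symmetric.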

Mean, Variance, and Standard Deviation of Binomial Distribution

  • Mean ($\mu$):

    • Expected number of successes: $\mu = np$.

  • Variance ($\sigma^2$):

    • Variability in the number of successes: $\sigma^2 = np(1-p)$.

  • Standard Deviation ($\sigma$):

    • Square root of the variance: $\sigma = \sqrt{np(1-p)}$.
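
These formulas can be checked numerically against the definitions of mean and variance for a discrete distribution. The sketch assumes illustrative values n = 10 and p = 0.1.

```r
n <- 10    # illustrative number of trials
p <- 0.1   # illustrative probability of success

k   <- 0:n
pmf <- dbinom(k, size = n, prob = p)

# Mean and variance computed from their definitions as probability-weighted sums.
mu_def     <- sum(k * pmf)
sigma2_def <- sum((k - mu_def)^2 * pmf)

# The closed-form Binomial results.
mu     <- n * p                  # 1
sigma2 <- n * p * (1 - p)        # 0.9
sigma  <- sqrt(sigma2)
```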

Utilizing R for Binomial Probabilities

  • Basic Structure: pbinom(q, size, prob, lower.tail = TRUE) computes the cumulative probability P(X ≤ q) for a Binomial distribution, where:

    • q: number of successes (k).

    • size: number of trials (n).

    • prob: probability of success (p).

  • Example: Let X ~ Binom(10,0.1).

    • To compute P(X ≤ 2): pbinom(q = 2, size = 10, prob = 0.1) yields 0.9298092.

Complex Probability Scenarios in R

  • For specific probabilities, such as P(X = 2):

    • Uses: P(X = 2) = P(X ≤ 2) - P(X ≤ 1)

    • Implementation in R: pbinom(2, 10, 0.1) - pbinom(1, 10, 0.1).
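
The sketch below collects these calculations for X ~ Binom(10, 0.1). It also shows two alternatives that the notes do not mention: dbinom for an exact probability and lower.tail = FALSE for an upper-tail probability.

```r
cum_2   <- pbinom(2, size = 10, prob = 0.1)            # P(X <= 2) = 0.9298092
exact_2 <- pbinom(2, 10, 0.1) - pbinom(1, 10, 0.1)     # P(X = 2) by differencing
pmf_2   <- dbinom(2, size = 10, prob = 0.1)            # P(X = 2) directly

# Upper tail: P(X > 2), equivalent to 1 - P(X <= 2).
upper_2 <- pbinom(2, size = 10, prob = 0.1, lower.tail = FALSE)
```

Note that lower.tail = FALSE gives P(X > q), a strict inequality, so P(X ≥ 3) is obtained with q = 2.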

Example: Cystic Fibrosis Risk Estimation

  • Disease characterization: Cystic Fibrosis is an autosomal recessive condition caused by mutations in the CFTR gene on chromosome 7 (individual with two mutations has CF; one mutation = carrier).

  • Family Analysis: Two carrier parents have 4 children. How many children are expected to have CF, and what is the probability that exactly one child has CF?

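  • Calculations follow from Mendel's Laws:

    • Each child of two carrier parents has CF with probability $p = \frac{1}{4}$, independently of siblings, so with $n = 4$ children the number with CF is $X \sim \text{Binomial}(4, \frac{1}{4})$.

    • Expected number of children with CF: $\mu = np = 1$. The probability that exactly one child has CF follows from the binomial formula: $P(X = 1) = \binom{4}{1} \left(\frac{1}{4}\right)^1 \left(\frac{3}{4}\right)^3 \approx 0.42$.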
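
The family calculation can be sketched in R with n = 4 and p = 1/4. The "at least one child" probability is an extra illustration, not a question posed in the notes.

```r
n <- 4       # number of children
p <- 1 / 4   # probability that a child of two carriers has CF

expected_cf <- n * p                          # 1 child expected to have CF
p_exactly_1 <- dbinom(1, size = n, prob = p)  # P(X = 1) = 0.421875

# Full distribution of the number of children with CF (k = 0, ..., 4).
cf_dist <- dbinom(0:n, size = n, prob = p)

# Extra illustration: probability that at least one child has CF.
p_at_least_1 <- 1 - dbinom(0, size = n, prob = p)   # 0.68359375
```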

Example: COVID Infection Probability

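  • Under the hypothesized 1% infection rate, the number of infections among 1,000 exposed vaccinated individuals is $X \sim \text{Binomial}(1000, 0.01)$, so the expected number of infections is $np = 10$; this serves as the benchmark.

  • The probability of observing a count at least as large as the one seen (e.g. 20 or 100) shows whether infections exceed what the 1% rate predicts, and what that implies about vaccine effectiveness.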
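
One way to put numbers on the 8, 20, and 100 infection scenarios is sketched below, assuming the hypothesized model with n = 1000 and p = 0.01. Using tail probabilities here is one reasonable way to frame the question, not necessarily the calculation shown in the lecture.

```r
n <- 1000   # vaccinated individuals exposed to COVID
p <- 0.01   # hypothesized probability of infection

mu    <- n * p                    # expected number of infections: 10
sigma <- sqrt(n * p * (1 - p))    # standard deviation: sqrt(9.9)

p_le_8   <- pbinom(8,  size = n, prob = p)                      # P(X <= 8)
p_ge_20  <- pbinom(19, size = n, prob = p, lower.tail = FALSE)  # P(X >= 20)
p_ge_100 <- pbinom(99, size = n, prob = p, lower.tail = FALSE)  # P(X >= 100)
```

Eight infections is below the expected 10, so it gives no cause for concern. A count of 20 or more is unlikely under the 1% rate, and a count of 100 or more is essentially impossible, which would call the hypothesized rate into question.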

Normal Approximation to the Binomial Distribution

  • When $np$ and $n(1-p)$ are both ≥ 10:

    • The Normal distribution closely approximates the Binomial distribution (mean = np; variance = np(1-p)).

  • Central Limit Theorem: The approximation follows from the Central Limit Theorem, since the number of successes is a sum of n independent Bernoulli trials.
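
The quality of the approximation can be sketched by comparing an exact Binomial probability with its Normal counterpart. The values n = 100, p = 0.3, and the cutoff 35 are hypothetical. The continuity correction (using 35.5) is a standard refinement that the notes do not mention, added here as an assumption.

```r
n <- 100   # hypothetical number of trials
p <- 0.3   # hypothetical probability of success

# Check the rule of thumb: np and n(1-p) both at least 10.
conditions_met <- (n * p >= 10) && (n * (1 - p) >= 10)   # TRUE

mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

exact        <- pbinom(35, size = n, prob = p)      # exact P(X <= 35)
approx_plain <- pnorm(35,   mean = mu, sd = sigma)  # Normal approximation
approx_cc    <- pnorm(35.5, mean = mu, sd = sigma)  # with continuity correction
```

The corrected value is usually the closer of the two, because it accounts for approximating a discrete count with a continuous curve.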

Conclusion and Key Ideas

  • Binary variables are common in public health research.

  • The Binomial distribution models the number of successes in n independent Bernoulli trials.

  • Central parameters: n (number of trials) and p (probability of success).

  • The mean (np) and variance (np(1-p)) describe the expected number of successes and its variability.

  • When np and n(1-p) are both sufficiently large, the Binomial distribution is approximately Normal; this underpins statistical inference methods covered in future courses (logistic regression, etc.).