Binomial Distribution

Normal Distribution:
- Introduced in the prior session (Tuesday 09/23).
- A probability model for continuous numerical variables that are unimodal and symmetric.

The Normal distribution provides a good approximation for the distribution of unimodal, symmetric variables.
Normal probabilities can be obtained from Normal tables or from software such as R.
Z-score:
- Definition: The Z-score of a measurement is the number of standard deviations that the measurement lies above or below the mean of its distribution: $z = (x - \mu)/\sigma$.
- Purpose: Z-scores allow for (1) easy computation of Normal probabilities and (2) comparison between measurements from different Normal distributions.
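As a small illustration of the definition, the sketch below computes Z-scores in Python using the standard formula z = (x - mean) / sd. The function name and the example measurements are hypothetical and do not come from the lecture.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


# Hypothetical measurements from two different Normal distributions.
z_a = z_score(130, mean=100, sd=15)  # above its mean
z_b = z_score(65, mean=70, sd=10)    # below its mean
print(z_a, z_b)
```

Because both values are on the same standard-deviation scale, they can be compared directly even though the original measurements come from different distributions.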

Importance of Binary Categorical Variables:
- Common in research datasets (case/control, exposed/unexposed, male/female).
- Binary variables can take only two possible values, so they are not well described by a Normal distribution.
Introduction: The Binomial distribution is a probability model specifically for binary data.

Assessment of vaccine effectiveness (e.g. Pfizer and Moderna).
Hypothesis: Among non-immuno-compromised vaccinated individuals exposed to COVID, it is expected that 1% will become infected.
Scenario:
- Among 1,000 vaccinated individuals exposed to COVID, count the observed infections.
- 8 observed infections: should this be a cause for concern?
- What if 20 or 100 infections were observed instead?

Definition: A Bernoulli trial is an experiment with only two possible outcomes (e.g. failure/success, heads/tails).
Parameter of a Bernoulli Trial: Denoted as p, it represents the probability of success.
The complementary probability, the probability of failure, is denoted q:
- $q = 1 - p$.
Any variable that only takes on two possible outcomes can be modeled as a Bernoulli variable.

Coin Toss: heads (0) or tails (1).
Sex of a newborn child: male (0) or female (1).
Development of disease: no (0) or yes (1).
Clinical trial outcomes: died (0) or lived (1).
Note: The determination of which outcome is categorized as success (1) and failure (0) is arbitrary but must remain consistent throughout calculations.

Properties:
- Comprised of n independent Bernoulli trials.
- The number of trials, n, is predetermined (fixed in advance).
- Each trial has two outcomes: success (1) with probability p, or failure (0) with probability 1-p.
Definition: The number of successes in the n independent trials follows a Binomial distribution.
Illustration: Example realization of a binomial experiment with 2 successes and 4 failures in 6 trials.
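The properties above can be mimicked in a short simulation: one binomial count is the number of successes among n independent Bernoulli trials. This is only a sketch in Python; the function names, the seed, and the choice of n = 6 and p = 0.5 are illustrative assumptions.

```python
import random


def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0


def binomial_experiment(n, p, rng):
    """Run n independent Bernoulli trials; return the outcomes and the success count."""
    outcomes = [bernoulli_trial(p, rng) for _ in range(n)]
    return outcomes, sum(outcomes)


rng = random.Random(2024)  # fixed seed so the run is reproducible
outcomes, successes = binomial_experiment(n=6, p=0.5, rng=rng)
print(outcomes, "->", successes, "successes")
```

Rerunning with a different seed gives a different realization, but n stays fixed and every trial uses the same p, which is what makes the count binomial.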

Trial Outcomes:
- Tossing a fair coin 10 times; recording heads: X = number of heads.
- Tossing a biased coin with a probability of heads at 0.7 for 10 tosses; X = number of heads.
- Choosing 13 cards from a deck; X = number of spades drawn.
- Counting girls among the first 100 babies born at UM hospital this year; X = number of girls. Note that the births could include identical twins, whose sexes are not independent.

Let X denote the number of successes in n trials, with success probability p:
- $X \sim \text{Binomial}(n, p)$.

The probability of obtaining k successes in n trials:
- $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
- Here, $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is the binomial coefficient: the number of ways to arrange k successes and n-k failures among n trials.

The notation $\binom{n}{k}$ is read "n choose k": the number of ways to choose which k of the n trials are successes, with no regard for order (combinations).
Explanations of factorials:
- $1! = 1$
- $2! = 2 \times 1 = 2$
- $3! = 3 \times 2 \times 1 = 6$
- By convention, $0! = 1$.

One Success in Five Trials:
- Success can occur in any of the 5 trials, thus yielding 5 arrangements: (10000, 01000, 00100, 00010, 00001).
- Calculation:
- $\binom{5}{1} = \frac{5!}{1!(5-1)!} = \frac{5!}{1! \times 4!} = \frac{120}{1 \times 24} = 5$.
Two Successes in Five Trials:
- Each arrangement contains 2 successes and 3 failures (e.g. 11000, 10100).
- Calculation yields 10 distinct ways:
- $\binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{120}{2 \times 6} = 10$.
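The counting argument can be checked by brute force. The sketch below lists every arrangement of k successes among n trials in Python and confirms that the count agrees with the factorial formula. The helper name `arrangements` is hypothetical.

```python
from itertools import combinations
from math import comb, factorial


def arrangements(n, k):
    """List every 0/1 string of length n that contains exactly k ones."""
    result = []
    for success_positions in combinations(range(n), k):
        result.append("".join("1" if i in success_positions else "0" for i in range(n)))
    return result


one_success = arrangements(5, 1)
two_successes = arrangements(5, 2)

# The number of arrangements matches n! / (k! (n-k)!) and math.comb.
assert len(one_success) == comb(5, 1) == factorial(5) // (factorial(1) * factorial(4))
assert len(two_successes) == comb(5, 2) == factorial(5) // (factorial(2) * factorial(3))

print(one_success)
print(len(two_successes))
```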

For a fair coin (p = 0.5), the probability of 2 successes (heads) and 3 failures (tails) in 5 tosses:
- $10 \times p^2 \times (1-p)^3 = 10 \times 0.5^2 \times 0.5^3 = 0.3125$ (31.25%).
For a biased coin (p = 0.7), the same calculation gives:
- $10 \times p^2 \times (1-p)^3 = 10 \times 0.7^2 \times 0.3^3 = 0.1323$ (13.23%).
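The same arithmetic can be wrapped in a small function. This is a minimal Python sketch of the binomial probability formula that reproduces the two coin calculations. The function name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


fair = binomial_pmf(2, 5, 0.5)    # 2 heads in 5 tosses of a fair coin
biased = binomial_pmf(2, 5, 0.7)  # 2 heads in 5 tosses when P(heads) = 0.7
print(round(fair, 4), round(biased, 4))
```

The number of arrangements (10) is the same for both coins; only the probability of each individual arrangement changes with p.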

[Figure: Binomial probabilities P(X) plotted against the number of successes, for various values of n and p.]

Mean ($\mu$):
- Expected number of successes: $\mu = np$.
Variance ($\sigma^2$):
- Variability in the number of successes: $\sigma^2 = np(1-p)$.
Standard Deviation ($\sigma$):
- Square root of the variance: $\sigma = \sqrt{np(1-p)}$.
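These closed forms can be checked numerically against the definition of an expected value. The Python sketch below computes the mean and variance by summing over all possible counts and compares them with np and np(1-p). It uses the biased-coin setting (10 tosses, p = 0.7) as the example, and the helper names are hypothetical.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def moments_from_pmf(n, p):
    """Mean and variance computed directly from the probabilities P(X = k)."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, variance


n, p = 10, 0.7
mean, variance = moments_from_pmf(n, p)
print(mean, "vs np =", n * p)
print(variance, "vs np(1-p) =", n * p * (1 - p))
print("standard deviation:", sqrt(variance))
```

The two columns agree up to floating-point rounding.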

Basic Structure: pbinom(q, size, prob, lower.tail=TRUE) computes cumulative Binomial probabilities $P(X \le q)$, where:
- q: number of successes.
- size: number of trials (n).
- prob: probability of success (p).
- lower.tail: TRUE (the default) returns $P(X \le q)$; FALSE returns $P(X > q)$.
Example: Let X ~ Binom(10, 0.1).
- To compute $P(X \le 2)$: pbinom(q=2, size=10, prob=0.1) yields 0.9298092.

For point probabilities, such as $P(X = 2)$:
- Use: $P(X = 2) = P(X \le 2) - P(X \le 1)$
- Implementation in R: pbinom(2, 10, 0.1) - pbinom(1, 10, 0.1).
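To see what pbinom is computing, the sketch below rebuilds the cumulative probability in Python by summing the binomial formula over 0, 1, ..., q. The helper names are hypothetical, and this is a re-derivation for illustration rather than how R implements the function. In R itself, dbinom(2, 10, 0.1) returns the point probability directly.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(q, n, p):
    """P(X <= q), the lower-tail probability that pbinom reports."""
    return sum(binomial_pmf(k, n, p) for k in range(q + 1))


p_le_2 = binomial_cdf(2, 10, 0.1)
p_eq_2 = binomial_cdf(2, 10, 0.1) - binomial_cdf(1, 10, 0.1)
print(round(p_le_2, 7))  # 0.9298092
print(p_eq_2, binomial_pmf(2, 10, 0.1))  # the two routes to P(X = 2) agree
```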

Disease characterization: Cystic Fibrosis (CF) is an autosomal recessive condition caused by mutations in the CFTR gene on chromosome 7. An individual with two mutated copies has CF; an individual with one is a carrier.
Family analysis: both parents are carriers and have 4 children. How many children are expected to have CF, and what is the probability that exactly a given number of them do?
Calculations, following Mendel’s Laws:
- Each child independently has CF with probability $p = \frac{1}{4}$, so with $n = 4$ children the number with CF is $X \sim \text{Binomial}(4, \frac{1}{4})$.
- The expected number with CF is $\mu = np = 1$, and the probability of exactly one child with CF is computed using the binomial formula.
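The family calculation can be carried out exactly with fractions. The Python sketch below evaluates the binomial formula for n = 4 and p = 1/4 and prints the probability of each possible number of affected children. The probability of exactly one works out to 27/64, about 0.42. Variable and function names are illustrative.

```python
from fractions import Fraction
from math import comb

n = 4               # children
p = Fraction(1, 4)  # P(child has CF) when both parents are carriers


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); exact when p is a Fraction."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


expected_with_cf = n * p
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

print("expected number with CF:", expected_with_cf)
for k, prob in distribution.items():
    print(f"P(X = {k}) = {prob} ~ {float(prob):.4f}")
```

Although the expected number of affected children is 1, the single most likely outcome is still only somewhat more probable than having no affected children (27/64 versus 81/256).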

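Returning to the vaccine example: with n = 1,000 exposed vaccinated individuals and p = 0.01, the number of infections X follows Binomial(1000, 0.01), so the expected number of infections is np = 10.
Under this model, the probability of seeing at least the observed number of infections indicates whether a count such as 8, 20, or 100 is surprisingly far above expectation.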
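One way to put numbers on the vaccine scenario is to compute the upper-tail probability P(X >= observed) under Binomial(1000, 0.01) for each of the three counts. The Python sketch below does this by direct summation. Using the upper tail as the measure of concern is an assumption about how the comparison is meant to be made, and the names are hypothetical.

```python
from math import comb

N_EXPOSED = 1000    # vaccinated individuals exposed to COVID
P_INFECTION = 0.01  # hypothesized infection probability


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k), summed directly so tiny tail probabilities are not lost to rounding."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))


for observed in (8, 20, 100):
    prob = upper_tail(observed, N_EXPOSED, P_INFECTION)
    print(f"P(X >= {observed}) = {prob:.3g}")
```

A count of 8 is below the expected 10, so it is unremarkable. The further the observed count sits above 10, the smaller this probability becomes, and the harder the data are to reconcile with a 1% infection rate.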

When $np \ge 10$ and $n(1-p) \ge 10$:
- The Normal distribution with mean $np$ and variance $np(1-p)$ closely approximates the Binomial distribution.
Central Limit Theorem: a Binomial count is a sum of n independent Bernoulli trials, so this approximation is a consequence of the Central Limit Theorem.
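The quality of the approximation can be checked on a concrete case. The Python sketch below compares an exact Binomial cumulative probability with its Normal approximation for n = 100 and p = 0.5, values chosen only for illustration. The +0.5 continuity correction is a common refinement added here and is not something stated in these notes.

```python
from math import comb, erf, sqrt


def binomial_cdf(q, n, p):
    """Exact P(X <= q) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(q + 1))


def normal_cdf(x, mean, sd):
    """P(Y <= x) for Y ~ Normal(mean, sd^2), via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


n, p, q = 100, 0.5, 55  # np = n(1-p) = 50, so both conditions hold
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(q, n, p)
approx = normal_cdf(q + 0.5, mean, sd)  # +0.5 is the continuity correction
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Trying a case where np is well below 10 (for example n = 10, p = 0.1) shows a visibly larger gap between the two numbers.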

Binary variables are common in public health research and are not well described by a Normal distribution.
The Binomial distribution models the number of successes in n independent Bernoulli trials.
Central parameters: n (number of trials) and p (probability of success).
The mean (np) and variance (np(1-p)) give the expected number of successes and its variability.
When np and n(1-p) are both at least 10, the Binomial distribution is well approximated by a Normal distribution, which underlies the statistical inference methods covered in future courses (logistic regression, etc.).