Key Discrete Distributions in Statistics and Data Science
Statistical Inference for Estimation in Data Science Notes
Discrete Distributions Overview
- Discrete distributions are fundamental in statistics and data science. Examples include:
  - Bernoulli Distribution
  - Geometric Distribution
  - Binomial Distribution
  - Poisson Distribution
Bernoulli Distribution
- Definition: A random variable with exactly two possible outcomes: success (1) and failure (0).
- Parameters:
  - Probability of success: p, where 0 ≤ p ≤ 1
  - Probability of failure: 1 - p
- Random variable:
\[ X = \begin{cases} 1 & \text{if success} \\ 0 & \text{if failure} \end{cases} \]
- Probability mass function (pmf):
\[ f(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases} \]
- Notation:
\[ X \sim \text{Bernoulli}(p) \]
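The Bernoulli pmf above translates almost line for line into code. The sketch below is illustrative only: `bernoulli_pmf` and `bernoulli_sample` are hypothetical helper names, and sampling assumes Python's standard `random` module with a fixed seed. Because E[X] = 1·p + 0·(1 - p) = p, the mean of many simulated draws should land close to p.

```python
import random


def bernoulli_pmf(x: int, p: float) -> float:
    """pmf of X ~ Bernoulli(p): p at x = 1, 1 - p at x = 0, and 0 otherwise."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0


def bernoulli_sample(p: float, rng: random.Random) -> int:
    """Draw one outcome: 1 (success) with probability p, otherwise 0 (failure)."""
    # rng.random() is uniform on [0, 1), so the event "< p" has probability p.
    return 1 if rng.random() < p else 0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    p = 0.3
    print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), bernoulli_pmf(2, p))
    draws = [bernoulli_sample(p, rng) for _ in range(10_000)]
    # The sample mean estimates E[X] = p.
    print(sum(draws) / len(draws))
```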
Geometric Distribution
- Definition: Models the number of independent Bernoulli trials needed to obtain the first success, counting the successful trial itself.
- Parameters: Same as Bernoulli: p, the probability of success on each trial.
- Random variable:
\[ X = \text{number of trials until the first success} \]
- Probability that X equals k (the first k - 1 trials fail and trial k succeeds):
\[ P(X = k) = p(1 - p)^{k-1} \quad \text{for } k = 1, 2, 3, \ldots \]
- pmf:
\[ f(x) = \begin{cases} (1 - p)^{x - 1} p & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases} \]
- If X is instead defined as the number of failures before the first success, the support shifts to start at 0 and the pmf becomes:
\[ f(x) = (1 - p)^{x} p \quad \text{for } x = 0, 1, 2, \ldots \]
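A short simulation can make the two geometric conventions concrete. This is a sketch under stated assumptions: the function names are hypothetical, the simulation uses Python's `random` module, and it requires 0 < p ≤ 1 (with p = 0 a success never occurs and the loop would not terminate). It repeats Bernoulli trials until the first success, then compares the empirical frequencies with both pmfs, which agree once the failure count is shifted by one (Y = X - 1).

```python
import random


def geometric_pmf_trials(k: int, p: float) -> float:
    """P(X = k), X = trials up to and including the first success (k = 1, 2, ...)."""
    if k < 1:
        return 0.0
    # k - 1 independent failures followed by one success
    return (1.0 - p) ** (k - 1) * p


def geometric_pmf_failures(y: int, p: float) -> float:
    """P(Y = y), Y = failures before the first success (y = 0, 1, 2, ...)."""
    if y < 0:
        return 0.0
    return (1.0 - p) ** y * p


def trials_until_success(p: float, rng: random.Random) -> int:
    """Run independent Bernoulli(p) trials and return the index of the first success."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    k = 1
    while rng.random() >= p:  # this trial failed (probability 1 - p)
        k += 1
    return k


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    p = 0.25
    draws = [trials_until_success(p, rng) for _ in range(10_000)]
    for k in range(1, 6):
        empirical = draws.count(k) / len(draws)
        # Y = X - 1, so P(X = k) equals P(Y = k - 1).
        print(k, round(empirical, 4), geometric_pmf_trials(k, p), geometric_pmf_failures(k - 1, p))
```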
Binomial Distribution
- Definition: Counts the number of successes in n independent Bernoulli trials, each with the same success probability p.
- Parameters:
  - n: number of trials
  - p: probability of success on each trial
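- Random variable:
\[ X = \text{number of successes in } n \text{ trials} \]
- pmf, where \( I_A(x) \) equals 1 if \( x \in A \) and 0 otherwise:
\[ f(x) = \binom{n}{x} p^{x} (1 - p)^{n - x} \, I_{\{0, 1, \ldots, n\}}(x) \]
- \( \binom{n}{x} = \frac{n!}{x!\,(n - x)!} \): the number of ways to choose which x of the n trials are successes; each such arrangement has probability \( p^{x} (1 - p)^{n - x} \).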
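The binomial pmf can be checked numerically: the probabilities over x = 0, ..., n should sum to 1, and the mean should equal np (a standard result not stated in these notes). The sketch below assumes Python 3.8+ for `math.comb`; `binomial_pmf` and `binomial_sample` are hypothetical names. Sampling follows the definition directly by counting successes in n Bernoulli trials, which is simple but slower than a dedicated sampler.

```python
import math
import random


def binomial_pmf(x: int, n: int, p: float) -> float:
    """pmf of X ~ Binomial(n, p); the range check plays the role of I_{0,...,n}(x)."""
    if x < 0 or x > n:
        return 0.0
    # comb(n, x) arrangements, each with probability p^x (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def binomial_sample(n: int, p: float, rng: random.Random) -> int:
    """Draw X as the number of successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


if __name__ == "__main__":
    n, p = 10, 0.4
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    print("total probability:", sum(probs))
    # Equals n * p up to floating-point rounding.
    print("mean from pmf:", sum(x * q for x, q in enumerate(probs)))
    rng = random.Random(0)  # fixed seed for reproducibility
    draws = [binomial_sample(n, p, rng) for _ in range(5_000)]
    print("simulated mean:", sum(draws) / len(draws))
```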
Poisson Distribution
- Definition: Models the number of occurrences of an event in a fixed interval (of time, space, etc.) when events occur independently at a constant mean rate.
- Parameters:
  - λ > 0: the average number of events per interval; it is both the mean and the variance of X.
- Random variable:
\[ X \text{ takes values in } \{0, 1, 2, \ldots\} \]
- pmf:
\[ f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!} \, I_{\{0, 1, 2, \ldots\}}(x) \]
- Example: Scatter n seeds uniformly at random over a large area divided into many equal sections. The count in any one section is Binomial(n, p), where p is the fraction of the area that section covers. As n grows large and p becomes small with np = λ held fixed, this count approaches a Poisson(λ) distribution.
- Application: Often used to model counts of rare events within a large population.
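The seed example rests on the Poisson limit of the binomial: with p = λ/n, the Binomial(n, p) pmf approaches the Poisson(λ) pmf as n grows. The sketch below measures that convergence numerically. It is illustrative only: `poisson_pmf`, `max_gap`, and the choices λ = 3 and x ≤ 30 are assumptions made for the demonstration, and `binomial_pmf` is repeated so the block runs on its own (it needs Python 3.8+ for `math.comb`).

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """pmf of X ~ Poisson(lam): e^{-lam} lam^x / x! for x = 0, 1, 2, ..., else 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial(n, p) pmf, repeated here only for the comparison below."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def max_gap(lam: float, n: int, x_max: int = 30) -> float:
    """Largest |Binomial(n, lam/n) - Poisson(lam)| pmf difference over x = 0..x_max."""
    p = lam / n  # keeps n * p = lam fixed while n grows
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(x_max + 1))


if __name__ == "__main__":
    lam = 3.0
    for n in (10, 100, 1_000, 10_000):
        # The gap shrinks as n grows and p = lam / n shrinks.
        print(n, max_gap(lam, n))
```

Holding np = λ fixed is what makes the comparison meaningful: letting n grow with p unchanged would instead push the binomial mean to infinity.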