Key Discrete Distributions in Statistics and Data Science

Statistical Inference for Estimation in Data Science Notes

Discrete Distributions Overview

  • Discrete distributions are fundamental in statistics and data science. Examples include:
    • Bernoulli Distribution
    • Geometric Distribution
    • Binomial Distribution
    • Poisson Distribution

Bernoulli Distribution

  • Definition: A random variable representing two possible outcomes: Success (1) and Failure (0).
  • Parameters:
    • Probability of success: p (where 0 ≤ p ≤ 1)
    • Probability of failure: 1 - p
  • Random Variable:
    [ X = \begin{cases} 1 & \text{if Success} \\ 0 & \text{if Failure} \end{cases} ]
  • Probability Mass Function (pmf):
    [ f(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases} ]
  • Notation:
    [ X \sim \text{Bernoulli}(p) ]
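To make the pmf concrete, here is a minimal Python sketch using only the standard library. The helper names `bernoulli_pmf` and `bernoulli_sample` are hypothetical, and p = 0.3 is an arbitrary illustrative value. The sampler relies on the fact that a uniform draw U on [0, 1) satisfies P(U < p) = p.

```python
import random

def bernoulli_pmf(x: int, p: float) -> float:
    """pmf of Bernoulli(p): p at x = 1, 1 - p at x = 0, 0 otherwise."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0

def bernoulli_sample(p: float, rng: random.Random) -> int:
    """One Bernoulli(p) draw: 1 (success) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)
draws = [bernoulli_sample(0.3, rng) for _ in range(10_000)]
print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3), bernoulli_pmf(2, 0.3))
print(sum(draws) / len(draws))  # sample proportion of successes, close to p = 0.3
```

Because success is coded as 1, the sample mean of the draws is simply the observed proportion of successes.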

Geometric Distribution

  • Definition: Models the number of trials up to and including the first success in a sequence of independent Bernoulli trials, each with the same success probability.
  • Parameters: p, the probability of success on each trial (0 < p ≤ 1).
  • Random Variable:
    [ X = \text{number of trials up to and including the first success} ]
  • Probability that X equals k:
    [ P(X = k) = p(1 - p)^{k-1} \text{ for } k = 1, 2, 3, \ldots ]
  • pmf:
    [ f(x) = \begin{cases} (1 - p)^{x - 1} p & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases} ]
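  • Alternative definition: if the random variable instead counts the number of failures before the first success (that is, X - 1 under the definition above), the support starts at 0 and the pmf becomes:
    [ f(x) = (1 - p)^{x} p \; \text{ for } x = 0, 1, 2, \ldots ]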
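The two geometric conventions are easy to confuse, so a small numerical check may help. This is a minimal Python sketch using only the standard library; the function names and the choice p = 0.25 are illustrative assumptions. It confirms that the failures-before-success pmf is the trials pmf shifted by one, that the trials pmf sums to (essentially) 1, and that simulating Bernoulli trials until the first success reproduces P(X = 1) = p and P(X = 2) = (1 - p)p.

```python
import random

def geom_trials_pmf(k: int, p: float) -> float:
    """P(X = k), X = number of trials up to and including the first success."""
    if k < 1:
        return 0.0
    return (1.0 - p) ** (k - 1) * p

def geom_failures_pmf(x: int, p: float) -> float:
    """P(Y = x), Y = X - 1 = number of failures before the first success."""
    if x < 0:
        return 0.0
    return (1.0 - p) ** x * p

def simulate_trials(p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until the first success; requires p > 0."""
    trials = 1
    while rng.random() >= p:  # this trial failed (probability 1 - p)
        trials += 1
    return trials

p = 0.25
# The two conventions differ only by a shift of one: f_Y(x) = f_X(x + 1).
shift_ok = all(
    geom_failures_pmf(x, p) == geom_trials_pmf(x + 1, p) for x in range(20)
)

# Partial sum over k = 1..200 equals 1 - (1 - p)**200, which is essentially 1.
total = sum(geom_trials_pmf(k, p) for k in range(1, 201))

rng = random.Random(42)
sims = [simulate_trials(p, rng) for _ in range(20_000)]
freq1 = sims.count(1) / len(sims)  # should be close to P(X = 1) = p
freq2 = sims.count(2) / len(sims)  # should be close to P(X = 2) = (1 - p) * p
print(shift_ok, total, freq1, freq2)
```

The `while` loop requires p > 0, matching the parameter range for this distribution: with p = 0 a success never occurs and the loop would not terminate.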

Binomial Distribution

  • Definition: Counts the number of successes in n independent Bernoulli trials.
  • Parameters:
    • n: Number of trials
    • p: Probability of success in each trial.
  • Random Variable:
    [ X = \text{number of successes in } n \text{ trials} ]
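  • pmf:
    [ f(x) = \binom{n}{x} p^{x} (1 - p)^{n - x} I_{\{0, 1, \ldots, n\}}(x) ]
    • \binom{n}{x}: the number of ways to choose which x of the n trials are successes; each such arrangement has probability p^{x} (1 - p)^{n - x}.
    • I_{\{0, 1, \ldots, n\}}(x): indicator function, equal to 1 when x is in the set and 0 otherwise.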
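A minimal Python sketch of the binomial pmf, using `math.comb` (available in Python 3.8+) for the binomial coefficient. The helper name `binom_pmf` and the values n = 10, p = 0.3 are illustrative assumptions. The second half simulates X directly from the definition, as the number of successes in n independent Bernoulli(p) trials.

```python
import random
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """pmf of Binomial(n, p); the indicator restricts the support to {0, 1, ..., n}."""
    if x < 0 or x > n:
        return 0.0
    # comb(n, x) counts the ways to choose which x of the n trials succeed.
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

n, p = 10, 0.3
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
print(sum(probs))         # total probability, 1 up to floating-point rounding
print(comb(n, 3))         # 120 ways to place 3 successes among 10 trials
print(binom_pmf(3, n, p))

# Simulate X as a sum of n independent Bernoulli(p) trials (True counts as 1).
rng = random.Random(7)
samples = [sum(rng.random() < p for _ in range(n)) for _ in range(20_000)]
empirical = samples.count(3) / len(samples)
print(empirical)          # sample frequency of X = 3, near binom_pmf(3, n, p)
```

With n = 1 the pmf reduces to the Bernoulli pmf, since \binom{1}{1} = \binom{1}{0} = 1.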

Poisson Distribution

  • Definition: Models the number of occurrences of an event in a fixed interval (of time, space, etc.) when events occur independently of one another at a known constant mean rate.
  • Parameters:
    • λ > 0: the average number of occurrences per interval; λ is both the mean and the variance of X.
  • Random Variable:
    [ X \text{ takes values in } \{0, 1, 2, \ldots\} ]
  • pmf:
    [ f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!} I_{\{0, 1, 2, \ldots\}}(x) ]
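  • Example: Scatter n seeds at random over a large area divided into many equal sections, so that each seed lands in a given section with a small probability p. The count in one section is Binomial(n, p); as n grows large and p becomes small with λ = np held fixed, this count is well approximated by a Poisson(λ) distribution.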
  • Application: Often used to describe the distribution of rare events within a large population.
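The binomial-to-Poisson limit behind the seed example can be checked numerically. This is a minimal Python sketch using only the standard library; `poisson_pmf` and `binom_pmf` are hypothetical helper names, and the seed and section counts are illustrative assumptions, not data. It holds λ = np fixed at 2 and reports the largest pointwise gap between the Binomial(n, λ/n) and Poisson(λ) pmfs as n grows.

```python
from math import comb, exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """pmf of Poisson(lam): e^(-lam) * lam^x / x! on x = 0, 1, 2, ...; 0 elsewhere."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam**x / factorial(x)

def binom_pmf(x: int, n: int, p: float) -> float:
    """pmf of Binomial(n, p) on x = 0, 1, ..., n; 0 elsewhere."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

# Hypothetical seed example: n seeds scattered uniformly over n / 2 equal
# sections, so each seed lands in a given section with p = 2 / n and the
# expected count per section is lam = n * p = 2.
lam = 2.0
gaps = []
for n in (10, 100, 10_000):
    p = lam / n
    # Largest pointwise gap between Binomial(n, p) and Poisson(lam) over x = 0..9.
    gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(10))
    gaps.append(gap)
    print(f"n={n:>6}  p={p:.4f}  max gap={gap:.2e}")
```

The gap shrinks as n grows with np fixed, which is the sense in which the seed counts "follow" a Poisson distribution for large n and small p.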