Discrete Distributions Study Guide


1. Variance and Standard Deviation

  • Variance (σ²): Measures how spread out the values of a numerical random variable are from the mean.

  • Standard Deviation (σ): The square root of the variance, which expresses the spread in the same units as the data.

Formula for Variance:

Var(X) = E[(X - μ)²] = ∑ (x - μ)² P(X = x)

where μ = E(X) (expected value of X).

Example: Rolling a Fair Die

  • Mean: E(D) = 3.5

  • Variance: Var(D) = 35/12 ≈ 2.92

  • Standard Deviation: σ(D) = √(35/12) ≈ 1.71
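
The die figures can be reproduced directly from the variance formula. The following is a minimal sketch, assuming faces 1 to 6 each have probability 1/6; the helper names `mean` and `variance` are illustrative and not part of the guide.

```python
from fractions import Fraction
from math import sqrt

# Fair six-sided die: each face 1..6 has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = mean(pmf)         # exactly 7/2
var = variance(pmf)    # exactly 35/12
sigma = sqrt(var)      # square root of the variance, as a float

print(float(mu), round(float(var), 2), round(sigma, 2))  # 3.5 2.92 1.71
```

Using `Fraction` keeps the mean and variance exact, so rounding only happens when the values are printed.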

Effect of Distribution on Variance

  • If extreme values are more likely, variance increases.

  • If extreme values are less likely, variance decreases.
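
To see this effect numerically, one can compare a fair die with two loaded dice that share the same mean of 3.5. This is only a sketch: the loaded weights below are made-up values chosen for illustration, not data from the guide.

```python
def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * P(X = x) for a dict {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Fair die: all faces equally likely.
fair = {x: 1 / 6 for x in range(1, 7)}

# Hypothetical die that favors the extreme faces 1 and 6.
extremes_likely = {1: 0.3, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.3}

# Hypothetical die that favors the middle faces 3 and 4.
extremes_unlikely = {1: 0.05, 2: 0.15, 3: 0.3, 4: 0.3, 5: 0.15, 6: 0.05}

for name, pmf in [("extremes unlikely", extremes_unlikely),
                  ("fair", fair),
                  ("extremes likely", extremes_likely)]:
    print(f"{name}: variance = {variance(pmf):.2f}")
```

All three distributions are symmetric around 3.5, so any difference in variance comes only from how much probability sits on the extreme faces.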


2. Properties of Expected Value and Variance

For random variables X and Y and a constant c:

  • E(X + c) = E(X) + c

  • E(cX) = cE(X)

  • Var(X + c) = Var(X)

  • Var(cX) = c² Var(X)

  • Var(X + Y) = Var(X) + Var(Y) (holds when X and Y are independent; it can fail when they are dependent)
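
These identities can be checked exactly on a small example. The sketch below assumes X and Y are two independent fair dice and uses c = 3; both choices, and the helper names `expect` and `var`, are illustrative only.

```python
from fractions import Fraction
from itertools import product

def expect(dist):
    """Expected value of a list of (value, probability) pairs."""
    return sum(v * pr for v, pr in dist)

def var(dist):
    """Variance of a list of (value, probability) pairs."""
    mu = expect(dist)
    return sum((v - mu) ** 2 * pr for v, pr in dist)

# X is a fair die; c is an arbitrary constant chosen for the check.
X = [(x, Fraction(1, 6)) for x in range(1, 7)]
c = 3

shifted = [(x + c, pr) for x, pr in X]   # distribution of X + c
scaled = [(c * x, pr) for x, pr in X]    # distribution of cX

# X + Y for two independent dice: joint probability is the product.
total = [(x + y, px * py) for (x, px), (y, py) in product(X, X)]

assert expect(shifted) == expect(X) + c
assert expect(scaled) == c * expect(X)
assert var(shifted) == var(X)
assert var(scaled) == c ** 2 * var(X)
assert var(total) == var(X) + var(X)
```

The last check relies on independence: the joint probabilities are built as products, which is exactly the assumption behind the additivity rule.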


3. Bernoulli Distribution

  • A Bernoulli trial is a single experiment with two outcomes (success/failure).

  • Probability mass function (PMF):

    P(X = 1) = p, P(X = 0) = 1 - p

  • Mean: E(X) = p

  • Variance: Var(X) = p(1 - p)

  • Standard Deviation: σ(X) = √(p(1 - p))
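
The mean and variance formulas follow from applying the general definitions to the two-point PMF. A minimal sketch, assuming p = 3/10 purely for illustration (the function name `bernoulli_moments` is hypothetical):

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Compute mean and variance of a Bernoulli(p) variable from its PMF."""
    pmf = {1: p, 0: 1 - p}
    mean = sum(x * pr for x, pr in pmf.items())
    variance = sum((x - mean) ** 2 * pr for x, pr in pmf.items())
    return mean, variance

p = Fraction(3, 10)  # illustrative success probability
mean, variance = bernoulli_moments(p)

# The definitions reproduce the closed-form results E(X) = p, Var(X) = p(1 - p).
assert mean == p
assert variance == p * (1 - p)
```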


4. Binomial Distribution

  • Models the number of successes in n independent Bernoulli trials.

  • PMF:

    P(X = k) = (n choose k) p^k (1 - p)^(n - k)

    where (n choose k) = n! / (k!(n - k)!)

  • Mean: E(X) = np

  • Variance: Var(X) = np(1 - p)

  • Standard Deviation: σ(X) = √(np(1 - p))

Example: Flipping a Biased Coin 20 Times (p = 2/3)

  • E(X) = 20 * (2/3) ≈ 13.33

  • Var(X) = 20 * (2/3) * (1/3) ≈ 4.44

  • σ(X) = √(4.44) ≈ 2.11
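
The same numbers can be recovered by building the binomial PMF and summing over it, which also confirms that the closed-form mean and variance agree with the definitions. This is a sketch using n = 20 and p = 2/3 from the example; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 20, Fraction(2, 3)
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

total = sum(pmf.values())                                # probabilities sum to 1
mean = sum(k * pr for k, pr in pmf.items())              # equals n * p
var = sum((k - mean) ** 2 * pr for k, pr in pmf.items()) # equals n * p * (1 - p)
sigma = sqrt(var)

print(round(float(mean), 2), round(float(var), 2), round(sigma, 2))  # 13.33 4.44 2.11
```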


5. Zipf Distribution (Inverse Power Law Distribution)

  • Describes data where a few values occur very frequently, and many values occur very rarely.

  • PMF:

    P(X = k) ∝ (k + d)⁻α

    where k is the rank of the value, d is an offset, and α is the exponent.

  • Common in natural language processing, web analysis, and wealth distributions.
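
Because the PMF is only given up to proportionality, it has to be normalized before it can be used as a probability distribution. The sketch below assumes a finite set of ranks 1..n (an assumption made here so the weights can be summed); the function name `zipf_pmf` and the parameter values are illustrative.

```python
def zipf_pmf(n_values, alpha, d=0.0):
    """Return P(X = k) for k = 1..n_values, with P(X = k) proportional to (k + d)^(-alpha)."""
    weights = [(k + d) ** -alpha for k in range(1, n_values + 1)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]

# Illustrative parameters: 1000 ranks, exponent 1, no offset.
pmf = zipf_pmf(1000, alpha=1.0)

# With alpha = 1 and d = 0, rank 1 is twice as likely as rank 2.
print(pmf[0] / pmf[1])
```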

Example: Word Frequency in the British National Corpus

  • Rank 1 ("the"): 6.2 million occurrences

  • Rank 2 ("of"): 2.9 million occurrences

  • Rank 3 ("and"): 2.67 million occurrences

  • Counts approximately follow y = c (r + 1)⁻α, where y is the number of occurrences, r is the rank, c is a constant, and α ≈ 1.08

Implications of Zipf's Law

  • Fat Head: A few values dominate (e.g., the top 175 words account for 50% of tokens).

  • Long Tail: A very large number of distinct values each occur rarely, yet together they still account for a noticeable share of the data (e.g., words occurring only once make up 0.5% of all tokens).

  • Issues for AI:

    • Easy to capture common cases, but difficult to cover rare cases.

    • Requires large datasets for accurate modeling.
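
The fat-head and long-tail effects can be illustrated by asking how many top-ranked values are needed to reach a given share of the probability mass. This is a rough sketch, not a model of the British National Corpus: the vocabulary size of 100,000 is an assumption, and α = 1.08 with offset d = 1 is borrowed from the fitted form above. The helper names are hypothetical.

```python
from itertools import accumulate

def zipf_pmf(n_values, alpha, d):
    """P(rank = r) proportional to (r + d)^(-alpha), normalized over r = 1..n_values."""
    weights = [(r + d) ** -alpha for r in range(1, n_values + 1)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]

def ranks_needed(pmf, coverage):
    """Smallest number of top ranks whose cumulative probability reaches `coverage`."""
    for count, cumulative in enumerate(accumulate(pmf), start=1):
        if cumulative >= coverage:
            return count
    return len(pmf)

# Assumed vocabulary size; alpha and d follow the fitted form in the example.
pmf = zipf_pmf(100_000, alpha=1.08, d=1)

for target in (0.5, 0.9, 0.99):
    print(f"{target:.0%} coverage needs the top {ranks_needed(pmf, target)} ranks")
```

Under these assumptions, the number of ranks required grows much faster than the coverage target, which is the practical reason rare cases are hard to cover.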


6. Key Takeaways

  • Variance and standard deviation measure the spread of a distribution.

  • Bernoulli distribution models single-trial success/failure.

  • Binomial distribution models multiple independent trials.

  • Zipf distribution models power-law relationships in data.

  • AI applications face challenges because rare events in the long tail are hard to cover.


7. Practice Questions

  1. True or False: The variance of a binomial distribution is always less than its mean.

  2. If a fair die is rolled 30 times, what is the expected number of times it lands on 6?

  3. In a language corpus, the most common word appears 5 million times. The second most common appears 2.5 million times. Estimate how many times the 10th most common word appears using Zipf's law.
