Week 6 - Comprehensive Study Notes on Discrete Random Variables

The Role and Impact of Random Variables

  • Conceptual Foundations: Why Random Variables Matter
    • Quantifying Uncertainty: Random variables serve to assign numerical values to random outcomes, which transforms abstract uncertainty into a measurable and manageable format.
    • Computing Probabilities: By utilizing probability distributions, it becomes possible to determine the likelihood of specific events, enabling precise calculations rather than estimations.
    • Predictions and Decision-Making: Random variables allow outcomes to be forecast and risk to be assessed quantitatively, which supports decision-making under uncertainty.
    • Modeling Real-World Phenomena: They provide a mathematical framework to represent the complexities of reality through structured distributions, forming the bedrock of statistical inference and modern data science.
    • Foundations for Expectation: They define the "expected value," which is crucial for understanding the long-term behavior of a system or process.
    • Measuring Variability and Risk: Through metrics like variance and standard deviation, random variables quantify the volatility and spread of potential outcomes.

Learning Objectives for Discrete Probability Distributions

  • Relational Understanding: Students must understand the relationship between specific outcomes and their associated probabilities within a distribution.
  • Descriptive Methods: Proficiency is required in describing discrete probability distributions using three distinct formats:
    • Tables: Listing individual outcomes alongside their probabilities.
    • Graphs: Visual representations of probability mass functions.
    • Formulas: Mathematical expressions (PMFs) that define the distribution.
  • Analytical Computation: Students must know how to compute and interpret the Expected Value and Standard Deviation of these distributions.

Fundamental Definitions and Classifications

  • Random Variable ($X$): A numerical description of the outcome resulting from an experiment.
  • Discrete Random Variable:
    • Defined by its ability to assume either a finite number of values or a countably infinite sequence of values (such as $0, 1, 2, \ldots$).
    • Examples:
      • Hotel Reservations: Making $100$ reservations; $X = \text{number of guests who show up}$. Possible outcomes: $0, 1, 2, \ldots, 98, 99, 100$.
      • Restaurant Operations: Running a restaurant for one day; $X = \text{number of customers}$. Possible outcomes: $0, 1, 2, 3, \ldots$
      • Room Sales: Choosing a hotel room; $X = 1$ if it has a sea view, $X = 0$ if it does not.
  • Continuous Random Variable:
    • Defined by its ability to assume any numerical value within an interval or a collection of intervals.
    • Example:
      • Hotel Reception: $X = \text{time between two consecutive customer arrivals, in minutes}$. Possible outcomes: $x \geq 0$.

The Discrete Probability Distribution

  • Definition: This distribution describes how probabilities are assigned across the possible values of a discrete random variable.
  • Representations:
    • Probability Mass Function (pmf): Denoted as $f(x)$, it provides the probability for each specific value of the random variable.
    • Alternative Notation: $P(X = x)$, which explicitly states the probability that the variable $X$ takes the value $x$.
  • Mathematical Requirements: For a discrete random variable, the following conditions must always be satisfied:
    1. Individual probabilities must be non-negative: $f(x) \geq 0$ and $P(X = x) \geq 0$.
    2. The sum of all probabilities in the sample space must equal exactly one: $\sum_x f(x) = 1$ or $\sum_x P(X = x) = 1$.
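
The two requirements above can be checked mechanically. The following is a minimal sketch, assuming the distribution is stored as a Python dict mapping each value to its probability; the function name `is_valid_pmf` and the sea-view probability of 0.3 are illustrative assumptions, not part of the course material.

```python
import math


def is_valid_pmf(pmf, tol=1e-9):
    """Check both pmf requirements for a dict mapping value -> probability."""
    # Requirement 1: every individual probability is non-negative.
    non_negative = all(p >= 0 for p in pmf.values())
    # Requirement 2: the probabilities sum to one (within a small tolerance
    # to absorb floating-point rounding).
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one


# Hypothetical sea-view indicator: the 0.3 probability is assumed for illustration.
sea_view = {0: 0.7, 1: 0.3}
# A table that violates requirement 2 (it sums to 1.2).
not_a_pmf = {0: 0.7, 1: 0.5}

valid_result = is_valid_pmf(sea_view)
invalid_result = is_valid_pmf(not_a_pmf)
print(valid_result, invalid_result)
```

A tolerance is used instead of an exact comparison because decimal probabilities such as 0.7 and 0.3 are not represented exactly in floating point.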

Case Study: Uniform Distribution (Rolling a Single Die)

  • Experiment: Rolling one standard six-sided die.
  • Random Variable ($X$): The number of dots on the top face.
  • Possible Values ($x$): $1, 2, 3, 4, 5, 6$.
  • Probability Representation:
    • Table: Every outcome ($1$ through $6$) has an identical probability of $P(X = x) = \frac{1}{6}$.
    • Graph: A bar chart where each outcome from $1$ to $6$ reaches the height of approximately $0.167$.
    • Formula: $f(x) = \frac{1}{n}$, where $n$ is the number of possible outcomes. For a die, $f(x) = \frac{1}{6}$.

Case Study: Distribution of the Sum of Two Dice

  • Experiment: Rolling two standard dice simultaneously.
  • Random Variable ($X$): The sum of the number of dots on the top faces of both dice.
  • Possible Values: Values range from $2$ (rolling two 1s) to $12$ (rolling two 6s), specifically: $2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12$.
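
Unlike the single die, the sum of two dice is not uniform. One way to see this is to enumerate all 36 ordered pairs of faces and count how often each sum occurs. The sketch below assumes two fair, independent dice; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) pairs and count each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Convert the counts into exact probabilities, ordered by the value of the sum.
two_dice_pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, prob in two_dice_pmf.items():
    print(f"P(X = {total:2d}) = {prob}")
```

Exact fractions are used so that the probabilities can be compared and summed without rounding error. The resulting table peaks at a sum of 7 and is symmetric around it.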

Measures of Central Tendency: Expected Value

  • Theoretical Concept: The expected value is the theoretical long-term average of a random variable. It does not necessarily represent an outcome that will occur in a single trial.
  • Formula:
    • $E[X] = \sum_i x_i \times P(X = x_i)$
  • Excel Implementation: Use the function SUMPRODUCT(values; probabilities).
  • Practical Use: This is a vital tool for decision-making under uncertainty, as it only requires knowledge of probabilities and outcomes, not historical data.
  • Mathematical Properties:
    • Linearity: $E[aX + b] = a \times E[X] + b$
    • Additivity: $E[X + Y] = E[X] + E[Y]$
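
As an illustration of the formula and of linearity, the sketch below computes the expected value of a single fair die and then checks $E[aX + b] = a \times E[X] + b$ numerically. The helper `expected_value` and the constants `a = 2`, `b = 3` are assumptions chosen for the example.

```python
from fractions import Fraction


def expected_value(pmf):
    """E[X]: sum of x * P(X = x) over a dict mapping value -> probability."""
    return sum(x * p for x, p in pmf.items())


# Fair six-sided die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
e_die = expected_value(die)

# Linearity check with hypothetical constants a = 2 and b = 3:
# build the distribution of aX + b and compute its expected value directly.
a, b = 2, 3
transformed = {a * x + b: p for x, p in die.items()}
e_transformed = expected_value(transformed)

print(e_die, e_transformed, a * e_die + b)
```

The die's expected value is 7/2, which is not a face of the die. This matches the point that the expected value is a long-run average rather than an outcome of a single trial.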

Real-World Example: Stock Decisions Based on Expected Value

  • Scenario: A stock trades at $250$. Overnight results will determine tomorrow's price.
    • Good News (60% probability): Price rises to $255$ (Profit = $+5$).
    • Bad News (40% probability): Price falls to $240$ (Profit = $-10$).
  • Expected Price Calculation:
    • $255 \times 0.60 + 240 \times 0.40 = 153 + 96 = 249$
  • Expected Profit Calculation:
    • $5 \times 0.60 + (-10) \times 0.40 = 3 - 4 = -1$
  • Decision: Based on an expected profit of $-1$, you should not buy the stock.
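
The same calculation can be reproduced in a few lines of code. This is a sketch using the numbers from the scenario; the variable names and the simple "buy only if expected profit is positive" rule are illustrative assumptions that mirror the decision stated above.

```python
# Scenario values from the notes.
current_price = 250
prices = [255, 240]            # tomorrow's price after good / bad news
probabilities = [0.60, 0.40]   # probability of good / bad news

# Expected price: probability-weighted average of tomorrow's prices.
expected_price = sum(x * p for x, p in zip(prices, probabilities))

# Expected profit: probability-weighted average of (price - current price).
profits = [x - current_price for x in prices]
expected_profit = sum(x * p for x, p in zip(profits, probabilities))

# Assumed decision rule: buy only when the expected profit is positive.
should_buy = expected_profit > 0

print(expected_price, expected_profit, should_buy)
```

The expected profit equals the expected price minus the current price, which is an instance of the linearity property with $a = 1$ and $b = -250$.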

The Gambler’s Fallacy: Roulette Example

  • Observation: The last $6$ trials at a roulette table resulted in: $2, 32, 14, 27, 30, 21$. Out of these, $2$ outcomes were "Black."
  • Question: What is the most likely number of "Black" outcomes considering a total of $8$ spins (the past $6$ results plus $2$ future spins)?
  • Analysis of Future Spins: Spins are independent, so the past results do not change the odds of the future ones. Treating Red and Black as equally likely, the next $2$ spins have four possible sequences, each with a probability of $0.25$ ($0.5 \times 0.5$). Grouped by the number of Blacks:
    • Red, Red ($0$ Blacks): 25%
    • Red, Black / Black, Red ($1$ Black): 50%
    • Black, Black ($2$ Blacks): 25%
  • Conclusion: The most likely outcome for the future spins is $1$ "Black." Added to the $2$ that already occurred, the most likely total is $3$.
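
The reasoning can be verified by enumerating the future spins. The sketch below takes the count of 2 past Blacks and the 0.5 probability from the notes as given (the 0.5 figure ignores the green zero on a real wheel); all variable names are illustrative.

```python
from collections import Counter
from itertools import product

past_blacks = 2   # Blacks already observed, as stated in the notes
p_black = 0.5     # assumption from the notes; ignores the green zero

# Probability of each possible number of Blacks in the next two spins.
future = Counter()
for spins in product(["Red", "Black"], repeat=2):
    prob = 1.0
    for colour in spins:
        prob *= p_black if colour == "Black" else 1 - p_black
    future[spins.count("Black")] += prob

# The past is fixed, so the distribution of the total is just shifted.
total_pmf = {past_blacks + k: p for k, p in sorted(future.items())}
most_likely_total = max(total_pmf, key=total_pmf.get)

print(total_pmf, most_likely_total)
```

The past results enter only as a fixed offset. They do not alter the probabilities of the future spins, which is the point of the gambler's fallacy.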

Measures of Variability: Variance and Standard Deviation

  • Variance Formula ($\text{Var}(X)$):
    • $\text{Var}(X) = \sum_i (x_i - E[X])^2 \times P(X = x_i)$
    • Excel Implementation: SUMPRODUCT((values - expected_value)^2; probabilities).
  • Standard Deviation Formula ($\sigma(X)$):
    • $\sigma(X) = \sqrt{\text{Var}(X)}$
  • Use and Interpretation:
    • Measures the dispersion of outcomes around the expected value.
    • Standard Deviation is expressed in the same units as the random variable ($X$), whereas Variance is in squared units.
  • Additivity Property:
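    • $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$, provided $X$ and $Y$ are independent. In general, a covariance term $2 \times \text{Cov}(X, Y)$ must be added.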
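
A short numerical check of these formulas, using a fair die as the example. The sketch also verifies additivity of variance for the sum of two independent dice, a case where the property holds. The helper names are illustrative assumptions.

```python
import math
from fractions import Fraction
from itertools import product


def expected_value(pmf):
    """E[X] for a dict mapping value -> probability."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X): probability-weighted squared deviation from E[X]."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
var_die = variance(die)
sd_die = math.sqrt(var_die)   # same units as X (dots), unlike the variance

# Distribution of the sum of two independent dice:
# the joint probability of a pair is the product of the two probabilities.
two_dice = {}
for a, b in product(die, repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + die[a] * die[b]
var_sum = variance(two_dice)

print(var_die, round(sd_die, 4), var_sum)
```

For the die, the variance is 35/12 and the variance of the two-dice sum is exactly twice that, as additivity predicts for independent variables.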

Summary of Common Discrete Probability Distributions

| Distribution | Probability Function | Conditions of Use | Hospitality Example | Mean | Variance |
| --- | --- | --- | --- | --- | --- |
| Discrete Uniform | $P(X = x) = \frac{1}{n}$ | Finite set of outcomes where all outcomes are equally likely and independent. | Hotel gives one free upgrade randomly to one of $5$ guests. | $E[X] = \frac{x_1 + x_n}{2}$ | $\text{Var}(X) = \frac{n^2 - 1}{12}$ |
| Bernoulli | $P(X = 1) = p$, $P(X = 0) = 1 - p$ | Single trial with only two outcomes (Success/Failure) and constant probability $p$. | A guest shows up ($1$) or does not show up ($0$). | $E[X] = p$ | $\text{Var}(X) = p \times (1 - p)$ |
| Binomial | $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ | Fixed number of independent trials ($n$) with two outcomes and constant probability $p$. | Number of guests who show up among $20$ bookings. | $E[X] = n \times p$ | $\text{Var}(X) = n \times p \times (1 - p)$ |
| Poisson | $P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$ | Counts events in a fixed interval of time or space; events are independent and occur at a constant rate $\lambda$. | Number of complaints received per day. | $E[X] = \lambda$ | $\text{Var}(X) = \lambda$ |

Note: The Discrete Uniform mean and variance formulas shown apply when the outcomes are $n$ consecutive integers $x_1, \ldots, x_n$.
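
The closed-form mean and variance for the Binomial distribution can be checked against the general definitions. The sketch below uses $n = 20$ bookings as in the hospitality example; the show-up probability $p = 0.9$ is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial(n, p) random variable."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n = 20     # number of bookings, from the hospitality example
p = 0.9    # hypothetical show-up probability, assumed for illustration

# Full probability table for x = 0, 1, ..., n.
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

# Mean and variance from the general definitions.
mean = sum(x * q for x, q in pmf.items())
var = sum((x - mean) ** 2 * q for x, q in pmf.items())

# Compare against the closed-form formulas n*p and n*p*(1-p).
print(round(mean, 6), n * p)
print(round(var, 6), n * p * (1 - p))
```

`math.comb` requires Python 3.8 or later. The agreement is up to floating-point rounding, so comparisons should use a tolerance rather than exact equality.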