Comprehensive Notes on Discrete Random Variables and Probability Distributions

Overview of Random Variables and Probability Distributions

Probability distributions are divided into discrete and continuous distributions, covered primarily in Chapters 5 and 6, respectively.
Chapter 5: Discrete Probability Distributions introduces the concept of random variables, the classification into discrete and continuous, and specific discrete distributions including binomial, hypergeometric, and Poisson.
Chapter 6: The Normal Probability Distribution introduces probability distributions for continuous random variables, focusing on the exponential and normal distributions. The normal distribution is cited as the most important probability distribution used in statistics.

Concept and Definition of a Random Variable

Results of random experiments are represented by outcomes in a sample space $S$ . These outcomes are not necessarily numbers (they may be letters or words).
It is convenient to represent these outcomes as numbers by associating each outcome from the sample space with a numerical value according to a clearly defined rule.
Formal Definition: When each outcome from the sample space is associated with a unique number according to a clearly defined rule, this rule is considered a function defined on the sample space. This function is called a random variable.
Variables in Context:
- In algebra, variables are denoted by lowercase letters ( $a, b, c, x, y, z$ ).
- Random variables are denoted by uppercase letters (most often $X, Y, Z$ ).
- Particular values assumed by random variables are denoted by lowercase letters ( $x, y, z$ ).
- The statement that a random variable $X$ takes a value $x$ is written as $X = x$ .
The Range: The set of all possible numbers that a random variable can take is called the range of the random variable, denoted by $R$ . If specifying for $X$ , it is $R_X$ .

Examples of Mapping Outcomes to Random Variables

Example 1: Tossing a Coin Three Times ( $n=3$ )
- Sample Space $S = \{ttt, htt, tht, tth, hht, hth, thh, hhh\}$ .
- Variable $X$ (Heads on the first toss):
  - For outcomes $ttt, tht, tth, thh$ , $X = 0$ (first toss is tails).
  - For outcomes $htt, hht, hth, hhh$ , $X = 1$ (first toss is heads).
  - Range of $X$ : $R_X = \{0, 1\}$ .
- Variable $Y$ (Number of heads in three tosses):
  - $ttt \rightarrow 0$
  - $htt, tht, tth \rightarrow 1$
  - $hht, hth, thh \rightarrow 2$
  - $hhh \rightarrow 3$
  - Range of $Y$ : $R_Y = \{0, 1, 2, 3\}$ .
Example 2: Tossing a Pair of Dice
- Sample Space $S$ consists of 36 pairs of numbers.
- Variable $Z$ (Sum of numbers):
  - Outcome $(1, 1) \rightarrow 2$
  - Outcomes $(2, 1), (1, 2) \rightarrow 3$
  - …up to $(6, 6) \rightarrow 12$
  - Range of $Z$ : $R_Z = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$ .
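
The mappings in both examples can be reproduced by enumerating the sample space and applying each rule as a function. The following is a minimal Python sketch; the function names (`first_toss_is_head`, `number_of_heads`, `dice_sum`) are illustrative and do not come from the notes.

```python
from itertools import product

# Sample space for three coin tosses: strings such as "hht".
coin_space = ["".join(t) for t in product("ht", repeat=3)]

def first_toss_is_head(outcome):
    """X: 1 if the first toss is heads, 0 otherwise."""
    return 1 if outcome[0] == "h" else 0

def number_of_heads(outcome):
    """Y: number of heads in the three tosses."""
    return outcome.count("h")

# Sample space for a pair of dice: 36 ordered pairs.
dice_space = list(product(range(1, 7), repeat=2))

def dice_sum(outcome):
    """Z: sum of the two numbers shown."""
    return outcome[0] + outcome[1]

# The range of a random variable is the set of values it can take.
range_x = sorted({first_toss_is_head(s) for s in coin_space})
range_y = sorted({number_of_heads(s) for s in coin_space})
range_z = sorted({dice_sum(s) for s in dice_space})

print(range_x)  # [0, 1]
print(range_y)  # [0, 1, 2, 3]
print(range_z)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Each random variable is an ordinary function on outcomes, and the range is simply the set of values the function produces.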

Classification of Random Variables

Discrete Random Variable: A random variable that assumes either a finite number of possible values or a countably infinite number of values. A countably infinite range can still be listed as a sequence even though the list never ends, such as the number of tosses of a coin needed until the first head appears.
- Examples: Number of customers at a bank in an hour, number of defective light bulbs in a box of 100.
Continuous Random Variable: A random variable whose values are real numbers and which can assume any value in an interval (a continuum), so that it has infinitely many possible values within that interval.
- There are no gaps between the values.
- Intervals can be finite (e.g., $[0, 1]$ ), semi-infinite, or infinite.
- Examples: Height or weight of a person, length of an object, time required to perform a task. Values are typically obtained via measurement and can theoretically be represented with infinite decimal accuracy.

Discrete Probability Distributions

Definition: A discrete probability distribution consists of the list of all possible values of a discrete random variable together with their corresponding probabilities.
Probability Mass Function (PMF): The function $f(x)$ or $f_X(x)$ represents the probability that the random variable $X$ is equal to the value $x$ .
- Notation: $f(x) = P(X = x)$ .
Properties of PMF:
1. Every probability value must be between 0 and 1 inclusive: $0 \le f(x_i) \le 1$ .
2. The probabilities of all possible values must sum to 1: $\sum_{i=1}^{n} f(x_i) = 1$ (for a countably infinite range, the sum runs over all values $x_i$ ).
Interval Statements for Discrete Variables:
- Inequalities can describe intervals, e.g., $P(X > 2)$ , $P(X \le 2)$ , or double inequalities like $P(5 \le X \le 10)$ .
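
The two PMF properties and the interval statements can be checked mechanically. This is a minimal sketch that assumes a fair coin, so that the PMF of the number of heads in three tosses is 1/8, 3/8, 3/8, 1/8; the helper names `is_valid_pmf` and `prob` are hypothetical.

```python
from fractions import Fraction

# Assumed PMF of the number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def is_valid_pmf(pmf):
    """Check both properties: every f(x) lies in [0, 1] and the total is 1."""
    return all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

def prob(pmf, condition):
    """Probability of an event: add f(x) over the values x satisfying it."""
    return sum(p for x, p in pmf.items() if condition(x))

print(is_valid_pmf(pmf))                 # True
print(prob(pmf, lambda x: x > 2))        # 1/8
print(prob(pmf, lambda x: x <= 2))       # 7/8
print(prob(pmf, lambda x: 1 <= x <= 2))  # 3/4
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.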

Representation of Discrete Probability Distributions

Tabular Form: Outcomes listed in one column/row (usually in ascending order) with their corresponding probabilities $f(x)$ in the next.
Graphical Form: Values are placed on the horizontal axis. Each probability is drawn as a vertical segment, or as a bar of unit width, with height equal to $f(x_i)$ .
Formula Form: A mathematical rule used to find $f(x)$ for each value in the range.
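
As an illustration of how the three forms relate, the sketch below starts from a formula, builds the table, and prints a rough text version of the bar graph. It assumes two fair dice, for which the PMF of the sum $z$ is $(6 - |z - 7|)/36$; this formula is not stated in the notes and is used here only as an example.

```python
from fractions import Fraction

def f(z):
    """Formula form: PMF of the sum of two fair dice, for z = 2, ..., 12."""
    return Fraction(6 - abs(z - 7), 36)

# Tabular form: values in ascending order with their probabilities.
table = [(z, f(z)) for z in range(2, 13)]

# Graphical form (text approximation): one '#' per 1/36 of probability.
for z, p in table:
    bar = "#" * int(p * 36)
    print(f"{z:>2}  {str(p):>5}  {bar}")
```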

Numerical Descriptive Measures: Mean and Variance

Mean (Expected Value): A measure of the center of the distribution.
- Notation: $\mu_X$ or $E(X)$ .
- Formula: $\mu = E(X) = \sum_{\text{all } x} x f(x)$ .
Variance: A measure of variability or spread.
- Notation: $\sigma_X^2$ , $V(X)$ , or $\text{Var}(X)$ .
- Definition Formula: $\sigma^2 = V(X) = \sum_{\text{all } x} (x - \mu)^2 f(x)$ .
- Calculation Formula: $\sigma^2 = \sum_{\text{all } x} x^2 f(x) - \mu^2$ .
Standard Deviation: The positive square root of the variance.
- Notation: $\sigma_X = \sqrt{V(X)}$ .
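
The formulas above translate directly into code. The sketch below assumes the fair-coin PMF for the number of heads in three tosses and confirms that the definition formula and the calculation formula give the same variance; the function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Assumed PMF: number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean(pmf):
    """mu = sum of x * f(x) over all x."""
    return sum(x * p for x, p in pmf.items())

def variance_definition(pmf):
    """sigma^2 = sum of (x - mu)^2 * f(x) over all x."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_calculation(pmf):
    """sigma^2 = sum of x^2 * f(x) over all x, minus mu^2."""
    mu = mean(pmf)
    return sum(x**2 * p for x, p in pmf.items()) - mu**2

mu = mean(pmf)                    # 3/2
var = variance_definition(pmf)    # 3/4
sigma = sqrt(var)                 # positive square root of the variance

print(mu, var, variance_calculation(pmf))
print(sigma)
```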

Case Study: Expected Gain and Insurance

Lottery Gain Example: 8000 tickets are sold at $\$10$ each, and the prize is a $\$24,000$ car. Suppose you buy two tickets and let $X$ be your net gain:
- Value 1 (Loss): $-\$20 + 0 = -\$20$
- Value 2 (Win): $-\$20 + \$24,000 = \$23,980$
- $P(X = 23,980) = \frac{2}{8000}$ ; $P(X = -20) = \frac{7998}{8000}$ .
- $E(X) = (-20) \cdot \frac{7998}{8000} + (23,980) \cdot \frac{2}{8000} = -\$14$ (expected loss of $\$14$ ).
Insurance Policy Pricing:
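- Policy: $\$100,000$ . Cancellation probability (the probability that the company pays out the policy amount): $0.02$ . Let $C$ be the premium charged and $X$ the company's gain.
- If expected gain for company is $\$0$ :
  - $E(X) = C(0.98) + (C - 100,000)(0.02) = C - 2000 = 0 \rightarrow C = \$2000$ .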
- If expected gain for company is $\$500$ :
  - $E(X) = C - 2000 = 500 \rightarrow C = \$2500$ .
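
Both calculations in this case study are expected values of a two-point distribution, which the following sketch reproduces. The variable and function names are hypothetical, and the insurance part assumes, as in the equations above, that the company keeps the premium $C$ with probability 0.98 and otherwise pays out the policy amount.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * f(x) over all x."""
    return sum(x * p for x, p in pmf.items())

# Lottery: 2 tickets bought out of 8000, $10 each, one $24,000 prize.
tickets_total, tickets_bought, price, prize = 8000, 2, 10, 24_000
cost = tickets_bought * price
p_win = Fraction(tickets_bought, tickets_total)
lottery_pmf = {prize - cost: p_win, -cost: 1 - p_win}
lottery_gain = expected_value(lottery_pmf)
print(lottery_gain)  # -14

def premium_for_target_gain(payout, p_payout, target_gain):
    """Solve C - payout * p_payout = target_gain for the premium C."""
    return target_gain + payout * p_payout

p_payout = Fraction(2, 100)
print(premium_for_target_gain(100_000, p_payout, 0))    # 2000
print(premium_for_target_gain(100_000, p_payout, 500))  # 2500
```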

Discrete Uniform Distribution

Occurs when each of $n$ possible values has the same probability of occurring.
Probability Mass Function: $f(x_i) = \frac{1}{n}$ for $i = 1, 2, ..., n$ .
Example: Rolling a six-sided balanced die. Range is $\{1, 2, 3, 4, 5, 6\}$ , $n = 6$ , and $f(x) = \frac{1}{6}$ .
Descriptive Measures for Die Roll:
- $\mu = (1+2+3+4+5+6) \cdot \frac{1}{6} = 3.5$ .
- $\sigma^2 = \sum x^2 f(x) - \mu^2 = \frac{91}{6} - 3.5^2 = \frac{35}{12} \approx 2.917$ , so $\sigma = \sqrt{2.917} \approx 1.708$ .
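
The die-roll measures can be verified with exact arithmetic. In this minimal sketch the variance evaluates to 35/12 (about 2.917) and the standard deviation to about 1.708.

```python
from fractions import Fraction
from math import sqrt

values = range(1, 7)          # faces of a balanced six-sided die
n = len(values)
f = Fraction(1, n)            # discrete uniform PMF: f(x) = 1/n

mu = sum(x * f for x in values)                # 7/2
var = sum(x**2 * f for x in values) - mu**2    # 35/12
sigma = sqrt(var)

print(mu, var)                # 7/2 35/12
print(round(float(var), 3))   # 2.917
print(round(sigma, 3))        # 1.708
```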

Bernoulli and Binomial Distributions

Bernoulli Experiment: A random experiment with a single trial resulting in two outcomes: Success ( $S$ ) and Failure ( $F$ ).
- $P(S) = p$ , $P(F) = 1 - p = q$ .
- Bernoulli Random Variable $X$ : $X = 1$ for success, $X = 0$ for failure.
- $\mu = p$ ; $\sigma^2 = p(1-p)$ .
Binomial Experiment: A sequence of $n$ independent Bernoulli trials.
- Trials are identical, outcomes are binary (S/F), and $p$ remains constant across trials.
- Binomial Random Variable $X$ : The number of successes in $n$ independent trials.
- Ranges from $0$ to $n$ .
- Probability Mass Function: $P(X = x) = \binom{n}{x} p^x (1 - p)^{n-x}$ for $x = 0, 1, ..., n$ .
- Binomial Coefficient: $\binom{n}{x} = \frac{n!}{x!(n - x)!}$ (also written $nCx$ or $C_x^n$ ).
Symmetry of Binomial Graph:
- If $p = 0.5$ , the distribution is symmetric.
- If $p < 0.5$ , it is skewed to the right.
- If $p > 0.5$ , it is skewed to the left.
Descriptive Measures:
- Mean: $\mu = np$ .
- Variance: $\sigma^2 = np(1 - p)$ .
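
A short sketch of the binomial PMF, using `math.comb` for the binomial coefficient. The values $n = 10$ and $p = 0.3$ are illustrative assumptions, chosen only to check numerically that the probabilities sum to 1 and that the mean and variance match $np$ and $np(1 - p)$ .

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for the number of successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values, not from the notes
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)
mean = sum(x * f for x, f in enumerate(pmf))
variance = sum(x**2 * f for x, f in enumerate(pmf)) - mean**2

print(total)                        # close to 1
print(mean, n * p)                  # both close to 3
print(variance, n * p * (1 - p))    # both close to 2.1
```

With $p = 0.3 < 0.5$ the larger probabilities sit at the low values of $x$ , which is the right-skewed shape described above.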

Hypergeometric Probability Distribution

Used when sampling without replacement from a finite population of size $N$ where the conditions for binomial distribution (independence and constant $p$ ) are not met.
The population contains $M$ successes and $N - M$ failures.
Random variable $X$ : Number of successes in a sample of size $n$ .
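Probability Mass Function: $P(X = x) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}}$ , where $\binom{a}{b} = \frac{a!}{b!(a - b)!}$ is the number of ways to choose $b$ items from $a$ .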
Mean: $\mu = n \cdot \frac{M}{N}$ .
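Variance: $\sigma^2 = n \cdot \frac{M}{N} \cdot \left(1 - \frac{M}{N}\right) \cdot \frac{N - n}{N - 1}$ .
- The term $\frac{N - n}{N - 1}$ is called the (finite population) correction factor. Without it, the expression is the binomial variance $np(1 - p)$ with $p = \frac{M}{N}$ .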
Comparison to Binomial:
- If $n / N \le 0.05$ and $n / (N - M) \le 0.05$ , the binomial distribution with $p = M / N$ can approximate sampling without replacement. If these conditions are not satisfied, the hypergeometric distribution must be used.
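
The hypergeometric formulas and the binomial comparison can be explored numerically. The following is a sketch under assumed, illustrative population sizes (none of these numbers come from the notes): a small population to check the mean and variance formulas, and a large one where both sampling ratios are below 0.05.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x) when n items are drawn without replacement from N items,
    M of which are successes."""
    # x must leave a feasible number of failures: 0 <= n - x <= N - M.
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Illustrative small population: the correction factor matters here.
N, M, n = 20, 5, 4
pmf = [hypergeom_pmf(x, N, M, n) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(pmf))
variance = sum(x**2 * f for x, f in enumerate(pmf)) - mean**2
print(mean, n * M / N)
print(variance, n * (M / N) * (1 - M / N) * (N - n) / (N - 1))

# Illustrative large population: n/N = 0.02 and n/(N - M) is about 0.029.
big_N, big_M, small_n = 1000, 300, 20
exact = hypergeom_pmf(6, big_N, big_M, small_n)
approx = binomial_pmf(6, small_n, big_M / big_N)
print(exact, approx)
```

In the second case the two printed probabilities should be close, which is the sense in which the binomial approximates sampling without replacement.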

Poisson Probability Distribution

Used for experiments observing the number of occurrences of a specified event over a time interval or within a region of space.
Assumptions:
1. Events occur randomly.
2. Events occur independently.
3. Events occur at a steady average rate ( $\lambda$ or $\mu$ ).
The number of events $X$ is a random variable that follows the Poisson distribution.
Probability Mass Function: $P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$ .
- $e \approx 2.71828$ .
Parameters:
- Mean: $\mu = \lambda$ .
- Variance: $\sigma^2 = \lambda$ .
Relationship to Interval Size: The average rate $\lambda$ is proportional to the length of the time interval or size of the space region. For example, if the rate is $15$ per hour, the rate for a $15$ -minute interval is $15 \times \frac{15}{60} = 3.75$ .
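
A minimal sketch of the Poisson PMF together with the rate rescaling from the example above. The helper name `scale_rate` is hypothetical, and the infinite range is truncated at 60 terms for the numerical checks, which is an approximation.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

def scale_rate(rate, from_length, to_length):
    """Rescale an average rate in proportion to the interval length."""
    return rate * to_length / from_length

# 15 events per 60 minutes corresponds to 3.75 events per 15 minutes.
lam = scale_rate(15, 60, 15)
print(lam)  # 3.75

# Truncated sums: the terms beyond x = 59 are negligible for this lam.
pmf = [poisson_pmf(x, lam) for x in range(60)]
total = sum(pmf)
mean = sum(x * f for x, f in enumerate(pmf))
variance = sum(x**2 * f for x, f in enumerate(pmf)) - mean**2
print(total, mean, variance)  # close to 1, 3.75, 3.75
```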

Poisson Approximation to the Binomial

The Poisson distribution provides a good approximation to the binomial distribution under specific conditions:
1. The number of trials $n$ is large ( $n \ge 100$ ).
2. The probability of success $p$ is small ( $p \le 0.01$ ).
3. The mean $np$ is small ( $np < 5$ ).
In this case, $\lambda = np$ .
Example: The probability of a bad reaction to a serum is $p = 0.001$ , and $n = 2000$ people receive it. Let $X$ be the number of bad reactions. Here $\lambda = np = 2000 \times 0.001 = 2$ . Probability of exactly 3 reactions: $P(X=3) = \frac{e^{-2} \cdot 2^3}{3!} \approx 0.1804$ .
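
To see how close the approximation is in this example, the exact binomial probability can be computed alongside the Poisson value. This is a sketch using only the numbers from the example; the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p, x = 2000, 0.001, 3
lam = n * p  # Poisson mean used for the approximation

# Exact binomial probability of exactly x successes in n trials.
binomial = comb(n, x) * p**x * (1 - p) ** (n - x)

# Poisson approximation with lambda = n * p.
poisson = exp(-lam) * lam**x / factorial(x)

print(round(poisson, 4))  # 0.1804
print(binomial)
print(abs(binomial - poisson))
```

The two probabilities differ by less than 0.001 here, which is consistent with the conditions listed above being satisfied.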