Notes: Page-by-Page Discrete Random Variables and Moments

Page 1: Expected value basics and Bernoulli/geometric setup

Expected value definition: $\mathbb{E}[X] = \sum_x x \cdot P(X=x)$
Bernoulli example (two outcomes, 0 or 1): if $P(X=1)=p_1$ and $P(X=0)=p_0=1-p_1$, then $\mathbb{E}[X] = 0\cdot p_0 + 1\cdot p_1 = p_1$
Geometric setup (first success on trial X):
- PMF: $P(X=x) = (1-p)^{x-1} p, \quad x=1,2,3,\dots$
- Intuition: number of trials until first success; larger p → smaller expected number of trials.

Expected value of geometric distribution via calculus:
- $\mathbb{E}[X] = \sum_{x=1}^{\infty} x (1-p)^{x-1} p$
- Trick: differentiate the geometric series $\sum_{x=0}^{\infty} q^x = \frac{1}{1-q}$ term by term with respect to $q = 1-p$, interchanging sum and derivative (valid under regularity conditions: each term is differentiable and the differentiated series converges uniformly).
- Result: $\mathbb{E}[X] = \frac{1}{p}$
- Quick intuition: if p=0.2, then $\mathbb{E}[X] = 5$ .
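
The differentiation trick above can be written out step by step. This is a sketch of the standard argument; the shorthand $q = 1-p$ is introduced here for convenience, and $0 < p \le 1$ is assumed so that $|q| < 1$.

$$
\begin{aligned}
\sum_{x=0}^{\infty} q^{x} &= \frac{1}{1-q}, \qquad |q| < 1 \\
\frac{d}{dq}\sum_{x=0}^{\infty} q^{x} = \sum_{x=1}^{\infty} x\, q^{x-1} &= \frac{1}{(1-q)^{2}} \\
\mathbb{E}[X] = p \sum_{x=1}^{\infty} x\,(1-p)^{x-1} &= \frac{p}{\bigl(1-(1-p)\bigr)^{2}} = \frac{p}{p^{2}} = \frac{1}{p}
\end{aligned}
$$
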
Note on interchangeability: swapping the sum and the derivative needs justification. Here it holds because a power series can be differentiated term by term inside its radius of convergence, and $|1-p| < 1$ whenever $0 < p \le 1$.
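Reminder: the expectation of the geometric distribution takes a real derivation to obtain, despite its simple final form. (It is not the same thing as the "geometric mean" of a data set.)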
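
A quick numerical check of $\mathbb{E}[X] = 1/p$ can be done by truncating the infinite sum. This is a minimal sketch: the function names `geometric_pmf` and `truncated_mean` and the cutoff of 500 terms are choices made here, not part of the notes.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = (1 - p)^(x - 1) * p for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p


def truncated_mean(p: float, n_terms: int) -> float:
    """Partial sum of x * P(X = x) over x = 1..n_terms."""
    return sum(x * geometric_pmf(x, p) for x in range(1, n_terms + 1))


p = 0.2
total_mass = sum(geometric_pmf(x, p) for x in range(1, 501))
approx_mean = truncated_mean(p, 500)

# The truncated values should sit very close to 1 and to 1/p respectively,
# because the neglected tail (1 - p)^500 is negligible for p = 0.2.
print("probability mass covered:", total_mass)
print("truncated mean:", approx_mean, "vs 1/p =", 1 / p)
```

For small $p$ the tail decays slowly, so the cutoff would need to grow accordingly.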

Not all distributions have a finite expected value.
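- Example: the Cauchy distribution (a continuous distribution with heavy tails) has no expected value, because the integral that would define $\mathbb{E}[X]$ does not converge.
- In the discrete case, if $\sum_x |x| \cdot P(X=x)$ diverges, then $\mathbb{E}[X] = \sum_x x \cdot P(X=x)$ does not exist as a finite value.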
Practical takeaway: to guarantee an expectation exists, it is common to require $\mathbb{E}[|X|] < \infty$ (absolute integrability).
Relevance: heavy-tailed distributions place enough probability mass far out in the tails that the expectation can be infinite or undefined.
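
To see a divergent expectation in the discrete setting, one illustrative choice (not from the notes) is the PMF $P(X=x) = \frac{6}{\pi^2 x^2}$ on $x = 1, 2, 3, \dots$. It is a valid PMF because $\sum_{x \ge 1} 1/x^2 = \pi^2/6$, yet $x \cdot P(X=x)$ is proportional to $1/x$, so the partial sums of the expectation grow like a harmonic series. The sketch below prints a few partial sums; the names `pmf` and `partial_mean` are hypothetical.

```python
import math


def pmf(x: int) -> float:
    """Illustrative heavy-tailed PMF: P(X = x) = 6 / (pi^2 * x^2), x = 1, 2, ..."""
    return 6.0 / (math.pi ** 2 * x ** 2)


def partial_mean(n_terms: int) -> float:
    """Partial sum of x * P(X = x) over x = 1..n_terms."""
    return sum(x * pmf(x) for x in range(1, n_terms + 1))


# The partial sums keep increasing (roughly like log n) instead of settling down.
for n in (10, 1_000, 100_000):
    print(n, partial_mean(n))
```

Contrast this with the geometric case, where the partial sums stabilise quickly near $1/p$.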

Expectation of a function of a random variable: $\mathbb{E}[h(X)] = \sum_x h(x) \cdot P(X=x)$
Profit example (three computers):
- Cost: $3\times 500 = 1500$
- Sell price: $1000$ per unit; unsold units repurchased at $200$ each.
- If sold x units, revenue: $1000x + 200(3-x) = 800x + 600$
- Profit: $\text{Profit} = \text{Revenue} - \text{Cost} = (800x + 600) - 1500 = 800x - 900$
- Therefore: $\mathbb{E}[\text{Profit}] = 800\mathbb{E}[X] - 900$
If the function is linear, $h(x)=a x + b$ , then $\mathbb{E}[h(X)] = a\mathbb{E}[X] + b$ (linearity of expectation).
Practical use: compute $\mathbb{E}[X]$ once and apply it to the linear form; for non-linear $h$, compute the full sum.
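
The profit example can be checked numerically. The notes do not give a distribution for the number of units sold, so the PMF below is purely hypothetical, chosen only to make the arithmetic easy; `profit` and `expectation` are helper names introduced for this sketch. It computes $\mathbb{E}[\text{Profit}]$ both from the full sum and from the linear shortcut, and then shows that the shortcut does not carry over to a non-linear function such as $h(x) = x^2$.

```python
from fractions import Fraction

# Hypothetical PMF for X = number of computers sold (NOT given in the notes).
pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}


def profit(x):
    """Revenue minus cost: 1000 per unit sold, 200 per unsold unit, cost 3 * 500."""
    return 1000 * x + 200 * (3 - x) - 3 * 500


def expectation(h, pmf):
    """E[h(X)] = sum over x of h(x) * P(X = x)."""
    return sum(h(x) * prob for x, prob in pmf.items())


mean_x = expectation(lambda x: x, pmf)
direct = expectation(profit, pmf)        # full sum over the PMF
via_linearity = 800 * mean_x - 900       # linear shortcut a * E[X] + b

# Non-linear case: E[X^2] must be computed from the full sum.
mean_sq = expectation(lambda x: x ** 2, pmf)

print("E[X] =", mean_x)
print("E[Profit] direct =", direct, "| via linearity =", via_linearity)
print("E[X^2] =", mean_sq, "| (E[X])^2 =", mean_x ** 2)
```

Using `Fraction` keeps the arithmetic exact, so the two routes to $\mathbb{E}[\text{Profit}]$ can be compared with plain equality.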

Variance definition for a random variable: $\operatorname{Var}(X) = \mathbb{E}[(X - \mu)^2],\quad \mu = \mathbb{E}[X]$
Discrete distribution variance: $\operatorname{Var}(X) = \sum_x (x - \mu)^2 \cdot P(X=x)$
Relation to data variance: the distribution variance weights each squared deviation by its probability, so there is no division by $n$ or $n-1$; the sample variance averages squared deviations from the sample mean and divides by $n-1$ to give an unbiased estimate.
Alternative form: $\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ which is often easier to compute.
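
The alternative form follows from expanding the square and using linearity of expectation, with $\mu = \mathbb{E}[X]$ treated as a constant:

$$
\begin{aligned}
\operatorname{Var}(X) &= \mathbb{E}[(X-\mu)^2] = \mathbb{E}[X^2 - 2\mu X + \mu^2] \\
&= \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2 \\
&= \mathbb{E}[X^2] - 2\mu^2 + \mu^2 = \mathbb{E}[X^2] - (\mathbb{E}[X])^2
\end{aligned}
$$
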
Standard deviation: $\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}$; it measures dispersion in the same units as $X$.
Interpretation: larger variance/SD implies greater dispersion or volatility in the distribution.
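
As a small worked check, both variance formulas can be evaluated on one distribution. The PMF here is hypothetical (the notes do not specify one) and the variable names are choices made for this sketch.

```python
import math
from fractions import Fraction

# Hypothetical PMF on the values 0..3 (not taken from the notes).
pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}

mu = sum(x * prob for x, prob in pmf.items())

# Definition: probability-weighted squared deviations from the mean.
var_definition = sum((x - mu) ** 2 * prob for x, prob in pmf.items())

# Shortcut: E[X^2] - (E[X])^2.
second_moment = sum(x ** 2 * prob for x, prob in pmf.items())
var_shortcut = second_moment - mu ** 2

# Standard deviation, in the same units as X.
sd = math.sqrt(var_definition)

print("mean:", mu)
print("variance (definition):", var_definition)
print("variance (shortcut):", var_shortcut)
print("standard deviation:", sd)
```

Neither formula divides by $n$ or $n-1$: the probabilities already act as the weights.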

Summary and preview:
- We covered means (expected value) and variance for discrete random variables, including linear and non-linear transformations.
- Noted that some distributions do not have finite expectations; absolute integrability is a typical condition for existence.
- Next steps: continue the discrete random variables chapter with well-known discrete distributions and when to use them to model real-life systems; practice computing means and variances and understanding distribution shapes.