AP Statistics Unit 4 Notes: Random Variables, Expected Value, and Combining Outcomes
Discrete Random Variables and Probability Distributions
What a random variable is (and why we bother)
A random variable is a rule that assigns a numerical value to each outcome of a chance process. In AP Statistics, you usually name a random variable with a capital letter like X, and its possible numerical values with lowercase like x.
This matters because probability questions often start with messy outcomes (sequences of coin flips, categories of survey responses, etc.). By translating outcomes into numbers, you can summarize uncertainty with tools like a probability distribution, and later compute center and spread (mean and standard deviation) the same way you do with data—except now you’re describing a theoretical long-run pattern, not a sample.
A discrete random variable is one that takes on a countable set of values (often integers): 0, 1, 2, 3, … For example:
- X = number of heads in 5 coin flips (possible values 0 through 5)
- Y = number of customers arriving in the next hour (0, 1, 2, …)
A common misconception is to think the random variable is the outcome (like “HHTHT”). It’s not—the random variable is the number you compute from the outcome (like “3 heads”).
Probability distribution for a discrete random variable
A probability distribution of a discrete random variable lists every possible value of the variable and the probability that it occurs.
If X takes values x_1, x_2, \dots, x_k, then its distribution gives P(X = x_i) for each value.
A valid discrete probability distribution must satisfy:
- Every probability is between 0 and 1: 0 \le P(X = x_i) \le 1
- The probabilities add to 1: \sum_{i=1}^k P(X = x_i) = 1
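These two checks are easy to automate. Here is a minimal Python sketch (the function name `is_valid_distribution` is just illustrative):

```python
def is_valid_distribution(probs, tol=1e-9):
    """Check the two requirements for a discrete probability distribution:
    every probability lies in [0, 1], and the probabilities sum to 1."""
    all_in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = abs(sum(probs) - 1) < tol
    return all_in_range and sums_to_one

print(is_valid_distribution([1/8, 3/8, 3/8, 1/8]))  # True (3-coin-flip model)
print(is_valid_distribution([0.5, 0.4, 0.2]))       # False: sums to 1.1
```

The tolerance `tol` is there because floating-point sums rarely equal 1 exactly.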
You’ll most often see distributions shown as a table or as a probability histogram (bars at each value with heights equal to the probabilities). A probability histogram can look like a data histogram, but conceptually it is different: it represents a model (long-run relative frequencies), not a dataset.
Notation you’ll see (quick reference)
| Idea | Common notation | Meaning |
|---|---|---|
| Random variable | X | The process-defined numerical outcome |
| A specific value | x | One possible number the variable can take |
| Probability | P(X = x) | Chance that X equals that value |
| Mean / expected value | \mu_X or E(X) | Long-run average of X |
| Standard deviation | \sigma_X | Long-run typical distance of X from its mean |
| Variance | \sigma_X^2 or Var(X) | Square of the standard deviation |
Building a distribution from a chance process
To create a probability distribution, you typically:
- Define the random variable clearly in context (what is being counted or measured?).
- List all possible values it can take.
- Find the probability for each value (often using counting, binomial probability, geometric probability, or a provided model).
- Check that probabilities sum to 1.
A common error is to skip step 1 (definition) and then misinterpret what a value like X = 2 means. On AP questions, you should be able to say something like: “Let X be the number of defective bulbs in a sample of 10.” Then X = 2 has a clear meaning.
Example 1: Creating a probability distribution (3 coin flips)
Suppose you flip a fair coin 3 times. Let X be the number of heads.
Step 1: Possible values
X \in \{0, 1, 2, 3\}
Step 2: Probabilities
There are 2^3 = 8 equally likely outcomes. Count outcomes with each number of heads:
- X = 0: 1 outcome (TTT) so P(X=0)=1/8
- X = 1: 3 outcomes so P(X=1)=3/8
- X = 2: 3 outcomes so P(X=2)=3/8
- X = 3: 1 outcome so P(X=3)=1/8
Distribution table
| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P(X=x) | 1/8 | 3/8 | 3/8 | 1/8 |
Check: probabilities add to 1/8 + 3/8 + 3/8 + 1/8 = 8/8 = 1.
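The counting in Step 2 can be verified by brute-force enumeration of the 8 equally likely outcomes; a quick sketch in Python:

```python
from itertools import product
from collections import Counter

# Enumerate all 2^3 sequences of H/T and count the heads in each one.
counts = Counter(seq.count("H") for seq in map("".join, product("HT", repeat=3)))
distribution = {x: counts[x] / 8 for x in sorted(counts)}
print(distribution)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```

Note that the enumeration produces outcomes (sequences) and the `Counter` translates each one into a value of X, mirroring the definition of a random variable.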
Cumulative probability (sometimes asked)
Sometimes you’re asked for probabilities like P(X \le 2) rather than P(X = 2). For discrete variables, you usually compute these by adding the relevant probabilities.
The cumulative distribution function (CDF) is defined by:
F(x) = P(X \le x)
For the coin example:
P(X \le 2) = P(X=0)+P(X=1)+P(X=2) = 1/8 + 3/8 + 3/8 = 7/8
A common mistake is to treat P(X \le 2) as if it were P(X=2). The symbol matters.
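For a discrete distribution stored as a table, the CDF is just a running sum. A sketch, assuming the distribution is stored as a dict mapping values to probabilities:

```python
def cdf(dist, x):
    """P(X <= x): sum the probabilities of all values at or below x."""
    return sum(p for value, p in dist.items() if value <= x)

coin = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(cdf(coin, 2))  # 0.875, i.e. 7/8
```

The strict-inequality version P(X < 2) would use `value < x` instead and give 4/8 here, which is one way to see why the symbol matters.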
Exam Focus
- Typical question patterns:
- “Define a random variable for this situation and give its probability distribution.”
- “Is this table a valid probability distribution? Justify.”
- “Find P(X \le a) or P(a \le X \le b) from a distribution.”
- Common mistakes:
- Forgetting to check that probabilities sum to 1 (or failing to notice a missing probability).
- Mixing up outcomes with random variable values (writing probabilities for sequences instead of counts).
- Misreading symbols like P(X < 2) vs P(X \le 2) when the variable is discrete.
Mean and Standard Deviation of Random Variables
The big idea: long-run center and long-run variability
When you compute the mean of a dataset, you’re summarizing the average of observed values. For a random variable, you’re summarizing what would happen in the long run if you repeated the chance process many times.
The mean of a random variable is also called its expected value. The word “expected” does not mean “guaranteed” or even “most likely.” It means the long-run average value.
For example, if a game has expected winnings of 0.50 dollars, you should not expect to win exactly 0.50 dollars each time. You might win 5 dollars sometimes and lose 1 dollar other times, but over many plays the average tends toward 0.50 dollars.
Expected value (mean) for a discrete random variable
If X takes values x_1, x_2, \dots, x_k with probabilities p_1, p_2, \dots, p_k, then the mean (expected value) is the probability-weighted average:
\mu_X = E(X) = \sum_{i=1}^k x_i p_i
Interpretation: \mu_X is what you’d get if you averaged an enormous number of repetitions of the random process.
A very common student error is to average the possible values without weighting by probability. If some outcomes are more likely than others, they must count more.
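The contrast between the probability-weighted mean and a plain average of the values shows up immediately in a small made-up distribution:

```python
values = [0, 10]
probs = [0.9, 0.1]   # 0 is far more likely than 10

expected_value = sum(x * p for x, p in zip(values, probs))  # weighted: 1.0
naive_average = sum(values) / len(values)                   # unweighted: 5.0
print(expected_value, naive_average)
```

The unweighted average of 5.0 badly overstates the long-run result, because it pretends the two outcomes are equally likely.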
Variance and standard deviation: measuring spread of a random variable
The standard deviation of a random variable describes the typical distance between X and its mean \mu_X in the long run.
For a discrete random variable, the variance is:
\sigma_X^2 = \sum_{i=1}^k (x_i - \mu_X)^2 p_i
and the standard deviation is:
\sigma_X = \sqrt{\sigma_X^2}
This mirrors what you do with data: deviations from the mean, squared, averaged (with probabilities as weights), then square-rooted.
There is also a computational shortcut that can be helpful:
\sigma_X^2 = E(X^2) - (E(X))^2
where
E(X^2) = \sum_{i=1}^k x_i^2 p_i
This shortcut is useful when the distribution values are messy, but you must be careful with parentheses: square the expectation, not the other way around.
Example 2: Mean and standard deviation from a distribution
A company offers a coupon that results in the following discount (in dollars). Let X be the discount a randomly selected customer receives.
| x | 0 | 5 | 10 |
|---|---|---|---|
| P(X=x) | 0.50 | 0.40 | 0.10 |
Step 1: Compute the mean
E(X) = 0(0.50) + 5(0.40) + 10(0.10)
E(X) = 0 + 2 + 1 = 3
So \mu_X = 3 dollars. In the long run, the company gives an average discount of 3 dollars per customer.
Step 2: Compute the variance and standard deviation
Use the definition:
\sigma_X^2 = (0-3)^2(0.50) + (5-3)^2(0.40) + (10-3)^2(0.10)
\sigma_X^2 = 9(0.50) + 4(0.40) + 49(0.10)
\sigma_X^2 = 4.5 + 1.6 + 4.9 = 11.0
\sigma_X = \sqrt{11}
If you approximate, \sqrt{11} is about 3.32 dollars, meaning the discount typically differs from the mean by a bit over 3 dollars.
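Both the definitional formula and the computational shortcut give the same variance for this table; a quick check in Python:

```python
dist = {0: 0.50, 5: 0.40, 10: 0.10}

mu = sum(x * p for x, p in dist.items())                        # 3.0
var_def = sum((x - mu) ** 2 * p for x, p in dist.items())       # definition
var_shortcut = sum(x**2 * p for x, p in dist.items()) - mu**2   # E(X^2) - (E(X))^2
sd = var_def ** 0.5

print(mu, round(var_def, 6), round(var_shortcut, 6), round(sd, 2))
# 3.0, 11.0 (both ways), sd about 3.32
```

Agreement between `var_def` and `var_shortcut` is a useful self-check on an exam-style calculation.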
Example 3: Expected value as “fair price” (a classic AP framing)
A carnival game costs c dollars to play. You spin a wheel:
- With probability 0.2 you win 10 dollars.
- With probability 0.8 you win 0 dollars.
Let W be your winnings in dollars (not profit).
Expected winnings
E(W) = 10(0.2) + 0(0.8) = 2
On average, you win 2 dollars per play.
If you care about profit, define G = W - c (using G rather than P avoids a clash with the probability notation P(X = x)). Then:
E(G) = E(W - c) = E(W) - c = 2 - c
A “fair” price (expected profit 0) would solve 2 - c = 0, so c = 2. If the game costs more than 2 dollars, you should expect to lose money in the long run.
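The expected-winnings calculation, together with a simulation illustrating the long-run interpretation, can be sketched as:

```python
import random

# Exact expected winnings: 10 dollars with probability 0.2, 0 otherwise.
expected_winnings = 10 * 0.2 + 0 * 0.8   # 2.0
fair_price = expected_winnings           # price that makes expected profit 0

# Long-run interpretation: average winnings over many simulated plays.
random.seed(1)
plays = [10 if random.random() < 0.2 else 0 for _ in range(100_000)]
print(expected_winnings, sum(plays) / len(plays))  # 2.0 and a value near 2
```

Notice the simulated average is close to 2 even though no single play ever pays exactly 2 dollars, which is the whole point of the “expected does not mean guaranteed” warning.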
Common misconception: students sometimes think a fair game means you win about half the time. Not necessarily—fairness is about expected value, not win rate.
Linear transformations and how they affect mean and standard deviation
In many problems, you define a new variable by transforming an old one, such as converting units, adding a fixed fee, or scaling a reward.
If
Y = a + bX
then:
\mu_Y = a + b\mu_X
\sigma_Y = |b|\sigma_X
Why this makes sense:
- Adding a shifts every outcome by the same amount, so the center shifts by a but the spread does not change.
- Multiplying by b stretches or shrinks distances from the mean by a factor of |b|, so the standard deviation scales by |b|.
A frequent mistake is to add a to the standard deviation. You do not—standard deviation measures spread, and adding a constant doesn’t change how spread out values are.
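The transformation rules can be checked exactly on a small distribution. A sketch with Y = a + bX, using made-up values a = 7, b = -2:

```python
dist = {0: 0.5, 1: 0.3, 2: 0.2}   # a small made-up distribution for X
a, b = 7, -2

def mean(d):
    return sum(x * p for x, p in d.items())

def sd(d):
    m = mean(d)
    return sum((x - m) ** 2 * p for x, p in d.items()) ** 0.5

# Transform every value of X; the probabilities do not change.
dist_y = {a + b * x: p for x, p in dist.items()}

print(mean(dist_y), a + b * mean(dist))  # equal: mu_Y = a + b*mu_X
print(sd(dist_y), abs(b) * sd(dist))     # equal: sigma_Y = |b|*sigma_X
```

The negative b makes the absolute value visible: the spread of Y matches 2 times the spread of X, not -2 times.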
Exam Focus
- Typical question patterns:
- “Given this distribution, find \mu_X and \sigma_X and interpret them in context.”
- “A new variable is defined by Y = a + bX. Find \mu_Y and \sigma_Y.”
- “Find the expected value of winnings/profit; determine whether a game is fair.”
- Common mistakes:
- Computing E(X) by averaging the x values without using probabilities as weights.
- Forgetting to square-root at the end when asked for standard deviation (reporting variance instead).
- Incorrect transformation rules, especially adding a constant to standard deviation or forgetting the absolute value on b.
Combining Random Variables
Why combining random variables is such a powerful move
Many real situations are built from parts:
- Total cost = item cost + shipping
- Total points = points from multiple questions
- Total wait time = wait time for bus + travel time
Each piece can be modeled as a random variable. Combining them lets you predict the long-run behavior of a total (mean) and how much that total varies (standard deviation).
The key skill is knowing which results always hold and which require independence.
Adding and subtracting random variables: expected value
If X and Y are random variables, then expected values add exactly the way you wish they would:
E(X + Y) = E(X) + E(Y)
E(X - Y) = E(X) - E(Y)
This does not require independence.
Interpretation: in the long run, the average total is the sum of the long-run averages.
Adding and subtracting random variables: variability (standard deviation)
Spread is trickier. If X and Y are independent, then their variances add:
Var(X + Y) = Var(X) + Var(Y)
and since Var(X) = \sigma_X^2, you can write:
\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}
Similarly, for independent variables:
Var(X - Y) = Var(X) + Var(Y)
So:
\sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}
Two big “what can go wrong” warnings:
- Standard deviations do not add. Even when independent, you add variances, not standard deviations.
- If X and Y are not independent, you generally cannot use these variance rules without additional information.
On the AP exam, if you’re supposed to add variances, the problem will typically indicate independence (or give a context that strongly implies it, like results of separate trials).
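For two independent discrete variables, the distribution of the sum can be built by multiplying probabilities across all pairs of values, and the variance-addition rule can then be verified directly. A sketch with two made-up distributions:

```python
from collections import defaultdict

x_dist = {0: 0.5, 1: 0.5}            # e.g. heads in one fair flip
y_dist = {0: 0.25, 1: 0.5, 2: 0.25}  # e.g. heads in two fair flips

def mean(d):
    return sum(v * p for v, p in d.items())

def var(d):
    m = mean(d)
    return sum((v - m) ** 2 * p for v, p in d.items())

# Independence: P(X = x and Y = y) = P(X = x) * P(Y = y).
sum_dist = defaultdict(float)
for x, px in x_dist.items():
    for y, py in y_dist.items():
        sum_dist[x + y] += px * py

print(var(sum_dist), var(x_dist) + var(y_dist))  # equal: variances add
```

The multiplication step is exactly where independence enters; for dependent variables, P(X = x and Y = y) is not the product of the marginals and the construction (and the rule) breaks down.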
Linear combinations (the most general AP-level rule set)
A linear combination looks like:
T = a + bX + cY
For means, constants and coefficients behave normally:
\mu_T = a + b\mu_X + c\mu_Y
For standard deviations, work through variance: the constant a has no effect on spread, and each coefficient gets squared. If X and Y are independent:
Var(T) = b^2 Var(X) + c^2 Var(Y)
So:
\sigma_T = \sqrt{b^2 \sigma_X^2 + c^2 \sigma_Y^2}
Again, independence is what lets you avoid extra covariance terms.
Example 4: Total cost with a fixed fee (combining and transforming)
Let X be the amount (in dollars) a customer spends on items. Suppose \mu_X = 45 and \sigma_X = 12. Shipping is a flat 7 dollars. Let T be the total cost.
Model:
T = X + 7
Mean:
\mu_T = \mu_X + 7 = 45 + 7 = 52
Standard deviation (adding a constant does not change spread):
\sigma_T = \sigma_X = 12
Interpretation: average total cost is 52 dollars, and totals typically vary by about 12 dollars from that average.
Example 5: Sum of independent scores
A student’s total score S is the sum of two independent section scores: A and B.
- \mu_A = 30, \sigma_A = 4
- \mu_B = 50, \sigma_B = 6
Let
S = A + B
Mean:
\mu_S = \mu_A + \mu_B = 30 + 50 = 80
Standard deviation (independent, so add variances):
\sigma_S = \sqrt{\sigma_A^2 + \sigma_B^2} = \sqrt{4^2 + 6^2} = \sqrt{16 + 36} = \sqrt{52}
So \sigma_S is about 7.21.
A very common mistake is to compute \sigma_S = 4 + 6 = 10. That would overstate the spread because it treats typical deviations as if they always point in the same direction at the same time.
Example 6: Difference of independent variables (net gain)
A store’s daily net gain N (in dollars) is revenue minus cost:
N = R - C
Suppose across days:
- \mu_R = 1200, \sigma_R = 250
- \mu_C = 800, \sigma_C = 180
- Assume R and C are independent.
Mean:
\mu_N = \mu_R - \mu_C = 1200 - 800 = 400
Standard deviation:
\sigma_N = \sqrt{\sigma_R^2 + \sigma_C^2} = \sqrt{250^2 + 180^2}
\sigma_N = \sqrt{62500 + 32400} = \sqrt{94900}
So \sigma_N is about 308.1.
Notice that the subtraction affects the mean (center) but not the way variances combine when independent.
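The arithmetic in this example and in Example 5 follows one pattern, so it is worth capturing in a small helper (the function name `combine` is just illustrative):

```python
import math

def combine(mu_x, sd_x, mu_y, sd_y, subtract=False):
    """Mean and sd of X + Y (or X - Y) for independent X and Y.
    The mean subtracts if asked; the variances always add."""
    mu = mu_x - mu_y if subtract else mu_x + mu_y
    sd = math.sqrt(sd_x**2 + sd_y**2)
    return mu, sd

print(combine(30, 4, 50, 6))                        # Example 5: (80, sqrt(52))
print(combine(1200, 250, 800, 180, subtract=True))  # Example 6: (400, about 308.06)
```

Note that `subtract` changes only the mean line; the standard deviation line is identical in both cases, which is the rule the prose above is emphasizing.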
Independence: what it means here (and how to spot it)
Two random variables X and Y are independent if knowing the value of one gives no information about the other. In many AP settings, independence comes from:
- Separate trials (coin flips, spins, draws with replacement)
- Measurements on different individuals chosen independently
But be cautious: variables computed from the same trial are often dependent. For example, in a single hand of cards, “number of hearts” and “number of red cards” are related—knowing one changes what you expect about the other.
If a problem does not indicate independence and you’re asked for the standard deviation of a sum/difference, that’s a clue you may need more information (or the problem is structured so independence is reasonable from context).
Exam Focus
- Typical question patterns:
- “Let T = X + Y (or X - Y). Find \mu_T and \sigma_T given means and standard deviations; assume independence.”
- “A quantity is transformed (fee, tax, unit conversion). Find the new mean and standard deviation.”
- “Compare variability of a total to variability of components; interpret what changes center vs spread.”
- Common mistakes:
- Adding standard deviations directly instead of adding variances.
- Using the variance-addition rule without independence (or without it being justified).
- Treating subtraction as if it subtracts variability (it doesn’t; under independence, variances still add).