Discrete Probability Models in AP Statistics: Binomial and Geometric
The Binomial Distribution
What it is (and what kind of situation it models)
A binomial distribution is a probability model for a specific kind of counting problem: you repeat the same chance process a fixed number of times and count how many times a particular outcome happens.
The classic way to say it is: you have n trials, each trial results in either “success” or “failure,” and you let the random variable X be the number of successes in those n trials. If the conditions are right, then X follows a binomial distribution.
In AP Statistics, the binomial model matters because it is one of the main ways you turn a real-world chance process into a probability distribution for a random variable. Once you have that distribution, you can find probabilities like “exactly 3 successes,” “at least 1 success,” or “between 5 and 8 successes,” and you can also describe the distribution using its mean and standard deviation.
Why it matters (the bigger picture)
Unit 4 is about linking probability to random variables and distributions. The binomial distribution is a “workhorse” discrete distribution because many real situations involve counting successes in repeated trials:
- quality control (defective vs not defective)
- medicine (side effect vs no side effect)
- polling (supports candidate vs does not)
- sports free throws (make vs miss)
A key AP Stats skill is recognizing when a scenario is binomial, choosing the correct parameters n and p, and then calculating probabilities correctly.
How it works: the Binomial Setting (conditions)
A situation fits a binomial model when it satisfies the Binomial Conditions (often remembered by the mnemonic BINS):
- Binary: Each trial has two outcomes (success/failure).
- Independent: Trials do not affect each other.
- Number: The number of trials n is fixed in advance.
- Same probability: The probability of success p stays constant from trial to trial.
These conditions are not just formalities. They prevent subtle errors:
- If the probability changes over time (not “Same”), it is not binomial.
- If you keep going “until something happens,” then n is not fixed, and it’s likely geometric instead.
- If trials aren’t independent (for example, sampling without replacement from a small population), the binomial model may be inappropriate unless the population is large enough relative to the sample.
Independence and the 10% condition (common AP Stats justification)
A very common AP Statistics context is drawing individuals from a population without replacement. Those trials are technically dependent. However, AP Statistics often treats them as “approximately independent” if the sample is small relative to the population.
A standard check is the 10% condition: if you sample without replacement, and the sample size is no more than 10% of the population size, then treating the trials as independent (and therefore using a binomial model) is usually reasonable.
Notation and probability model
If X is the number of successes in n trials with success probability p, then:
X \sim \text{Bin}(n,p)
Here’s what the parameters mean:
- n = number of trials (fixed)
- p = probability of success on each trial
- X = number of successes in the n trials (takes values 0,1,2,\dots,n)
The binomial probability formula (pmf)
The probability mass function (pmf) gives the probability of getting exactly k successes:
P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}
Where:
- k is a whole number from 0 to n
- \binom{n}{k} counts how many different ways you can place k successes among n trials
- p^k(1-p)^{n-k} is the probability of any one specific sequence with k successes and n-k failures
A common misconception is to focus only on p^k(1-p)^{n-k} and forget the combinations factor. That would be correct only if you cared about one specific sequence (like SSSFF), not “any order.”
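Python is not required for AP Statistics, but a few lines can make the role of the combinations factor concrete. This is a sketch; the function name binom_pmf is just an illustrative label, and math.comb supplies \binom{n}{k}:

```python
import math

def binom_pmf(n: int, k: int, p: float) -> float:
    # P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, k, p = 5, 3, 0.5
one_sequence = p**k * (1 - p)**(n - k)   # probability of one specific order, e.g. SSSFF
any_order = binom_pmf(n, k, p)           # includes all C(5, 3) = 10 orderings

print(one_sequence)  # 0.03125
print(any_order)     # 0.3125
```

Leaving off math.comb reproduces exactly the misconception described above: the answer comes out 10 times too small because it counts only one ordering.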
Mean and standard deviation of a binomial random variable
Binomial distributions come with simple formulas for center and spread:
\mu_X = np
\sigma_X = \sqrt{np(1-p)}
Interpretation:
- np is the long-run average number of successes you’d expect if you repeated the entire n-trial process many times.
- \sqrt{np(1-p)} describes the typical amount X varies from that mean.
Students sometimes try to use p as a “mean,” but p is a probability for one trial. The mean number of successes across n trials is np.
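The shortcut formulas np and \sqrt{np(1-p)} agree with the general discrete-random-variable definitions of mean and variance. A quick numerical check (a sketch; helper names are illustrative):

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 12, 0.2
mu = n * p                          # shortcut: np
sigma = math.sqrt(n * p * (1 - p))  # shortcut: sqrt(np(1-p))

# General discrete formulas: sum(k * P(X=k)) and sum((k - mu)^2 * P(X=k))
mu_check = sum(k * binom_pmf(n, k, p) for k in range(n + 1))
var_check = sum((k - mu)**2 * binom_pmf(n, k, p) for k in range(n + 1))

print(round(mu, 6), round(mu_check, 6))                      # both 2.4
print(round(sigma, 4), round(math.sqrt(var_check), 4))       # both agree
```

Note that the mean here is 2.4, not p = 0.2: the mean counts successes across all 12 trials.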
Binomial calculations in practice (including “at least” and “no more than”)
Many binomial questions are not “exactly k.” They use phrases like:
- “at least” (means \ge)
- “at most” or “no more than” (means \le)
- “between” (often inclusive unless stated otherwise)
For these, you can:
- Add appropriate exact probabilities (possible when the range is small).
- Use a complement when it’s simpler (very common and efficient).
For example:
- P(X \ge 1)=1-P(X=0)
- P(X \le 2)=P(X=0)+P(X=1)+P(X=2)
On the AP Exam, complement methods are often the difference between a clean solution and a long, error-prone one.
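The two strategies above can be compared side by side. In this sketch (values n = 10, p = 0.3 are chosen only for illustration), the complement for “at least 1” replaces a ten-term sum with one subtraction:

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3

# "At least 1": the complement is far shorter than summing P(X=1) through P(X=10).
at_least_1 = 1 - binom_pmf(n, 0, p)
at_least_1_long_way = sum(binom_pmf(n, k, p) for k in range(1, n + 1))

# "At most 2": the range is small, so adding exact probabilities is reasonable.
at_most_2 = sum(binom_pmf(n, k, p) for k in range(3))

print(round(at_least_1, 4))   # matches the long way
print(round(at_most_2, 4))
```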
Worked Example 1: Defects in a shipment (checking conditions, then computing)
A factory produces bolts, and historically 4% are defective. A quality inspector randomly selects 10 bolts from a very large day’s production and checks whether each is defective.
Let X be the number of defective bolts in the sample.
Step 1: Justify a binomial model.
- Binary: each bolt is defective or not.
- Number: n=10 bolts are checked.
- Same probability: assume defect rate is constant at p=0.04.
- Independent: since the day’s production is very large relative to the sample of 10, treating the bolts as independent is reasonable (sampling without replacement is effectively like sampling with replacement).
So X \sim \text{Bin}(10,0.04).
(a) Find P(X=2).
P(X=2)=\binom{10}{2}(0.04)^2(0.96)^8
You would typically evaluate this with technology, but the expression itself shows the correct setup: the combinations factor times the success and failure probabilities.
(b) Find P(X\ge 1) (at least one defective).
Using the complement is simplest:
P(X\ge 1)=1-P(X=0)
P(X=0)=\binom{10}{0}(0.04)^0(0.96)^{10}=(0.96)^{10}
So:
P(X\ge 1)=1-(0.96)^{10}
(c) Find the mean and standard deviation.
\mu_X=np=10(0.04)=0.4
\sigma_X=\sqrt{np(1-p)}=\sqrt{10(0.04)(0.96)}
Interpretation: in repeated samples of 10, you’d average about 0.4 defective bolts per sample, with a typical variation given by the standard deviation.
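Evaluating the expressions above with technology (here a short Python sketch standing in for a calculator) gives the numerical answers:

```python
import math

n, p = 10, 0.04

p_exactly_2 = math.comb(10, 2) * 0.04**2 * 0.96**8   # part (a)
p_at_least_1 = 1 - 0.96**10                          # part (b), complement form
mu = n * p                                           # part (c)
sigma = math.sqrt(n * p * (1 - p))

print(round(p_exactly_2, 4))   # 0.0519
print(round(p_at_least_1, 4))  # 0.3352
print(round(mu, 2))            # 0.4
print(round(sigma, 4))         # 0.6197
```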
Worked Example 2: Multiple-choice guessing (avoiding a common trap)
Suppose you guess on 12 multiple-choice questions, each with 5 choices, exactly one correct. Let X be the number correct.
This is binomial because:
- each question is correct/incorrect (Binary)
- n=12 fixed (Number)
- guessing makes the success probability constant at p=1/5=0.2 (Same)
- answers are independent if each question is separate (Independent)
So X \sim \text{Bin}(12,0.2).
Find P(X\ge 4).
You could compute:
P(X\ge 4)=1-P(X\le 3)
And then find P(X\le 3)=P(X=0)+P(X=1)+P(X=2)+P(X=3) using the binomial formula or a binomial CDF on a calculator.
A frequent mistake here is to treat “at least 4” as just P(X=4). AP questions often reward careful reading of these inequality phrases.
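Carrying out the complement calculation with technology (a Python sketch; a binomial CDF on a calculator does the same sum) also shows how different the trap answer is:

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 12, 0.2

p_at_most_3 = sum(binom_pmf(n, k, p) for k in range(4))   # binomial cdf at 3
p_at_least_4 = 1 - p_at_most_3

# The trap: P(X = 4) alone is NOT "at least 4".
p_exactly_4 = binom_pmf(n, 4, p)

print(round(p_at_least_4, 4))
print(round(p_exactly_4, 4))
```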
Technology notes (what AP Stats expects you to be able to do)
AP Statistics commonly allows binomial probabilities to be computed using technology. Many graphing calculators include:
- a function for P(X=k) (often called something like binomial pdf)
- a function for P(X\le k) (often called something like binomial cdf)
Even when you use technology, the exam still expects you to:
- identify n and p correctly
- define the random variable in context
- choose between exact probability vs cumulative vs complement
- interpret the result in context
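If a graphing calculator is unavailable, the pdf/cdf pair is easy to mimic. This sketch uses names chosen to mirror the usual calculator functions, not any particular model’s syntax:

```python
import math

def binom_pdf(n, p, k):
    # analogue of a calculator's binomial pdf: P(X = k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n, p, k):
    # analogue of a calculator's binomial cdf: P(X <= k)
    return sum(binom_pdf(n, p, j) for j in range(k + 1))

# Example: n = 12, p = 0.2 (the multiple-choice guessing scenario)
print(round(binom_pdf(12, 0.2, 3), 4))
print(round(binom_cdf(12, 0.2, 3), 4))
```

Either way, the setup work (identifying n, p, and the inequality) still has to be done by hand.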
Exam Focus
- Typical question patterns:
  - “Given n and p, define X and find P(X=k) or P(X\le k).”
  - “Verify the binomial conditions for a scenario, then compute a probability.”
  - “Find and interpret the mean and standard deviation of a binomial random variable.”
- Common mistakes:
  - Using a binomial model when n is not fixed (that’s often geometric instead).
  - Forgetting the combinations term \binom{n}{k} when finding P(X=k).
  - Misreading inequality language (confusing “at least” with “exactly”).
The Geometric Distribution
What it is (and how it differs from binomial)
A geometric distribution models a different kind of counting problem than binomial. Instead of fixing the number of trials and counting successes, you fix what counts as success and count how many trials it takes to get the first success.
In other words, you keep repeating the same chance process until the first success happens.
This is the key contrast:
- Binomial: fixed n, random number of successes.
- Geometric: the stopping rule (“first success”) is fixed, and the number of trials needed is random.
This distinction is one of the most tested conceptual points. If you catch yourself thinking “we keep trying until…,” that’s a strong hint that geometric is the right model.
Why it matters
The geometric distribution is useful whenever you are waiting for a “first time” event:
- number of customers until the first one buys something
- number of free throws until the first make
- number of phone calls until the first wrong number
- number of flights until the first delay
It also helps you practice building probability models for random variables and working with complements and cumulative probabilities.
How it works: the Geometric Setting (conditions)
A situation fits a geometric model when:
- You have repeated trials.
- Each trial results in success or failure.
- The probability of success p is the same on each trial.
- The trials are independent.
- The random variable X counts the number of trials until the first success.
Just like with binomial, the “same p” and “independent” conditions are essential. If success becomes more likely over time (say, you learn and improve), geometric is not appropriate.
Notation (and a crucial definition choice)
Typically in AP Statistics, if X is the number of trials until the first success, then:
X \sim \text{Geom}(p)
Important clarification: Some sources define a geometric random variable as the number of failures before the first success. AP Statistics commonly uses “number of trials until first success,” which means:
- possible values: 1,2,3,\dots
- X=1 means success on the first trial
If you ever use a formula or calculator function, make sure it matches this definition.
The geometric probability formula (pmf)
If X \sim \text{Geom}(p) and X counts trials until the first success, then:
P(X=k)=(1-p)^{k-1}p
Reasoning (this is worth understanding, not memorizing blindly):
- To have X=k, you need k-1 failures first, then a success.
- Each failure has probability 1-p.
- The success has probability p.
- Independence lets you multiply: (1-p)^{k-1}p.
A common error is to write (1-p)^k p, forgetting that if the first success happens on trial k, only the first k-1 trials are failures.
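A couple of lines of Python (a sketch; the name geom_pmf is illustrative) make both the formula and the off-by-one error visible:

```python
def geom_pmf(k: int, p: float) -> float:
    # P(X = k) = (1-p)^(k-1) * p: k-1 failures, then the first success on trial k
    return (1 - p)**(k - 1) * p

p = 0.3
print(geom_pmf(1, p))   # 0.3: success on the very first trial is just p
# The off-by-one mistake (1-p)**k * p would give 0.21 here instead of 0.3.

total = sum(geom_pmf(k, p) for k in range(1, 200))   # pmf sums to ~1
print(round(total, 6))
```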
Cumulative probabilities and the “greater than” form
You will often be asked questions like “What is the probability it takes more than 5 trials?” That’s:
P(X>5)
For geometric, there is a very clean way to think about this:
- X>5 means you did not get a success in the first 5 trials.
- That means 5 failures in a row.
So:
P(X>5)=(1-p)^5
Similarly:
P(X>k)=(1-p)^k
And since P(X\le k) is the complement:
P(X\le k)=1-(1-p)^k
Students sometimes try to sum P(X=1)+\cdots+P(X=k) every time. That works, but the complement form is faster and less error-prone.
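The shortcut and the term-by-term sum give the same answer, which this sketch checks numerically (p = 0.3 is chosen only for illustration):

```python
def geom_pmf(k, p):
    return (1 - p)**(k - 1) * p

p, k = 0.3, 5

# Shortcut: X > 5 means 5 failures in a row.
p_more_than_5 = (1 - p)**k

# Long way: sum P(X = 1) through P(X = 5), then take the complement.
p_at_most_5 = sum(geom_pmf(j, p) for j in range(1, k + 1))

print(round(p_more_than_5, 6))
print(round(1 - p_at_most_5, 6))   # same value
```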
Mean and standard deviation of a geometric random variable
For X \sim \text{Geom}(p) (trials until first success):
\mu_X=\frac{1}{p}
\sigma_X=\sqrt{\frac{1-p}{p^2}}
Interpretation:
- If the chance of success each trial is p, then on average you wait 1/p trials for the first success.
- The standard deviation describes typical variability in that waiting time.
A conceptual pitfall: 1/p is not a guarantee. For example, if p=0.2, the mean is 5 trials, but sometimes success occurs on trial 1, and sometimes it takes much longer.
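The 1/p shortcut can be checked against the general weighted-sum definition of a mean. This sketch truncates the infinite sum, which is safe here because the tail terms shrink geometrically:

```python
import math

def geom_pmf(k, p):
    return (1 - p)**(k - 1) * p

p = 0.2
mu = 1 / p                        # shortcut: 1/p = 5 trials on average
sigma = math.sqrt((1 - p) / p**2)

# Check the mean against a (truncated) weighted sum; terms past k = 2000 are negligible.
mu_check = sum(k * geom_pmf(k, p) for k in range(1, 2001))

print(round(mu, 6), round(mu_check, 6))   # both 5.0
print(round(sigma, 4))
```

The large standard deviation (about 4.47 when the mean is 5) reflects exactly the pitfall above: waiting times vary a lot.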
The memoryless property (a uniquely geometric idea)
The geometric distribution has a special feature called the memoryless property: past failures do not change the future waiting-time probabilities.
In probability form:
P(X>m+n \mid X>m)=P(X>n)
Interpretation: if you’ve already failed for m trials, the probability you’ll need more than n additional trials is the same as if you were starting fresh. This only holds because each trial is independent and the success probability stays the same.
Students often feel this is “counterintuitive,” but it’s really just a restatement of independence with constant p.
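The property can be verified directly from P(X>k)=(1-p)^k using the definition of conditional probability; a sketch with illustrative numbers (p = 0.3, m = 4 past failures, n = 2 additional trials):

```python
p, m, n_extra = 0.3, 4, 2

def p_greater(k, p):
    # P(X > k) = (1-p)^k: no success in the first k trials
    return (1 - p)**k

# P(X > m+n | X > m) = P(X > m+n) / P(X > m), since X > m+n implies X > m
conditional = p_greater(m + n_extra, p) / p_greater(m, p)
fresh_start = p_greater(n_extra, p)   # P(X > n), as if starting over

print(round(conditional, 6))   # 0.49
print(round(fresh_start, 6))   # 0.49: past failures carry no information
```

Algebraically this is just (1-p)^{m+n}/(1-p)^m=(1-p)^n, which is why memorylessness holds for any m and n.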
Worked Example 1: Free throws until the first make
A basketball player makes a free throw with probability p=0.75 each attempt. Assume attempts are independent. Let X be the number of attempts until the first made free throw.
So X \sim \text{Geom}(0.75).
(a) Find P(X=3).
To make the first shot on the 3rd attempt, the player must miss twice, then make:
P(X=3)=(1-0.75)^{2}(0.75)=(0.25)^2(0.75)
(b) Find P(X>4) (more than 4 attempts).
More than 4 attempts means no made free throws in the first 4 attempts:
P(X>4)=(1-0.75)^4=(0.25)^4
(c) Find the expected number of attempts.
\mu_X=\frac{1}{p}=\frac{1}{0.75}=\frac{4}{3}
Interpretation: in the long run, the first make happens after about 1.33 attempts on average.
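Evaluating all three parts numerically (a Python sketch in place of a calculator):

```python
p = 0.75

p_first_make_on_3rd = (1 - p)**2 * p   # part (a): miss, miss, make
p_more_than_4 = (1 - p)**4             # part (b): four misses in a row
expected_attempts = 1 / p              # part (c): 1/p

print(p_first_make_on_3rd)           # 0.046875
print(p_more_than_4)                 # 0.00390625
print(round(expected_attempts, 2))   # 1.33
```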
Worked Example 2: Customer arrivals until a purchase (and using memoryless)
Suppose each customer independently makes a purchase with probability p=0.10. Let X be the number of customers you must see until the first purchase.
(a) Find P(X\le 5).
Use the complement form:
P(X\le 5)=1-P(X>5)=1-(1-0.10)^5=1-(0.90)^5
(b) You have already seen 8 customers and none purchased. What is the probability you will need more than 3 additional customers to get the first purchase?
This is:
P(X>11 \mid X>8)
By the memoryless property:
P(X>11 \mid X>8)=P(X>3)
And:
P(X>3)=(0.90)^3
A common mistake is to think the probability should change because “you’re overdue.” In a geometric model, “overdue” is not a thing; each trial resets because the success probability stays the same and trials are independent.
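Both parts of this example reduce to powers of 0.90, which a short sketch confirms:

```python
p = 0.10

# (a) P(X <= 5) via the complement form 1 - (1-p)^5
p_within_5 = 1 - (1 - p)**5

# (b) Memoryless: seeing 8 customers with no purchase changes nothing,
# so P(X > 11 | X > 8) = P(X > 3) = (1-p)^3.
p_more_than_3_additional = (1 - p)**3

print(round(p_within_5, 5))                # 0.40951
print(round(p_more_than_3_additional, 3))  # 0.729
```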
Technology notes
As with binomial, AP Statistics often expects you to use technology for geometric probabilities. Many calculators provide functions like:
- geometric pdf for P(X=k)
- geometric cdf for P(X\le k)
Even if you use a calculator, always set up the random variable and parameters correctly, and make sure the calculator’s definition matches AP’s (trials until first success vs failures before first success).
Connecting geometric and binomial (how students mix them up)
These distributions can look similar because both involve repeated independent trials with a constant success probability p. The key difference is what is fixed versus what is random:
- In a binomial problem, you know n ahead of time and stop after n trials.
- In a geometric problem, you don’t know how many trials will occur; you stop when the first success happens.
A quick “language test” that helps:
- If the question says “in 20 trials…” or “out of 15 people…,” that suggests binomial.
- If the question says “until the first…” or “how many until…,” that suggests geometric.
Exam Focus
- Typical question patterns:
  - “Define X as trials until first success; compute P(X=k), P(X\le k), or P(X>k).”
  - “Show why a scenario is geometric (binary outcomes, independence, constant p), then calculate probabilities.”
  - “Use the memoryless property in a conditional probability question.”
- Common mistakes:
  - Treating a geometric question as binomial by incorrectly choosing a fixed n.
  - Off-by-one errors in P(X=k)=(1-p)^{k-1}p (using exponent k instead of k-1).
  - Using a calculator’s geometric function without checking whether it counts trials or failures.