Chapter 4: Probability Theory Basics — Random Experiments, Sample Space, and Counting Rules
Random Experiments and Probability Basics
- This transcript introduces inferential statistics as a transition from frequency distributions to probability distributions.
- Probability is described as distributing the “ball of clay” (the total probability mass, equal to 1) across all possible outcomes of an experiment.
- The ball of clay metaphor helps visualize how probability mass is allocated to all potential events that could occur.
- Emphasis on the distinction between frequency distributions (describing observed data) and probability distributions (describing all possible outcomes and their probabilities).
- Chapter four focuses on basic probability theory before diving into probability distributions in later chapters; Bayes’ theorem is noted as more challenging but potentially fun.
Key Definitions and Concepts
- Random experiment: A process that generates well-defined outcomes. Example: Opening a lemonade stand and counting how many lemonades are sold.
- Outcome: A single result that can occur from an experiment (e.g., “sold 8 lemonades”).
- Sample point: One of the outcomes that could happen in the experiment.
- Sample space: The set of all possible outcomes (all sample points).
- Event: A subset of the sample space (a collection of sample points that share a property, e.g., getting at least two heads in three coin flips).
- Probability distribution vs frequency distribution:
- Frequency distribution describes observed frequencies.
- Probability distribution assigns probabilities to all possible outcomes and their likelihoods; the sum of all probabilities is 1.
- Probabilities are bounded: a probability must lie between 0 and 1 (inclusive). 0 corresponds to 0% chance, 1 corresponds to 100% chance.
- Notation recap:
- Intersection: P(A \cap B)
- Conditional probability: P(B|A) (probability of B given A)
- Event vs sample point: An event is a set of sample points; a sample point is a single outcome.
- Sum of probabilities: \sum P(s_i) = 1 over all sample points in the sample space.
- Membership: The symbol \in means “is an element of” (e.g., an outcome that belongs to an event).
Basic Probability Rules (Axioms and Notation)
- Probability bounds: For every sample point s_i, 0 \le P(s_i) \le 1
- Normalization: The probabilities across all sample points sum to 1:
  \sum_{i} P(s_i) = 1
- Event probability: The probability of an event E (a subset of the sample space) is the sum of the probabilities of the sample points in E:
  P(E) = \sum_{s_i \in E} P(s_i)
- Conditional probability intuition: The probability of B given A focuses on the portion of the sample space where A occurred:
  P(B|A) = \frac{P(A \cap B)}{P(A)} \quad \text{(provided } P(A) > 0\text{)}
- The vertical bar in the notation denotes “given that” in conditional probability.
- The summation symbol: \Sigma (capital sigma) denotes summation over a set of terms.
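The axioms and the conditional-probability formula above can be checked directly on a small sample space. A minimal sketch in Python (the events A and B below are illustrative choices, not from the transcript):

```python
from fractions import Fraction

# Sample space for one fair die; each sample point has probability 1/6.
space = {i: Fraction(1, 6) for i in range(1, 7)}

# Axiom checks: each probability lies in [0, 1] and they sum to 1.
assert all(0 <= p <= 1 for p in space.values())
assert sum(space.values()) == 1

# Event probability: P(E) is the sum over the sample points in E.
def prob(event):
    return sum(space[s] for s in event)

A = {2, 4, 6}   # example event: roll is even
B = {4, 5, 6}   # example event: roll is greater than 3

# Conditional probability: P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0.
p_b_given_a = prob(A & B) / prob(A)
print(p_b_given_a)
```

Using `Fraction` keeps results exact, so P(B|A) comes out as the fraction 2/3 rather than a rounded decimal.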
Simple Probability Examples
- Fair die (six-sided): sample space \{1,2,3,4,5,6\}, each with probability P(i) = \frac{1}{6}
- Probability of rolling less than 3: P(\text{roll} < 3) = \frac{2}{6} = \frac{1}{3}
- Probability of rolling less than 4: P(\text{roll} < 4) = \frac{3}{6} = \frac{1}{2}
- Probability of rolling greater than 4: P(\text{roll} > 4) = \frac{2}{6} = \frac{1}{3}
- Three fair coins flipped:
- Sample space size: 2^3 = 8 outcomes.
- Event: getting at least two heads (i.e., 2 or 3 heads).
- There are 4 favorable outcomes (HHH, HHT, HTH, THH). Hence:
P(\text{at least 2 heads}) = \frac{4}{8} = \frac{1}{2}
- A two-step experiment: coin flip (2 outcomes) followed by a die roll (6 outcomes)
- Total outcomes: 2 \times 6 = 12
- Example: probability of rolling two sixes with two dice is computed as a separate case (1 favorable outcome out of 36 total outcomes for two dice): P(\text{double six}) = \frac{1}{36}
- Counts illustrate the multistep counting rule below.
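The examples above can be verified by enumerating the sample spaces directly; a short sketch using `itertools.product`:

```python
from itertools import product
from fractions import Fraction

# Three fair coins: 2^3 = 8 equally likely outcomes.
coins = list(product("HT", repeat=3))
assert len(coins) == 8

# Event: at least two heads (2 or 3 heads).
favorable = [o for o in coins if o.count("H") >= 2]
p_two_heads = Fraction(len(favorable), len(coins))
print(p_two_heads)  # 1/2

# Two-step experiment: coin flip then die roll -> 2 * 6 = 12 outcomes.
coin_then_die = list(product("HT", range(1, 7)))
assert len(coin_then_die) == 12

# Two dice: 36 outcomes, exactly one of which is double six.
dice = list(product(range(1, 7), repeat=2))
p_double_six = Fraction(sum(1 for a, b in dice if a == b == 6), len(dice))
print(p_double_six)  # 1/36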
Counting Rules: Multistep Experiments
- For a k-step experiment with n_i possible results at step i, the total number of possible outcomes is the product:
  \text{Total outcomes} = \prod_{i=1}^{k} n_i
- Examples:
- Flip three coins: each step has 2 outcomes ⇒ total outcomes = 2^3 = 8
- Flip four coins: total outcomes = 2^4 = 16
- Coin then die: first step has 2 outcomes, second step has 6 outcomes ⇒ total outcomes = 2\times 6 = 12
- Key takeaway: Counting rules let us convert a story into a count of equally likely outcomes, which then allows probability calculations.
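The multistep rule is just a product over the per-step counts; a minimal sketch:

```python
from math import prod

# Multistep counting rule: total outcomes = product of per-step counts.
def total_outcomes(step_counts):
    return prod(step_counts)

print(total_outcomes([2, 2, 2]))     # three coin flips -> 8
print(total_outcomes([2, 2, 2, 2]))  # four coin flips -> 16
print(total_outcomes([2, 6]))        # coin then die   -> 12
```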
Counting Rules: Combinations (n choose k)
- Problem: From a set of N objects (capital N), how many distinct groups of size n (little n) can be formed when order does not matter?
- Notation (two equivalent forms):
  - English-language form: “N choose n” (how many groups of n from N)
  - Mathematical shorthand: \binom{N}{n}, also written C(N, n)
- Definition via factorials:
  \binom{N}{n} = \frac{N!}{n!(N-n)!}
- Important related concepts:
- Factorial: N! = N \times (N-1) \times \cdots \times 2 \times 1
- Special case: 0! = 1
- Worked example: 5 objects choosing 2 at a time
- \binom{5}{2} = \frac{5!}{2!\cdot (5-2)!} = \frac{120}{2\cdot 6} = 10
- Practical card example: How many 5-card hands from a standard 52-card deck? \binom{52}{5} = 2{,}598{,}960
- Real-world-story example (20 kids, 5 to form a team):
- Number of possible 5-person groups: \binom{20}{5} = 15504
- If one specific set of 5 kids (your 5 kids) is just one of these possibilities, the probability that your 5 kids are on the randomly formed team is:
P(\text{your 5 kids on the team}) = \frac{1}{\binom{20}{5}} = \frac{1}{15504} \approx 6.45\times 10^{-5}
- Quick tips mentioned in the transcript:
- There are two common ways to write the combination notation; both convey the same concept.
- Use factorial simplifications to cancel terms when calculating large combinations.
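The factorial definition can be implemented directly and cross-checked against the standard library's `math.comb`; a minimal sketch covering the worked examples above:

```python
from math import comb, factorial

# Combination via the factorial definition: C(N, n) = N! / (n! (N-n)!).
def choose(N, n):
    return factorial(N) // (factorial(n) * factorial(N - n))

print(choose(5, 2))    # 5 objects, groups of 2 -> 10
print(choose(52, 5))   # 5-card hands from a deck -> 2598960
print(choose(20, 5))   # 5-kid teams from 20 kids -> 15504

# Probability that one specific team of 5 is the randomly formed team:
p = 1 / comb(20, 5)
print(f"{p:.3e}")      # ~6.450e-05
```

Note that `math.comb` avoids computing the huge intermediate factorials, which mirrors the "cancel terms" tip from the transcript.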
Putting It Together: What This Lets You Do
- You can turn a narrative problem into a probability problem by:
- Identifying the experiment and its steps, the sample space, and the sample points.
- Using the multistep counting rule to determine the total number of outcomes.
- Defining the event of interest and summing the probabilities of the relevant sample points (or using combinations if each outcome is equally likely).
- In many problems, you’ll use P(E) = \sum_{s_i \in E} P(s_i) when outcomes are not equally likely, or
- If outcomes are equally likely, P(E) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}} = \frac{|E|}{|S|} where S is the sample space.
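The non-uniform case (summing probabilities of the relevant sample points) can be sketched with a hypothetical loaded die, not from the transcript, where face 6 is twice as likely as each other face:

```python
from fractions import Fraction

# Hypothetical loaded die: 6 is twice as likely as each of faces 1-5.
P = {i: Fraction(1, 7) for i in range(1, 6)}
P[6] = Fraction(2, 7)
assert sum(P.values()) == 1  # still a valid probability distribution

# Event probability by summing sample-point probabilities (faces 5 and 6).
p_greater_than_4 = sum(P[s] for s in P if s > 4)
print(p_greater_than_4)  # 3/7
```

Compare with the fair die, where the same event has probability 2/6: changing the distribution over sample points changes event probabilities even though the sample space is identical.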
Connections to Other Topics and Practical Implications
- Relationship to inferential statistics: This chapter lays the groundwork for distributing probability across all possible events, which is essential when generalizing from a sample to a population.
- The contrast with frequency distributions: Frequency distributions describe observed data; probability distributions describe the likelihood of various outcomes in a theoretical or long-run sense.
- Prelude to probability distributions in later chapters (chapters 5–7) and Bayes’ theorem (often considered more challenging but illuminating).
- Foundational math tools needed:
  - Factorials (N!) and combinations (\binom{N}{n}) notation
- Basic probability axioms and conditional probability
- Real-world relevance: Examples include predicting purchases in a store, evaluating outcomes of games of chance, and understanding how likely certain team selections are when groups are formed randomly.
Practical Tips and Tutor Notes
- Memorize the core formulas and the logic behind them, but expect to derive them in problems rather than memorize every detail.
- If a quiz or test offers a formula sheet, recognize the underlying concepts rather than rote memorization of the sheet.
- When evaluating problems, distinguish between:
- Word problems requiring counting (use counting rules)
- Problems with non-uniform probabilities (directly sum probabilities of relevant sample points)
- Common pitfalls to watch for:
  - Misreading inequalities (e.g., “less than 3” excludes 3, while “at most 3” includes it)
- Mixing up order in combinations vs. permutations (the combination formula assumes order does not matter)
- Assuming all sample points are equally likely unless stated otherwise
Ethical and Practical Implications Mentioned in the Transcript
- The instructor humorously points out that achieving a perfect score on the first attempt of a quiz might indicate looking up answers (Chegg) rather than doing the work.
- Two attempts are allowed to encourage practice and learning rather than reliance on answer keys.
- The approach emphasizes building understanding and the use of a formula sheet as a guide, not a substitute for learning.
Key Formulas Recap
- Probability of B given A: P(B|A) = \frac{P(A \cap B)}{P(A)}
- Sum of probabilities over the sample space: \sum_{i} P(s_i) = 1
- Event probability: P(E) = \sum_{s_i \in E} P(s_i)
- Combination: \binom{N}{n} = \frac{N!}{n!(N-n)!}
- Factorial: N! = N \times (N-1) \times \cdots \times 2 \times 1, \quad 0! = 1
- Example probabilities with a die:
  - P(\text{roll} < 3) = \frac{2}{6} = \frac{1}{3}
  - P(\text{roll} < 4) = \frac{3}{6} = \frac{1}{2}
  - P(\text{roll} > 4) = \frac{2}{6} = \frac{1}{3}
- Example: At least two heads with three fair coins:
- Sample space size: 2^3 = 8
- Favorable outcomes: 4
- Hence: P(\text{at least 2 heads}) = \frac{4}{8} = \frac{1}{2}
- Example: 20 choose 5 (team selection) and probability:
- Total groups: \binom{20}{5} = 15504
- Favorable outcome: 1 (your specific 5 kids)
- Probability: P = \frac{1}{\binom{20}{5}} = \frac{1}{15504}