Probability Basics
Looking Back & Moving Forward
We can use results from a sample to infer about a population.
Example: An SRS (Simple Random Sample) of 100 college students found 34% love The Weeknd.
However, it is possible for our estimate to be wrong.
Example: Another SRS of 100 college students found 56% love The Weeknd.
Random sampling helps reduce bias, but estimates will still differ due to random variability.
Because estimates vary from sample to sample, we need probability to help us understand and express how samples behave.
Module Objectives
Identify a sample space from a description of random phenomena.
Distinguish between independent and dependent events.
Interpret Venn diagrams, tree diagrams, and probability tables.
Apply the five basic rules of probability.
Distinguish between discrete and continuous random variables.
Calculate probabilities for discrete and continuous random variables.
Probability Terminology
Random Phenomenon: A situation involving chance leading to results called outcomes.
Unpredictable in the short run but predictable over the long run.
Example: tossing a coin
Probability: The proportion of times an outcome occurs in infinitely repeated trials.
In practice, we have only a limited number of repeated trials.
Law of Large Numbers: The larger the sample size (aka number of trials), the closer the observed sample probability will be to the unknown theoretical probability.
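The Law of Large Numbers can be illustrated with a small simulation. The sketch below assumes a fair coin (theoretical probability 0.5) and a fixed random seed so the run is repeatable; the function name `observed_heads_proportion` is hypothetical and chosen only for this illustration.

```python
import random


def observed_heads_proportion(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses and return the observed proportion of heads."""
    rng = random.Random(seed)  # fixed seed: an assumption made for repeatability
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The observed proportion tends to settle near 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, observed_heads_proportion(n))
```

With only 10 tosses the proportion can be far from 0.5; with 100,000 tosses it is typically within a fraction of a percentage point.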
Probability Notation
Sample Space, S: Set of ___________________________ for a random phenomenon.
Example: What is the sample space for flipping a coin 3 times?
Notation P(A) represents the probability an event occurs:
P(A) = \frac{\text{\# times event happens}}{\text{total \# trials or \# outcomes possible}}
Example: What is P(2 heads out of 3 coin tosses)?
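One way to check the coin-toss example is to enumerate the sample space directly. The sketch below assumes a fair coin, so all outcomes are equally likely, and reads "2 heads" as exactly two heads; both are assumptions, not statements from the notes.

```python
from fractions import Fraction
from itertools import product

# Every sequence of three tosses, e.g. "HHT"; there are 2 * 2 * 2 = 8 of them.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

# Outcomes in the event "exactly two heads" (assumed reading of the question).
exactly_two_heads = [outcome for outcome in sample_space if outcome.count("H") == 2]

# With equally likely outcomes, P(A) = (# outcomes in A) / (# outcomes in S).
p_two_heads = Fraction(len(exactly_two_heads), len(sample_space))

print(sample_space)
print(exactly_two_heads)
print(p_two_heads)
```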
Example 1: Observational Study
An observational study evaluated long-term complications in diabetic patients treated under two different regimens.
The researcher randomly sampled 200 patients and recorded which treatment they used as well as whether they had experienced complications of foot, eye, or cardiovascular disease.
Treatment outcomes:
| Treatment | Complications | None | Total |
|-----------|---------------|------|-------|
| 1 | 11 | 77 | 88 |
| 2 | 9 | 103 | 112 |
| Total | 20 | 180 | 200 |
Define:
Let A = patient used Treatment 1,
Let B = patient experienced complications,
Let C = patient used Treatment 2 and had not experienced complications.
Outcomes in the sample space:
Estimate the following: (specific probabilities to be calculated).
Basic Probability Definitions
Union of two events, denoted ∪: Either ___ occurs.
Intersection of two events, denoted ∩: Both ____ happen at the same time.
Disjoint Events: ________ happen at the same time; they have no intersection.
Complement of an event: Denoted __; the event ______________.
Basic Probability Rules
For any event A, 0 ≤ P(A) ≤ 1:
For a _____% chance of an event, P(A) = 0 (the event will not happen).
For a _____% chance of an event, P(A) = 1 (the event will certainly happen).
If S is the sample space, then P(S) = 1.
Addition Rule: In general,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
If A and B are ______________, then P(A ∩ B) = 0.
Complement Rule:
P(A') = P(A does not occur) = 1 − P(A).
Example 2: Diabetes Study Probabilities
Recall the results of the diabetes study from Example 1:
Let A = patient used Treatment 1,
Let B = patient experienced complications,
Let C = patient used Treatment 2 and had not experienced complications.
Estimate the following probabilities: P(A ∩ B), P(A ∪ B), P(A').
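The three estimates can be sketched from the cell counts of the diabetes table (11, 77, 9, 103 out of 200 patients). The helper names `prob`, `is_a`, and `is_b` are hypothetical; the code treats each sample proportion as the estimated probability.

```python
from fractions import Fraction

# Cell counts from the diabetes study table: (treatment, outcome) -> patients.
counts = {
    ("treatment 1", "complications"): 11,
    ("treatment 1", "none"): 77,
    ("treatment 2", "complications"): 9,
    ("treatment 2", "none"): 103,
}
total = sum(counts.values())  # 200 patients


def prob(event):
    """Estimated probability: share of patients whose cell satisfies the event."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), total)


def is_a(cell):
    return cell[0] == "treatment 1"  # A = patient used Treatment 1


def is_b(cell):
    return cell[1] == "complications"  # B = patient experienced complications


p_a_and_b = prob(lambda cell: is_a(cell) and is_b(cell))  # P(A ∩ B)
p_a_or_b = prob(lambda cell: is_a(cell) or is_b(cell))    # P(A ∪ B)
p_not_a = prob(lambda cell: not is_a(cell))               # P(A')

print(p_a_and_b, p_a_or_b, p_not_a)
```

Counting cells for the union gives the same answer as the addition rule, P(A) + P(B) − P(A ∩ B) = 88/200 + 20/200 − 11/200.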
Conditional Probability
In conditional probability calculations, the value of one variable or outcome of one trial is ______ or _______.
Restricts the sample space and reduces the total number of possible outcomes.
Changes the _______________ of the fraction.
When B occurs, the conditional probability of A given B is:
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
Multiplication Rule:
In general, P(A and B) = P(A) ∗ P(B|A).
Example 3: Taxpayer Income Level Audit
For the 80.2 million long-form federal tax returns received by the IRS, we cross-tabulate taxpayer income level with whether they were audited.
Frequencies reported in thousands and rounded.
Calculate the conditional probabilities from the table:
P(A|Income < $25K), P(B|$50K ≤ Income < $100K), P(C|Income ≥ $100K).
| Income Level | Audited | Not Audited | Total |
|------------------|---------|-------------|--------|
| Under $25K | 90 | 14010 | 14100 |
| $25K to $49K | 71 | 30629 | 30700 |
| $50K to $99K | 69 | 24631 | 24700 |
| $100K or more | 80 | 10620 | 10700 |
| Total | 310 | 79890 | 80200 |
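A conditional probability restricts attention to one row of the table, so the denominator becomes that row's total. The sketch below assumes the event of interest in each case is "audited" (the notes do not define A, B, and C for this example); the dictionary and function names are hypothetical.

```python
# (audited, row total) in thousands of returns, copied from the table above.
audit_table = {
    "Under $25K": (90, 14100),
    "$25K to $49K": (71, 30700),
    "$50K to $99K": (69, 24700),
    "$100K or more": (80, 10700),
}


def p_audited_given(income_level):
    """P(audited | income level): audited count divided by the row total."""
    audited, row_total = audit_table[income_level]
    return audited / row_total


for level in audit_table:
    print(f"P(audited | {level}) = {p_audited_given(level):.5f}")

# Unconditional probability of an audit, for comparison: 310 / 80200.
p_audited = sum(a for a, _ in audit_table.values()) / sum(t for _, t in audit_table.values())
print(f"P(audited) = {p_audited:.5f}")
```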
Example 4: Accuracy of COVID-19 Test
A study tested the accuracy of an antigen test on a sample of 50 people known to have COVID-19 and 9,950 people known not to have it (results in the table below).
Two possible errors with diagnostic tests:
False Positive,
False Negative.
Given a positive test, what is the probability the person actually has COVID-19?
Given a negative test, what is the probability a person actually has COVID-19?
| | Positive | Negative | Total |
|-------------------|----------|----------|--------|
| COVID-19 | 26 | 24 | 50 |
| None | 40 | 9910 | 9950 |
| Total | 66 | 9934 | 10000 |
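Both questions condition on the test result, so each denominator is a column total rather than a row total. The sketch below uses the counts from the table; the variable names are hypothetical.

```python
from fractions import Fraction

# Counts from the antigen test table.
covid_positive, covid_negative = 26, 24   # people known to have COVID-19
none_positive, none_negative = 40, 9910   # people known not to have it

total_positive = covid_positive + none_positive  # 66 positive tests
total_negative = covid_negative + none_negative  # 9934 negative tests

# P(COVID-19 | positive test): restrict the sample space to positive tests.
p_covid_given_positive = Fraction(covid_positive, total_positive)

# P(COVID-19 | negative test): restrict the sample space to negative tests.
p_covid_given_negative = Fraction(covid_negative, total_negative)

print(float(p_covid_given_positive))
print(float(p_covid_given_negative))
```

In this sample fewer than half of the positive tests belong to people who have COVID-19, because the 40 false positives outnumber the 26 true positives.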
Independence
Suppose 5 subjects in a population of 100 are unusual. If we randomly sample 3 subjects without replacement, what is the probability all 3 will be unusual?
These trials are ____________________.
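The sampling question can be worked with the general multiplication rule, since each draw changes the counts available for the next draw. The function below is a hypothetical helper written for this illustration.

```python
from fractions import Fraction


def p_all_unusual(unusual, population, draws):
    """P(every draw is unusual) when sampling without replacement."""
    p = Fraction(1)
    for k in range(draws):
        # After k unusual subjects are removed, (unusual - k) of the
        # remaining (population - k) subjects are unusual.
        p *= Fraction(unusual - k, population - k)
    return p


without_replacement = p_all_unusual(5, 100, 3)  # (5/100) * (4/99) * (3/98)
with_replacement = Fraction(5, 100) ** 3        # what independent draws would give

print(without_replacement, float(without_replacement))
print(with_replacement, float(with_replacement))
```

The two answers differ because the probability of drawing an unusual subject changes after each draw.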
Two events A and B are independent if:
P(A | B) = P(A).
Event A has ___________ on event B.
Multiplication Rule:
If A and B are independent, then P(A and B) = P(A) ∗ P(B).
Example 5: Gender and Eye Color
Gender and eye color observed in a sample.
Let A = blue eyes and B = female. Are A and B independent?
| Gender | Blue Eyes | Other Color | Total |
|--------|-----------|-------------|-------|
| Female | 3 | 12 | 15 |
| Male | 4 | 16 | 20 |
| Total | 7 | 28 | 35 |
Let A = color blind and B = male. Are A and B independent?
| Gender | Color Blind | Not Color Blind | Total |
|--------|-------------|-----------------|-------|
| Female | 1 | 14 | 15 |
| Male | 3 | 17 | 20 |
| Total | 4 | 31 | 35 |
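Both questions in Example 5 can be checked by comparing P(A | B) with P(A) using the table counts. The helper `is_independent` is hypothetical; it tests exact equality of the sample proportions, which is an assumption suited to these small tables rather than a formal statistical test.

```python
from fractions import Fraction


def is_independent(a_and_b, a_total, b_total, grand_total):
    """Return True when P(A | B) equals P(A) for the given counts."""
    p_a = Fraction(a_total, grand_total)      # P(A)
    p_a_given_b = Fraction(a_and_b, b_total)  # P(A | B)
    return p_a_given_b == p_a


# A = blue eyes, B = female: P(A | B) = 3/15, P(A) = 7/35.
blue_eyes_vs_female = is_independent(a_and_b=3, a_total=7, b_total=15, grand_total=35)

# A = color blind, B = male: P(A | B) = 3/20, P(A) = 4/35.
color_blind_vs_male = is_independent(a_and_b=3, a_total=4, b_total=20, grand_total=35)

print(blue_eyes_vs_female, color_blind_vs_male)
```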
Tree Diagrams
Probability often requires combining several basic rules into elaborate calculations; tree diagrams help us visualize and simplify the math.
The sum of the probabilities emanating from any branch is ____.
Final outcomes are __________.
To find probability for a final outcome, __________________________ that branch.
Example 6: Skin Cancer Probabilities
The tree diagram shows probabilities of skin cancer by body locations and gender.
Required calculations: P(A ∩ B), P(A|B), P(C).
Outcomes based on tree structure regarding numbers of persons by gender and cancer location.
Summary of Basic Probability Rules
Law of Large Numbers: As n→∞, the sample probability gets closer to the theoretical probability.
Five Basic Probability Rules for Events A and B:
For any event A, 0 ≤ P(A) ≤ 1.
For sample space, P(S) = 1.
Complement rule: P(A') = 1−P(A).
Addition rule: P(A ∪ B) = P(A) + P(B)−P(A ∩ B). If A and B are disjoint, P(A ∩ B) = 0.
Multiplication rule: P(A ∩ B) = P(A)∗P(B|A).
Conditional probability: P(A|B) = P(A ∩ B) / P(B). If A and B are independent, P(A|B) = P(A).
Tree diagrams simplify complex probability.
Moving from Categorical to Quantitative Data
Every example in this module so far has been based on categorical data.
We learned the basic probability rules and applied them to various univariate and bivariate scenarios.
Calculated probabilities by adding, subtracting, multiplying, and dividing.
Now we will look at examples based on quantitative data.
The same basic probability rules apply, but they will look slightly different in their implementation.
We will learn to calculate means and variances.
For one sub-type of quantitative variables, we will calculate probabilities by finding areas.
Random Variables
Values of a random variable represent __________ outcomes of a random phenomenon.
Capital letters refer to the ____________ itself.
Lowercase letters refer to possible ________ of the variable.
Two types of random variables:
Discrete: Takes distinct, countable values.
Continuous: Values that can take on any value within a range.
The probability distribution of a random variable indicates what values are __________ and their associated _______________.
Probability Rules for Random Variables
Let a and b be specific numbers with a < b:
P(X ≤ a) = P(X < a) + P(X = a).
P(X < a or X > b) = P(X < a) + P(X > b).
P(X > b) = 1 − P(X ≤ b).
P(a < X < b) = 1 − P(X ≤ a or X ≥ b) = 1 − [P(X ≤ a) + P(X ≥ b)] = 1 − P(X ≤ a) − P(X ≥ b).
Probability Distributions for Discrete Random Variables
Discrete random variables have a finite list of possible outcomes.
A probability model lists all possible outcomes with their associated probabilities, where each probability is between 0 and 1 and the probabilities sum to 1.
Find the probability for any event by __________ probabilities of the individual outcomes that make up the event.
Example 7: Age Probability Distribution
Suppose we have a population of 15 people with the following ages:
13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18.
Calculate the probability distribution for X (the age of a randomly chosen person):
Then calculate P(X > 16), P(X ≤ 16), and P(X ≤ 14 or X ≥ 18).
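The age distribution can be built by counting how often each age appears among the 15 people, then adding the probabilities of the individual outcomes in each event. The names `age_dist` and `prob_age` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

ages = [13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18]

# P(X = x) for each age x: (# people with that age) / 15.
age_dist = {age: Fraction(n, len(ages)) for age, n in sorted(Counter(ages).items())}


def prob_age(event):
    """Add the probabilities of every age x for which event(x) is true."""
    return sum((p for x, p in age_dist.items() if event(x)), Fraction(0))


p_over_16 = prob_age(lambda x: x > 16)
p_at_most_16 = prob_age(lambda x: x <= 16)
p_tails = prob_age(lambda x: x <= 14 or x >= 18)

print(age_dist)
print(p_over_16, p_at_most_16, p_tails)
```

Because X > 16 and X ≤ 16 are complements, the first two results add to 1.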
Mean and Variance for Discrete Random Variables
For a variable X:
Mean (expected value): Multiply each value by its probability; then sum all the products: μ_X = Σ x_i · p_i, where p_i = P(X = x_i).
Variance: Subtract the mean from each possible value, square the result, multiply by the corresponding probability, then add all the products: σ²_X = Σ (x_i − μ_X)² · p_i.
Example 8: Hearing Impairment in Dalmatians
A study examined hearing impairment in 5333 Dalmatians.
Let X = the number of ears impaired in a randomly chosen Dalmatian:
What is the mean of X?
What is the variance of X?
What is the standard deviation of X?
| x | 0 | 1 | 2 |
|---|---|---|---|
| P(X) | 0.70 | 0.22 | 0.08 |
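The mean, variance, and standard deviation follow directly from the verbal definitions of the mean and variance of a discrete random variable. A minimal sketch using the probabilities in the table (the variable names are hypothetical):

```python
import math

# Probability distribution of X = number of impaired ears, from the table.
ear_dist = {0: 0.70, 1: 0.22, 2: 0.08}

# Mean: multiply each value by its probability, then sum the products.
mean_x = sum(x * p for x, p in ear_dist.items())

# Variance: squared distance from the mean, weighted by probability, then summed.
var_x = sum((x - mean_x) ** 2 * p for x, p in ear_dist.items())

# Standard deviation: square root of the variance.
sd_x = math.sqrt(var_x)

print(round(mean_x, 4), round(var_x, 4), round(sd_x, 4))
```

The results are approximately 0.38 for the mean, 0.3956 for the variance, and 0.629 for the standard deviation.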
Probability Distributions for Continuous Random Variables
Continuous variables can take on __________ possible outcomes.
We calculate probabilities by finding the ________ under a _____________ curve.
A density curve:
Is fitted to the bars of a histogram,
Displays the overall distribution pattern,
Is always on or above the horizontal axis,
Has total area exactly _____ underneath it.
Probabilities are assigned to __________ of values.
In fact, for continuous variables, if X is __________ and c is any constant, then:
P(X = c) = 0,
P(X ≤ c) = P(X < c).
Example 9: Uniform Distribution
Suppose we have a uniform distribution over the interval from 0 to 5.
Calculate probabilities for: P(X ≤ 2), P(1 ≤ X ≤ 3), P(X < 1 ∪ X > 3).
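For a uniform distribution on 0 to 5, the density curve is a flat line at height 1/5, so every probability is the area of a rectangle. The sketch below uses a hypothetical helper, `uniform_cdf`, that returns the area to the left of a point.

```python
def uniform_cdf(x, low=0.0, high=5.0):
    """Area under the uniform density to the left of x, i.e. P(X <= x)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)  # rectangle: width (x - low), height 1 / (high - low)


p_at_most_2 = uniform_cdf(2)                          # P(X <= 2)
p_between_1_and_3 = uniform_cdf(3) - uniform_cdf(1)   # P(1 <= X <= 3)
p_outside_1_to_3 = uniform_cdf(1) + (1 - uniform_cdf(3))  # P(X < 1 or X > 3)

print(p_at_most_2, p_between_1_and_3, p_outside_1_to_3)
```

Since P(X = c) = 0 for a continuous variable, using < or ≤ at the endpoints does not change any of these areas.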
Summary of Probability for Random Variables
Two types of random variables:
Discrete: Find probabilities by adding and subtracting.
Continuous: Find probability by computing area.
Implementation of basic probability rules:
P(X ≤ a) = P(X < a) + P(X = a).
P(X < a or X > b) = P(X < a) + P(X > b).
P(X > b) = 1 − P(X ≤ b).
For discrete X, Mean of X = μ_X = Σ x_i · p_i, where p_i = P(X = x_i).
Variance of X = σ²_X = Σ (x_i − μ_X)² · p_i.
For continuous X, if c is constant, P(X = c) = 0.