Probability Basics

Looking Back & Moving Forward

  • We can use results from a sample to infer about a population.

    • Example: An SRS (Simple Random Sample) of 100 college students found 34% love The Weeknd.

    • However, it is possible for our estimate to be wrong.

    • Example: Another SRS of 100 college students found 56% love The Weeknd.

  • Random sampling helps reduce bias, but estimates will still differ due to random variability.

  • If variability is too large, we need probability to help us understand and express how samples behave.

Module Objectives

  • Identify a sample space from a description of random phenomena.

  • Distinguish between independent and dependent events.

  • Interpret Venn diagrams, tree diagrams, and probability tables.

  • Apply the five basic rules of probability.

  • Distinguish between discrete and continuous random variables.

  • Calculate probabilities for discrete and continuous random variables.

Probability Terminology

  • Random Phenomenon: A situation involving chance leading to results called outcomes.

    • Unpredictable in the short run but predictable over the long run.

    • Example: tossing a coin

  • Probability: The proportion of times an outcome occurs in infinitely repeated trials.

    • In practice, we have only a limited number of repeated trials.

  • Law of Large Numbers: The larger the number of trials (the sample size), the closer the observed sample probability tends to be to the unknown theoretical probability.
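The Law of Large Numbers can be illustrated with a small simulation. The sketch below assumes a fair coin (theoretical probability of heads 0.5); the seed value and the helper name `heads_proportion` are arbitrary choices for illustration, not part of the module.

```python
import random


def heads_proportion(n_tosses, rng):
    """Proportion of heads in n_tosses simulated tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# Fixed (arbitrary) seed so the run is repeatable.
rng = random.Random(2024)

# Observed proportion of heads for increasingly many trials.
proportions = {n: heads_proportion(n, rng) for n in (10, 100, 10_000, 100_000)}

for n, p in proportions.items():
    print(f"n = {n:>7}: proportion of heads = {p:.4f}")
```

With few tosses the proportion can land far from 0.5; with many tosses it typically settles close to it.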

Probability Notation

  • Sample Space, S: Set of ___________________________ for a random phenomenon.

    • Example: What is the sample space for flipping a coin 3 times?

  • Notation P(A) represents the probability an event occurs:

    • $P(A) = \frac{\text{\# times event happens}}{\text{total \# trials or \# outcomes possible}}$

    • Example: What is P(2 heads out of 3 coin tosses)?
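One way to check the coin-flip example is to enumerate the sample space directly. This sketch assumes a fair coin, so all outcomes are equally likely, and reads "2 heads out of 3" as exactly two heads; both are assumptions, since the outline does not say.

```python
from fractions import Fraction
from itertools import product

# Sample space S: every ordered sequence of H/T over three flips.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

# Event: exactly two heads (an assumed reading of "2 heads out of 3").
event = [outcome for outcome in sample_space if outcome.count("H") == 2]

# Equally likely outcomes: P(event) = (# outcomes in event) / (# outcomes in S).
p_two_heads = Fraction(len(event), len(sample_space))

print(sample_space)
print(event, p_two_heads)
```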

Example 1: Observational Study

  • An observational study evaluated long-term complications in diabetic patients treated under two different regimens.

  • The researcher randomly sampled 200 patients and recorded which treatment they used as well as whether they had experienced complications of foot, eye, or cardiovascular disease.

  • Treatment outcomes:

    | Treatment | Complications | None | Total |
    |-----------|---------------|------|-------|
    | 1 | 11 | 77 | 88 |
    | 2 | 9 | 103 | 112 |
    | Total | 20 | 180 | 200 |

  • Define:

    • Let A = patient used Treatment 1,

    • Let B = patient experienced complications,

    • Let C = patient used Treatment 2 and had not experienced complications.

  • Outcomes in the sample space:

  • Estimate the following: (specific probabilities to be calculated).

Basic Probability Definitions

  • Union of two events: Denoted ∪; either ___ occurs.

  • Intersection of two events: Denoted ∩; both ____ happen at the same time.

  • Disjoint Events: ________ happen at the same time; they have no intersection.

  • Complement of an event: Denoted __; the event ______________.

Basic Probability Rules

  • For any event A: 0 ≤ P(A) ≤ 1.

    • For a _____% chance of an event, P(A) = 0 (the event will not happen).

    • For a _____% chance of an event, P(A) = 1 (the event will certainly happen).

  • If S is the sample space, then P(S) = 1.

  • Addition Rule: In general,

    • P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

    • If A and B are ______________, then P(A ∩ B) = 0.

  • Complement Rule:

    • P(A') = P(A does not occur) = 1 − P(A).

Example 2: Diabetes Study Probabilities

  • Recall the results of the diabetes study from Example 1:

    • Let A = patient used Treatment 1,

    • Let B = patient experienced complications,

    • Let C = patient used Treatment 2 and had not experienced complications.

  • Estimate the following probabilities: P(A ∩ B), P(A ∪ B), P(A').
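The three estimates can be worked from the Example 1 table with the addition and complement rules. A minimal sketch, assuming the counts in that table; the dictionary layout and variable names are illustrative only.

```python
from fractions import Fraction

# Counts from the Example 1 table: counts[treatment][outcome].
counts = {
    1: {"complications": 11, "none": 77},
    2: {"complications": 9, "none": 103},
}
n = sum(sum(row.values()) for row in counts.values())  # 200 patients

# P(A): patient used Treatment 1.
p_a = Fraction(sum(counts[1].values()), n)
# P(B): patient experienced complications.
p_b = Fraction(sum(row["complications"] for row in counts.values()), n)
# P(A ∩ B): Treatment 1 and complications.
p_a_and_b = Fraction(counts[1]["complications"], n)

p_a_or_b = p_a + p_b - p_a_and_b  # addition rule
p_not_a = 1 - p_a                 # complement rule

print(float(p_a_and_b), float(p_a_or_b), float(p_not_a))
```

Subtracting P(A ∩ B) keeps the 11 patients who are in both A and B from being counted twice.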

Conditional Probability

  • In conditional probability calculations, the value of one variable or outcome of one trial is ______ or _______.

    • Restricts the sample space and reduces the total number of possible outcomes.

    • Changes the _______________ of the fraction.

  • When B occurs, the conditional probability of A given B is:

    • $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$

  • Multiplication Rule:

    • In general, P(A and B) = P(A) ∗ P(B|A).

Example 3: Taxpayer Income Level Audit

  • For the 80.2 million long-form federal tax returns received by the IRS, we cross-tabulate taxpayer income level with whether they were audited.

  • Frequencies reported in thousands and rounded.

  • Calculate the conditional probabilities from the table:

    • P(A|Income < $25K), P(B|$50K ≤ Income < $100K), P(C|Income ≥ $100K).
      | Income Level | Audited | Not Audited | Total |
      |------------------|---------|-------------|--------|
      | Under $25K | 90 | 14010 | 14100 |
      | $25K to $49K | 71 | 30629 | 30700 |
      | $50K to $99K | 69 | 24631 | 24700 |
      | $100K or more | 80 | 10620 | 10700 |
      | Total | 310 | 79890 | 80200 |
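The conditional probabilities can be read off the table one row at a time. The outline does not define the events A, B, and C for this example, so this sketch assumes the event of interest in each case is "audited"; the dictionary and variable names are illustrative.

```python
# Counts in thousands, from the table: (audited, row total) per income level.
returns = {
    "Under $25K": (90, 14100),
    "$25K to $49K": (71, 30700),
    "$50K to $99K": (69, 24700),
    "$100K or more": (80, 10700),
}

# Conditioning on an income level restricts the sample space to that row,
# so each probability divides by the row total rather than the grand total.
p_audit_given_income = {
    level: audited / total for level, (audited, total) in returns.items()
}

for level, p in p_audit_given_income.items():
    print(f"P(audited | {level}) = {p:.4f}")
```

Because both counts in each ratio are in thousands, the units cancel and no rescaling is needed.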

Example 4: Accuracy of COVID-19 Test

  • Study used a sample of 50 people known to have COVID-19 and 9950 known not to have it to test the accuracy of an antigen test:

    • Two possible errors with diagnostic tests:

    • False Positive,

    • False Negative.

    • Given a positive test, what is the probability the person actually has COVID-19?

    • Given a negative test, what is the probability a person actually has COVID-19?
      | True Status | Test Positive | Test Negative | Total |
      |-------------------|----------|----------|--------|
      | COVID-19 | 26 | 24 | 50 |
      | No COVID-19 | 40 | 9910 | 9950 |
      | Total | 66 | 9934 | 10000 |
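Both questions condition on the test result, so the denominators are the column totals of the table. A minimal sketch under that reading; variable names are illustrative.

```python
from fractions import Fraction

# Counts from the table: rows are true status, columns are test result.
covid_pos, covid_neg = 26, 24
none_pos, none_neg = 40, 9910

total_pos = covid_pos + none_pos  # all positive tests
total_neg = covid_neg + none_neg  # all negative tests

# P(COVID-19 | positive): restrict the sample space to positive tests.
p_covid_given_pos = Fraction(covid_pos, total_pos)
# P(COVID-19 | negative): restrict the sample space to negative tests.
p_covid_given_neg = Fraction(covid_neg, total_neg)

print(f"P(COVID-19 | positive) = {float(p_covid_given_pos):.4f}")
print(f"P(COVID-19 | negative) = {float(p_covid_given_neg):.4f}")
```

The 40 false positives outnumber the 26 true positives because the group without COVID-19 is so much larger, which is why a positive result is far from conclusive in this sample.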

Independence

  • Suppose 5 subjects in a population of 100 are unusual. If we randomly sample 3 subjects without replacement, what is the probability all 3 will be unusual?

    • These trials are ____________________.

  • Two events A and B are independent if:

    • P(A | B) = P(A).

  • Event A has ___________ on event B.

  • Multiplication Rule:

    • If A and B are independent, then P(A and B) = P(A) ∗ P(B).
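For the sampling question at the start of this section (3 subjects drawn without replacement from 100, of whom 5 are unusual), the general multiplication rule can be applied one draw at a time. The sketch below also computes the with-replacement value purely for contrast; that comparison is an addition for illustration, not part of the original example.

```python
from fractions import Fraction

unusual, population, draws = 5, 100, 3

# Without replacement, each draw changes the pool, so apply the general
# multiplication rule P(A and B) = P(A) * P(B | A) one draw at a time.
p_all_unusual = Fraction(1)
for k in range(draws):
    p_all_unusual *= Fraction(unusual - k, population - k)

# For contrast: with replacement the draws would be independent, and the
# probability would simply be (5/100) ** 3.
p_if_independent = Fraction(unusual, population) ** draws

print(p_all_unusual, float(p_all_unusual))
print(p_if_independent, float(p_if_independent))
```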

Example 5: Gender and Eye Color


  • Gender and eye color observed in a sample.


  • Let A = blue eyes and B = female. Are A and B independent?

    | Gender | Blue Eyes | Other Color | Total |
    |--------|-----------|-------------|-------|
    | Female | 3 | 12 | 15 |
    | Male | 4 | 16 | 20 |
    | Total | 7 | 28 | 35 |


  • Let A = color blind and B = male. Are A and B independent?

    | Gender | Color Blind | Not Color Blind | Total |
    |--------|-------------|-----------------|-------|
    | Female | 1 | 14 | 15 |
    | Male | 3 | 17 | 20 |
    | Total | 4 | 31 | 35 |
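Both independence questions come down to comparing P(A | B) with P(A). A minimal sketch using the counts from the two tables; the helper `is_independent` is a hypothetical name, and it applies the definition as an exact equality of sample proportions.

```python
from fractions import Fraction


def is_independent(n_a_and_b, n_a, n_b, n_total):
    """Compare P(A | B) with P(A) using exact fractions of the counts."""
    p_a = Fraction(n_a, n_total)
    p_a_given_b = Fraction(n_a_and_b, n_b)
    return p_a_given_b == p_a, p_a, p_a_given_b


# First table: A = blue eyes, B = female.
blue_result = is_independent(n_a_and_b=3, n_a=7, n_b=15, n_total=35)
# Second table: A = color blind, B = male.
color_blind_result = is_independent(n_a_and_b=3, n_a=4, n_b=20, n_total=35)

print(blue_result)
print(color_blind_result)
```

Each result is a tuple of the verdict, P(A), and P(A | B), so the two proportions can be inspected side by side.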

Tree Diagrams

    • Probability often requires combining several basic rules into elaborate calculations; tree diagrams help us visualize and simplify the math.

    • The sum of the probabilities emanating from any branch is ____.

    • Final outcomes are __________.

    • To find probability for a final outcome, __________________________ that branch.

Example 6: Skin Cancer Probabilities

    • The tree diagram shows probabilities of skin cancer by body locations and gender.

    • Required calculations: P(A ∩ B), P(A|B), P(C).

    • Outcomes based on tree structure regarding numbers of persons by gender and cancer location.

Summary of Basic Probability Rules

    • Law of Large Numbers: As n → ∞, the sample probability gets closer to the theoretical probability.

    • Five Basic Probability Rules for Events A and B:

      • 0 ≤ P(A) ≤ 1.

      • For sample space, P(S) = 1.

      • Complement rule: P(A') = 1−P(A).

      • Addition rule: P(A ∪ B) = P(A) + P(B)−P(A ∩ B). If A and B are disjoint, P(A ∩ B) = 0.

      • Multiplication rule: P(A ∩ B) = P(A)∗P(B|A).

      • Conditional probability: $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$. If A and B are independent, P(A|B) = P(A).

      • Tree diagrams simplify complex probability.

Moving from Categorical to Quantitative Data

    • Every example in this module so far has been based on categorical data.

    • We learned the basic probability rules and applied them to various univariate and bivariate scenarios.

    • Calculated probabilities by adding, subtracting, multiplying, and dividing.

    • Now we will look at examples based on quantitative data.

    • The same basic probability rules apply, but they will look slightly different in their implementation.

    • We will learn to calculate means and variances.

    • For one sub-type of quantitative variables, we will calculate probabilities by finding areas.

Random Variables

    • Values of a random variable represent __________ outcomes of a random phenomenon.

    • Capital letters refer to the ____________ itself.

    • Lowercase letters refer to possible ________ of the variable.

    • Two types of random variables:

      • Discrete: Takes distinct, countable values.

      • Continuous: Can take on any value within a range.

    • The probability distribution of a random variable indicates what values are __________ and their associated _______________.

Probability Rules for Random Variables

    • Let a and b be specific numbers with a < b:

      • P(X ≤ a) = P(X < a) + P(X = a).

      • P(X < a or X > b) = P(X < a) + P(X > b).

      • P(X > b) = 1 − P(X ≤ b).

      • P(a < X < b) = 1 − P(X ≤ a or X ≥ b) = 1 − [P(X ≤ a) + P(X ≥ b)] = 1 − P(X ≤ a) − P(X ≥ b).

Probability Distributions for Discrete Random Variables

    • Discrete random variables have a finite list of possible outcomes.

    • A probability model lists all possible outcomes with their associated probabilities, where:

      • $p_1 + p_2 + \dots + p_n = 1$

    • Find the probability for any event by __________ probabilities of the individual outcomes that make up the event.

Example 7: Age Probability Distribution

    • Suppose we have a population of 15 people with the following ages:

      • 13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18.

    • Calculate the probability distribution for X (the age of a randomly chosen person):

      • Then calculate P(X > 16), P(X ≤ 16), and P(X ≤ 14 ∪ X ≥ 18).
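The distribution and the three probabilities can be tabulated directly from the list of ages. A minimal sketch; the helper `prob` and the variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

ages = [13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18]

# Probability distribution: P(X = x) = (# people aged x) / (population size).
distribution = {
    x: Fraction(k, len(ages)) for x, k in sorted(Counter(ages).items())
}


def prob(condition):
    """Add the probabilities of the individual values satisfying condition."""
    return sum((p for x, p in distribution.items() if condition(x)), Fraction(0))


p_gt_16 = prob(lambda x: x > 16)
p_le_16 = 1 - p_gt_16  # complement rule
p_tails = prob(lambda x: x <= 14 or x >= 18)

for x, p in distribution.items():
    print(x, p)
print(p_gt_16, p_le_16, p_tails)
```

Note that `Fraction` prints in lowest terms, so 3/15 appears as 1/5.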

Mean and Variance for Discrete Random Variables

    • For a variable X:

      • Mean (expected value): Multiply each value by its probability; then sum all the products:

      • $E(X) = \text{Mean} = x_1 p_1 + x_2 p_2 + \dots + x_n p_n = \sum_{i=1}^{n} x_i p_i$

      • Variance: Subtract the mean from each possible value, square the result, multiply by the corresponding probability, then add all the products:

      • $\text{Var}(X) = \sum_{i=1}^{n} (x_i - E(X))^2 \, p_i$

Example 8: Hearing Impairment in Dalmatians

    • A study examined hearing impairment in 5333 Dalmatians.

    • Let X = the number of ears impaired in a randomly chosen Dalmatian:

      • What is the mean of X?

      • What is the variance of X?

      • What is the standard deviation of X?
        | x | 0 | 1 | 2 |
        |---|---|---|---|
        | P(X) | 0.70 | 0.22 | 0.08 |
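The mean, variance, and standard deviation follow from the table by the probability-weighted sums described in the previous section. A minimal sketch, assuming the standard deviation is taken as the square root of the variance; variable names are illustrative.

```python
import math

# Probability distribution of X (number of impaired ears) from the table.
distribution = {0: 0.70, 1: 0.22, 2: 0.08}

# Mean: multiply each value by its probability, then sum the products.
mean = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted sum of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean:.4f}, variance = {variance:.4f}, sd = {std_dev:.4f}")
```

Neither sum is divided by the number of values; the probabilities already carry the weighting.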

Probability Distributions for Continuous Random Variables

    • Continuous variables can take on __________ possible outcomes.

    • We calculate probabilities by finding the ________ under a _____________ curve.

    • A density curve:

      • Is fitted to the bars of a histogram,

      • Displays the overall distribution pattern,

      • Is always on or above the horizontal axis,

      • Has total area exactly _____ underneath it.

    • Probabilities are assigned to __________ of values.

      • In fact, for continuous variables, if X is __________ and c is any constant, then:

      • P(X = c) = 0,

      • P(X ≤ c) = P(X < c).

Example 9: Uniform Distribution

    • Suppose we have a uniform distribution over the interval from 0 to 5.

    • Calculate probabilities for: P(X ≤ 2), P(1 ≤ X ≤ 3), P(X < 1 ∪ X > 3).
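For a uniform distribution the density curve is a flat line, so each probability is the area of a rectangle. A minimal sketch for the interval from 0 to 5; the helper `uniform_prob` is a hypothetical name introduced for illustration.

```python
def uniform_prob(lo, hi, a=0.0, b=5.0):
    """P(lo <= X <= hi) for X uniform on [a, b], as width times height."""
    height = 1.0 / (b - a)                     # makes the total area equal 1
    width = max(0.0, min(hi, b) - max(lo, a))  # clip the interval to [a, b]
    return width * height


p_le_2 = uniform_prob(0, 2)
p_1_to_3 = uniform_prob(1, 3)
# The two regions are disjoint, so their areas add.
p_outside = uniform_prob(0, 1) + uniform_prob(3, 5)

print(p_le_2, p_1_to_3, p_outside)
```

Because P(X = c) = 0 for a continuous variable, it makes no difference whether the endpoints are included in any of these intervals.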

Summary of Probability for Random Variables

    • Two types of random variables:

      • Discrete: Find probabilities by adding and subtracting.

      • Continuous: Find probability by computing area.

    • Implementation of basic probability rules:

      • P(X ≤ a) = P(X < a) + P(X = a).

      • P(X < a or X > b) = P(X < a) + P(X > b).

      • P(X > b) = 1 − P(X ≤ b).

      • For discrete X, mean of X: $E(X) = \sum_{i=1}^{n} x_i p_i$.

      • For discrete X, variance of X: $\text{Var}(X) = \sum_{i=1}^{n} (x_i - E(X))^2 \, p_i$.

      • For continuous X, if c is constant, P(X = c) = 0.