Probability and Types of Events in Statistics

Course Administration and SPSS Update

FAQ Forum: Students are encouraged to post questions, or topics they are struggling with, on the FAQ forum so that the instructor can create content covering those areas. No posts have been made yet.
Standard Deviation: A video explaining the concept of standard deviation will be posted.
SPSS Licensing Issues:
- The university has only $78$ concurrent licenses for SPSS, which is insufficient for a large student body (approximately $40{,}000$ students).
- Solutions:
  1. Single Device Access: Students must ensure they are only trying to access SPSS on one device at a time. Using multiple devices simultaneously consumes additional licenses.
  2. Partner Work: If single device access doesn't resolve the issue, students should work in partners (or small groups of $2$ to $3$) on one computer, sharing a single SPSS license and collaborating on assignments.
SPSS Exam: The SPSS exam will involve using SPSS. The focus will be on understanding what SPSS does (conceptual understanding), not on memorizing how to use specific commands or syntax.

Introduction to Probability

Definition: Probability is the likelihood that a particular event or outcome will occur. It is used to predict outcomes, but a prediction is not a certainty (a $100\%$ guarantee).
Everyday Examples:
- Rain Forecast: A $30\%$ chance of rain is a probability, not a certainty. The actual outcome may differ.
- Gestational Diabetes by Maternal Age (Canada Data):
  - The risk of gestational diabetes increases with maternal age.
  - For pregnant individuals aged $40$ and over, approximately $12\%$ develop gestational diabetes.
  - This indicates a higher risk for individuals in older age groups, but does not guarantee an individual will develop it. We are trying to predict outcomes based on group probabilities.
- Rolling a Die: The probability of rolling a one on a standard six-sided die is $1/6$ . This is a population-level (long-run) probability: over many rolls, about one in six will show a one, but nothing is guaranteed for any small number of rolls.
Medical/Test-Related Probabilities:
- Sensitivity: The probability that a test will correctly identify a truly sick individual as positive (true positive).
  - Example: If a person really has COVID, what is the chance their COVID test says "positive" (yes, you have it)?
  - We want sensitivity to be high; a low sensitivity (e.g., $15\%$ ) means many false negatives (sick people testing negative).
  - This concept is related to Type I and Type II errors in hypothesis testing, which will be covered later.
- Specificity: The probability that a test will correctly identify a truly healthy individual as negative (true negative).
  - Example: If a person really does not have COVID, what is the chance their COVID test says "negative" (no, you don't have it)?
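  - We want specificity to be high as well; a low specificity means many false positives (healthy people testing positive). False negatives (a person testing negative but actually being sick), a concern during the early stages of COVID, are governed by sensitivity instead.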
- There are often trade-offs between sensitivity and specificity in test design. Both are expressed as probabilities.
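A minimal sketch of how sensitivity and specificity are computed from test results. The counts are invented for illustration (they do not describe any real COVID test), and the function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """P(test positive | truly sick) = TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """P(test negative | truly healthy) = TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, made up for this example only.
sick_positive, sick_negative = 90, 10         # 100 truly sick people
healthy_negative, healthy_positive = 190, 10  # 200 truly healthy people

sens = sensitivity(sick_positive, sick_negative)        # 90 / 100
spec = specificity(healthy_negative, healthy_positive)  # 190 / 200
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up counts, $10$ of the $100$ sick people are false negatives and $10$ of the $200$ healthy people are false positives.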

Formal Statistical Perspective

Random Event: Any event where more than one outcome is possible.
Sample Space: A comprehensive list of all possible outcomes for a random event.
- Example 1: Sex of a Newborn: Sample space = {Male, Female}. (Rare cases of ambiguous genitalia are acknowledged but typically excluded for simplicity.)
- Example 2: Flipping a Coin $3$ Times: Sample space = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}. There are $2^3 = 8$ possible outcomes.
Calculating the Probability of an Event ( $P(x)$ ), assuming every outcome in the sample space is equally likely:
- $P(x) = \frac{\text{Number of ways event x can happen}}{\text{Total number of possible outcomes (size of sample space)}}$
- Example 1: Probability of a Baby Girl: Sample space has $2$ outcomes. One way to get a girl. $P(\text{Girl}) = 1/2 = 0.5$ or $50\%$ .
- Example 2: Probability of at Least Two Heads in $3$ Coin Flips:
  - Outcomes with at least two heads: {HHH, HHT, HTH, THH} (which is $4$ ways).
  - Total outcomes: $8$ .
  - $P(\text{at least two heads}) = 4/8 = 0.5$ or $50\%$ .
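The coin-flip example can be checked by listing the sample space and counting. This is a small sketch that assumes a fair coin, so that all $8$ outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for three coin flips: every H/T sequence of length 3.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

# Event of interest: at least two heads.
event = [outcome for outcome in sample_space if outcome.count("H") >= 2]

# Counting rule: ways the event can happen / size of the sample space.
p_at_least_two_heads = Fraction(len(event), len(sample_space))

print(sample_space)          # the 8 possible outcomes
print(event)                 # the 4 outcomes with at least two heads
print(p_at_least_two_heads)  # 1/2
```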
Probability Range (Math Check):
- Probabilities are always between $0$ and $1$ , inclusive. (i.e., $0 \le P(x) \le 1$ ).
- Probabilities are never negative.
- A probability closer to $1$ means the event is more likely; closer to $0$ means less likely.
- $P(x) = 1.0$ means the event is certain to happen; $P(x) = 0$ means the event cannot happen.
- If any calculation yields a probability greater than $1$ or less than $0$ , an error has been made.

Types of Events

Independent Events: The occurrence of one event has no influence on the probability of subsequent events.
- Example 1: Coin Flips: Each coin flip is independent of the previous ones.
- Example 2: Drawing Cards with Replacement: If a card is drawn and then put back into the deck before drawing again, the draws are independent.
- Example 3: Unrelated Purchases: One person in Calgary buying an iPhone does not influence another person in Ottawa buying one (if they are unrelated).
Dependent Events: The occurrence of one event does influence the probability of subsequent events.
- Example 1: Drawing Cards Without Replacement: If a card is drawn and not replaced, the probabilities for subsequent draws change.
  - $P(\text{King on 1st draw}) = 4/52$ .
  - $P(\text{King on 2nd draw given 1st was a King and not replaced}) = 3/51$ .
- Example 2: Influenced Purchases: A person buying an iPhone and telling their brother about it, leading the brother to buy one, is a dependent event.
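The card examples for independent and dependent events can be compared side by side. A minimal sketch, assuming a standard $52$-card deck and that the first card drawn was a King (the King of Hearts is an arbitrary choice); the helper name `p_king` is hypothetical.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]


def p_king(cards):
    """Probability that one card drawn at random from `cards` is a King."""
    kings = sum(1 for rank, _ in cards if rank == "K")
    return Fraction(kings, len(cards))


first_draw = ("K", "Hearts")  # assume the first card drawn was this King

# With replacement the deck is unchanged, so the second draw is independent.
p_second_with_replacement = p_king(deck)

# Without replacement one King is gone, so the second draw is dependent.
remaining = [card for card in deck if card != first_draw]
p_second_without_replacement = p_king(remaining)

print(p_second_with_replacement)     # 1/13, i.e. 4/52
print(p_second_without_replacement)  # 1/17, i.e. 3/51
```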
Mutually Exclusive Events: Two events are mutually exclusive if they cannot occur at the same time. The occurrence of one precludes the occurrence of the other. The probability of both occurring is zero ( $P(A \text{ and } B) = 0$ ).
- Example 1: Height: A person cannot be both taller than six feet and shorter than six feet simultaneously.
- Example 2: Drawing an Ace vs. a Non-Ace: If you draw an Ace, it cannot be a non-Ace.
- Venn Diagram: Mutually exclusive events are represented by separate, non-overlapping circles.
Not Mutually Exclusive Events: Two events are not mutually exclusive if they can occur at the same time; there is an overlap.
- Example: Drawing an Ace or a Heart from a Deck:
  - There are cards that are Aces but not Hearts (e.g., Ace of Spades).
  - There are cards that are Hearts but not Aces (e.g., Two of Hearts).
  - There is a card that is both an Ace and a Heart (Ace of Hearts).
- Venn Diagram: Represented by overlapping circles, where the overlap signifies the co-occurrence of both events.
  - $P(\text{Ace}) = 4/52$ .
  - $P(\text{Heart}) = 13/52$ .
  - $P(\text{Ace and Heart}) = 1/52 \approx 0.02$ or $2\%$ . This is the overlap: the Ace of Hearts.
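The overlap can be counted directly. The sketch below assumes a standard $52$-card deck and also illustrates the general addition rule, $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$, which these notes do not state explicitly; the helper name `prob` is hypothetical.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

aces = {card for card in deck if card[0] == "A"}
hearts = {card for card in deck if card[1] == "Hearts"}


def prob(cards):
    """Probability of drawing a card from `cards` out of the full deck."""
    return Fraction(len(cards), len(deck))


p_both = prob(aces & hearts)    # overlap of the two circles: Ace of Hearts
p_either = prob(aces | hearts)  # Ace or Heart, each card counted once

print(p_both)    # 1/52
print(p_either)  # 4/13, i.e. 16/52

# Adding P(Ace) + P(Heart) counts the Ace of Hearts twice,
# so the overlap is subtracted once.
print(prob(aces) + prob(hearts) - p_both == p_either)  # True
```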
Exhaustive Events: A set of events is exhaustive if, together, the events cover every possible outcome in the sample space. When the events are also mutually exclusive, their probabilities sum to exactly $1$ .
- Example: Suits in a Deck of Cards: Drawing a Heart, Club, Spade, or Diamond comprises all possibilities.
  - $P(\text{Heart}) + P(\text{Club}) + P(\text{Spade}) + P(\text{Diamond}) = 13/52 + 13/52 + 13/52 + 13/52 = 52/52 = 1$ .
- If exhaustive events are not mutually exclusive, adding their individual probabilities gives a sum greater than $1$ because overlapping outcomes are counted more than once.
Relative Frequency: Another term for how often an event is expected to occur in relation to all possibilities. It is the practical interpretation of probability.

Calculating Probabilities for Multiple Events

Complementary Events: Two events, A and B, are complementary if they are mutually exclusive and exhaustive. This means one must occur if the other does not.
- $P(A) + P(B) = 1$ .
- Therefore, $P(B) = 1 - P(A)$ .
- Example: Ace vs. Non-Ace Draw: $P(\text{Ace}) = 4/52$ . Then $P(\text{Non-Ace}) = 1 - 4/52 = 48/52$ . The sum is $4/52 + 48/52 = 52/52 = 1$ .
Additive Rule (for "OR" statements): Used when calculating the probability of either event A or event B occurring.
- For Mutually Exclusive Events: $P(A \text{ or } B) = P(A) + P(B)$ .
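A short check of the additive rule for mutually exclusive events. The "Ace or King" example is an illustrative choice, not one from the lecture, and a standard $52$-card deck is assumed.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

aces = {card for card in deck if card[0] == "A"}
kings = {card for card in deck if card[0] == "K"}

# No card is both an Ace and a King, so the events are mutually exclusive.
overlap = aces & kings

p_ace = Fraction(len(aces), len(deck))
p_king = Fraction(len(kings), len(deck))

# Additive rule versus counting the combined event directly.
p_rule = p_ace + p_king
p_count = Fraction(len(aces | kings), len(deck))

print(len(overlap))     # 0
print(p_rule, p_count)  # 2/13 2/13, i.e. 8/52
```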
Multiplicative Rule (for "AND" statements): Used when calculating the probability of both event A and event B occurring.
- For Independent Events: $P(A \text{ and } B) = P(A) \times P(B)$ .
  - Example: Drawing two 10s with replacement: $P(\text{10 then 10}) = (4/52) \times (4/52)$ .
- For Dependent Events (e.g., without replacement): $P(A \text{ and } B) = P(A) \times P(B|A)$ , where $P(B|A)$ is the conditional probability of B given A.
  - Example: Drawing two 10s without replacement: On the first draw, $P(\text{10}) = 4/52$ . If a 10 is drawn, there are now $3$ tens left out of $51$ cards. So, $P(\text{10 then 10}) = (4/52) \times (3/51)$ .
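The two "10 then 10" calculations can be worked through with exact fractions. This sketch assumes a standard $52$-card deck; the brute-force count at the end is only a cross-check on the without-replacement rule.

```python
from fractions import Fraction
from itertools import permutations

p_ten_first = Fraction(4, 52)

# Independent events (with replacement): P(A and B) = P(A) * P(B).
p_with_replacement = p_ten_first * Fraction(4, 52)

# Dependent events (without replacement): P(A and B) = P(A) * P(B|A).
p_without_replacement = p_ten_first * Fraction(3, 51)

print(p_with_replacement, f"{float(p_with_replacement):.4f}")        # 1/169 0.0059
print(p_without_replacement, f"{float(p_without_replacement):.4f}")  # 1/221 0.0045

# Cross-check: list every ordered pair of distinct cards and count
# the pairs in which both cards are tens.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

pairs = list(permutations(deck, 2))  # 52 * 51 ordered pairs
both_tens = [pair for pair in pairs if pair[0][0] == "10" and pair[1][0] == "10"]
p_counted = Fraction(len(both_tens), len(pairs))

print(p_counted == p_without_replacement)  # True
```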

Conditional Probabilities

Definition: The probability of an event occurring given that another event has already occurred. If knowing that the other event occurred changes the probability, the two events are dependent.
Notation: $P(A | B)$ , read as "the probability of A given B."
- Example: Driver's License and Age:
  - $P(\text{License} | \text{Child}) = 0$ (A child cannot have a license).
  - $P(\text{License} | \text{Adult}) \ne 0$ (Not all adults have licenses, but it's possible).
  - $P(\text{Child} | \text{License}) = 0$ (If you have a license, you cannot be a child).
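- Conditional probabilities do not simply flip around; what is given changes the sample space. For example, $P(\text{Adult} | \text{License}) = 1$ , while $P(\text{License} | \text{Adult})$ is less than $1$ .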
Using Tables for Conditional Probabilities (Example: Depression and Contraceptive Use):
- A study examining depression among adolescent girls:
  | | Depressed | Not Depressed | Total |
  |:---|:---:|:---:|:---:|
  | On Contraceptives | $80$ | $120$ | $200$ |
  | Not on Contraceptives | $20$ | $380$ | $400$ |
  | Total | $100$ | $500$ | $600$ |
- Overall Probabilities:
  - $P(\text{Depressed}) = 100/600 \approx 0.167$ (or $17\%$ ).
  - $P(\text{On Contraceptives}) = 200/600 \approx 0.333$ (or $33\%$ ).
- Conditional Probabilities:
  - $P(\text{Depressed } | \text{ On Contraceptives}) = 80/200 = 0.40$ (or $40\%$ ). This means among those on contraceptives, $40\%$ are depressed.
  - $P(\text{Depressed } | \text{ Not on Contraceptives}) = 20/400 = 0.05$ (or $5\%$ ). Among those not on contraceptives, $5\%$ are depressed.
  - Conclusion: Since $P(\text{Depressed } | \text{ On Contraceptives}) \ne P(\text{Depressed } | \text{ Not on Contraceptives})$ , depression and contraceptive use are not independent in this sample. This shows an association between the two, not that one causes the other.
  - $P(\text{On Contraceptives } | \text{ Depressed})$ : Among those who are depressed ( $100$ total), $80$ are on contraceptives. So, $80/100 = 0.80$ (or $80\%$ ).
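The table calculations above can be reproduced in a few lines. A minimal sketch: the cell counts come from the table, while the dictionary layout and the helper name `count` are illustrative choices.

```python
from fractions import Fraction

# Cell counts from the table: (contraceptive use, depression status).
counts = {
    ("on", "depressed"): 80,
    ("on", "not depressed"): 120,
    ("off", "depressed"): 20,
    ("off", "not depressed"): 380,
}


def count(use=None, mood=None):
    """Sum the cells matching the given row and/or column (None matches all)."""
    return sum(
        n
        for (u, m), n in counts.items()
        if (use is None or u == use) and (mood is None or m == mood)
    )


total = count()  # 600 participants

# Overall probabilities use the whole sample as the denominator.
p_depressed = Fraction(count(mood="depressed"), total)
p_on = Fraction(count(use="on"), total)

# Conditional probabilities use only the "given" row or column as the denominator.
p_dep_given_on = Fraction(count("on", "depressed"), count(use="on"))
p_dep_given_off = Fraction(count("off", "depressed"), count(use="off"))
p_on_given_dep = Fraction(count("on", "depressed"), count(mood="depressed"))

print(p_depressed, p_on)                # 1/6 1/3
print(p_dep_given_on, p_dep_given_off)  # 2/5 1/20
print(p_on_given_dep)                   # 4/5
```

Note how the denominator changes with what is given: $200$ for "on contraceptives" but $100$ for "depressed", which is why $P(\text{Depressed } | \text{ On Contraceptives})$ and $P(\text{On Contraceptives } | \text{ Depressed})$ differ.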

Bayesian Statistics (Brief Introduction)

Core Idea: While this course focuses on frequentist statistics, Bayesian statistics offers an alternative approach.
Key Feature: Bayesian statistics incorporates prior probabilities (base rates or existing knowledge from the broader population or previous research) into its calculations of conditional probabilities.
- Example: When calculating $P(\text{Depressed } | \text{ On Contraceptives})$ , a Bayesian approach would also consider the general prevalence of depression in the population (the "prior probability"), not just the sample data in isolation.
Advantage: It leverages historical and broader scientific context, not limiting conclusions purely to the current sample.
Scope: Calculations using Bayes' theorem are beyond the scope of this introductory course, but understanding its core concept and use of prior probabilities is important if pursuing further study.
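For the curious only, since these calculations are outside the course: the standard statement of Bayes' theorem is $P(A | B) = P(B | A) \times P(A) / P(B)$ . The sketch below simply checks that formula against the depression table from the previous section. It uses the sample proportion as a stand-in for the prior; in a real Bayesian analysis the prior would come from outside the sample.

```python
from fractions import Fraction

# Quantities read from the depression / contraceptive table.
p_depressed = Fraction(100, 600)          # stand-in "prior" (sample base rate)
p_on = Fraction(200, 600)                 # P(On Contraceptives)
p_on_given_depressed = Fraction(80, 100)  # P(On Contraceptives | Depressed)

# Bayes' theorem: P(Depressed | On) = P(On | Depressed) * P(Depressed) / P(On).
p_depressed_given_on = p_on_given_depressed * p_depressed / p_on

print(p_depressed_given_on)  # 2/5, matching 80/200 read directly from the table
```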

Practice: GRE Score and Studying

Goal: To demonstrate that studying and GRE score are not independent (i.e., they are dependent).
Data:
| | High GRE (>1100) | Low GRE (<1100) | Total |
|:---|:---:|:---:|:---:|
| Studied | $60$ | $100$ | $160$ |
| Didn't Study | $40$ | $100$ | $140$ |
| Total | $100$ | $200$ | $300$ |
Conditional Probabilities:
- $P(\text{High GRE } | \text{ Studied}) = 60/160 = 0.375$ (or $37.5\%$ ).
- $P(\text{High GRE } | \text{ Didn't Study}) = 40/140 \approx 0.286$ (or $28.6\%$ ).
Conclusion: Since $0.375 \ne 0.286$ , the probability of achieving a high GRE score depends on (is conditional upon) whether a person studied. In this sample, those who studied were more likely to score high, so studying and GRE score are not independent.
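The practice problem can be checked the same way. A small sketch using the counts from the GRE data; the final comparison uses the multiplication rule for independent events as a second test of independence.

```python
from fractions import Fraction

# Counts from the GRE data: rows are study status, columns are GRE result.
studied_high, studied_low = 60, 100
not_studied_high, not_studied_low = 40, 100

n_studied = studied_high + studied_low              # 160
n_not_studied = not_studied_high + not_studied_low  # 140
n_total = n_studied + n_not_studied                 # 300

p_high_given_studied = Fraction(studied_high, n_studied)
p_high_given_not_studied = Fraction(not_studied_high, n_not_studied)

print(float(p_high_given_studied))                # 0.375
print(round(float(p_high_given_not_studied), 3))  # 0.286

# If the events were independent, P(High and Studied) would equal
# P(High) * P(Studied). Here the two values differ.
p_high = Fraction(studied_high + not_studied_high, n_total)
p_studied = Fraction(n_studied, n_total)
p_high_and_studied = Fraction(studied_high, n_total)

print(p_high_and_studied, p_high * p_studied)  # 1/5 8/45
```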
