SDS CH4 - Probability Fundamentals: Collectively Exhaustive Events, Contingency Tables, and Conditional Probability

Collectively Exhaustive Events and Sample Spaces

  • Definition of Collectively Exhaustive Events: A list or set of events is collectively exhaustive if, together, the events cover every possible outcome in the sample space (their union is the entire sample space). No outcome can occur that is not included in the list.
  • Example of Non-Exhaustive Events (Die Toss): If the outcomes of a die toss are defined as the set $\{1, 2, 3, 4, 5\}$, the list is not collectively exhaustive because it is missing the number $6$. Since a standard die has six sides, an outcome outside the list (rolling a six) is still possible.
  • Venn Diagram Case Study 1 (Undefined Probability):
    - $P(A) = 0.4$
    - $P(B) = 0.3$
    - Assuming $A$ and $B$ do not overlap, the total defined probability is $0.4 + 0.3 = 0.7$.
    - These events are not collectively exhaustive: the remaining probability of $1 - 0.7 = 0.3$ in the sample space is undefined. There is a $30\%$ chance that an outcome outside both $A$ and $B$ occurs.
  • Venn Diagram Case Study 2 (Overlapping Events and the Addition Rule):
    - $P(A) = 0.4$
    - $P(B) = 0.5$
    - Suppose the region of the sample space outside both $A$ and $B$ has a probability of $0.2$.
    - Simply summing the probabilities ($0.4 + 0.5 + 0.2$) gives $1.1$, which exceeds the total probability of the sample space, $1$.
    - The excess comes from double counting the intersection (overlap) of $A$ and $B$. To find the intersection, use the General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
    - Since the region outside $A$ and $B$ has probability $0.2$, the union must be $P(A \cup B) = 1 - 0.2 = 0.8$.
    - Setting up the equation: $0.8 = 0.4 + 0.5 - P(A \cap B)$.
    - So $0.8 = 0.9 - P(A \cap B)$, which gives $P(A \cap B) = 0.1$.
  • Visualizing Exhaustive Space: In a partitioned Venn diagram, a collectively exhaustive set of events leaves no region of the sample space undefined. In real-world scenarios, a completely exhaustive list is difficult to establish except in a "closed world" such as coin tosses, die rolls, or a standard pack of cards.
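The exhaustiveness check and the arithmetic of both Venn diagram case studies can be sketched in a few lines of Python. This is a minimal illustration that assumes events are represented as Python sets of outcomes. The helper names `is_collectively_exhaustive` and `intersection_from_outside` are hypothetical and do not come from any library. Exact `Fraction` values are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Sample space for one toss of a standard six-sided die.
SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}


def is_collectively_exhaustive(events, sample_space):
    """True if the union of the listed events covers every outcome."""
    return set().union(*events) == sample_space


def intersection_from_outside(p_a, p_b, p_outside):
    """Solve for P(A ∩ B) with the General Addition Rule.

    P(A ∪ B) = 1 - P(outside A and B), and
    P(A ∪ B) = P(A) + P(B) - P(A ∩ B),
    so P(A ∩ B) = P(A) + P(B) - P(A ∪ B).
    """
    p_union = 1 - p_outside
    return p_a + p_b - p_union


# The five-outcome list from the die example misses the face 6.
print(is_collectively_exhaustive([{1}, {2}, {3}, {4}, {5}], SAMPLE_SPACE))  # False
print(is_collectively_exhaustive([{1, 2, 3}, {4, 5, 6}], SAMPLE_SPACE))     # True

# Case Study 1: probability left undefined (assumes A and B do not overlap).
print(1 - (Fraction(4, 10) + Fraction(3, 10)))  # 3/10

# Case Study 2: P(A) = 0.4, P(B) = 0.5, region outside both = 0.2.
print(intersection_from_outside(Fraction(4, 10), Fraction(5, 10), Fraction(2, 10)))  # 1/10
```

Solving for the intersection from the "outside" region mirrors the hand calculation: first find the union as $1 - 0.2$, then rearrange the addition rule.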

Mutually Exclusive vs. Not Mutually Exclusive Events

  • Mutually Exclusive Events:
    - Definition: Two events that cannot occur at the same time.
    - Mathematical definition: The joint probability of the intersection is zero: $P(A \cap B) = 0$.
    - Visual representation: The circles in a Venn diagram do not overlap.
  • Not Mutually Exclusive Events:
    - Definition: Events that can occur at the same time.
    - Mathematical definition: The joint probability is greater than zero: $P(A \cap B) > 0$.
    - Visual representation: The circles in a Venn diagram overlap, and the overlap represents the shared probability.
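The two definitions can be checked directly by treating events as sets of die faces and testing whether the joint probability is zero. This is a sketch that assumes a fair six-sided die. The events `low`, `high`, and `even`, and the helpers `prob` and `mutually_exclusive`, are hypothetical examples chosen for illustration.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})  # one toss of a fair die


def prob(event):
    """A priori probability of an event (a set of equally likely faces)."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def mutually_exclusive(a, b):
    """Mutually exclusive means the joint probability P(A ∩ B) is zero."""
    return prob(a & b) == 0


low = {1, 2}      # "roll a 1 or 2"
high = {5, 6}     # "roll a 5 or 6"
even = {2, 4, 6}  # "roll an even number"

print(mutually_exclusive(low, high))  # True: no shared faces
print(mutually_exclusive(low, even))  # False: both contain 2
print(prob(low & even))               # 1/6
```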

Probability Approaches

1. A Priori Probability (Theoretical)

  • Calculation: $P = \frac{X}{T}$, where $X$ is the number of ways the event can occur and $T$ is the total number of possible outcomes, all assumed to be equally likely.
  • Basis: Prior knowledge. No data gathering is required.
  • Examples:
    - Coin Toss: For a balanced coin, the probability of tails is known to be $0.5$ without flipping it a thousand times. Over a very large number of flips, the observed proportion of tails approaches this theoretical value.
    - Calendar Dates: The probability that a randomly selected day in a non-leap year falls in January is $\frac{31}{365}$. No survey is needed to count the days in January; this is known a priori.
    - March 2020 Example: 2020 was a leap year, so the probability that a randomly selected day falls in March is $\frac{31}{366}$.
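The a priori examples need no data, only counts of equally likely outcomes. The sketch below reproduces them with Python's standard `calendar` module, and a short seeded simulation shows the coin's long-run frequency settling near $0.5$. The choice of 2019 as the non-leap year, the seed `42`, and the helper names `a_priori` and `days_in_year` are assumptions made for illustration.

```python
import calendar
import random
from fractions import Fraction


def a_priori(x, t):
    """P = X / T, valid when all T outcomes are equally likely."""
    return Fraction(x, t)


def days_in_year(year):
    return 366 if calendar.isleap(year) else 365


# Coin toss: 1 way to get tails out of 2 equally likely outcomes.
p_tails = a_priori(1, 2)

# January in a non-leap year (2019) and March in the leap year 2020.
# calendar.monthrange(year, month) returns (first weekday, number of days).
p_january_2019 = a_priori(calendar.monthrange(2019, 1)[1], days_in_year(2019))
p_march_2020 = a_priori(calendar.monthrange(2020, 3)[1], days_in_year(2020))

print(p_tails, p_january_2019, p_march_2020)  # 1/2 31/365 31/366

# Over many simulated tosses, the relative frequency of tails moves toward
# the theoretical 1/2 (seeded so the run is reproducible).
rng = random.Random(42)
n = 100_000
tails = sum(rng.random() < 0.5 for _ in range(n))
print(tails / n)  # close to 0.5
```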

2. Empirical Probability (Experimental)

  • Calculation: $P = \frac{X}{T}$, where $X$ is the number of times the event is observed and $T$ is the total number of observations.
  • Difference from A Priori: The formula is the same; the difference is how the data are obtained. Empirical probability requires gathering data through observation or surveys.
  • Example Case Study: Finding the probability that a surveyed person is a male taking statistics.
    - Data must be gathered from a specific population.
    - Sample size ($T$): $439$ people.
    - Target event ($X$): Males taking statistics $= 84$.
    - Calculation: $P = \frac{84}{439} \approx 0.191$.
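Empirical probability is a count over gathered observations. The sketch below builds a stand-in list that reproduces only the counts from the notes (84 of 439). The individual survey records are not available, so the list is illustrative, and `empirical` and `survey` are hypothetical names.

```python
from fractions import Fraction


def empirical(observations, event):
    """P = X / T: X counts observations satisfying `event`,
    T is the total number of observations gathered."""
    x = sum(1 for obs in observations if event(obs))
    return Fraction(x, len(observations))


# Stand-in survey: True marks a male taking statistics. Only the counts
# from the notes are reproduced (84 of 439), not real individual records.
survey = [True] * 84 + [False] * (439 - 84)

p = empirical(survey, lambda is_male_in_stats: is_male_in_stats)
print(p, round(float(p), 3))  # 84/439 0.191
```

With real data, `event` would be a test on each record (for example, checking sex and enrollment fields), while the counting logic stays the same.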

3. Subjective Probability

  • Definition: An inexact blend of personal experience, feelings, and intuition or analysis.
  • Usage: Used only when no data or theoretical basis is available. It is generally avoided in formal academic statistics because it is unreliable.
  • Hypothetical Scenario: Probability that a new ad campaign will be successful.
    - Media Development Team (young): Assigns a $60\%$ probability of success, out of optimism.
    - Chief Media Officer (experienced, jaded): Assigns a $40\%$ probability of success.
    - Both probabilities are filtered through subjective perception and intuition rather than quantifiable data.

Contingency Tables and Marginal Probabilities

  • The Setting: A contingency table cross-classifies observations by two events, for example planning to purchase a TV versus actually purchasing one.
  • Raw Data Counts:
    - Planned and Purchased: $200$
    - Planned but Did Not Purchase: $50$
    - Did Not Plan but Purchased: $100$
    - Did Not Plan and Did Not Purchase: $650$
    - Total Sample ($n$): $1000$
  • Simple Events (Marginal Probabilities):
    - These involve a single event and are found in the margins of the table (the row and column totals).
    - Probability of Planning to Purchase: $(200 + 50) / 1000 = 250 / 1000 = 0.25$.
    - Probability of Actually Purchasing: $(200 + 100) / 1000 = 300 / 1000 = 0.30$.
  • Joint Events:
    - These involve the intersection of two events and are found in the body of the table.
    - Planned and Purchased: $200 / 1000 = 0.20$.
    - Planned and Did Not Purchase: $50 / 1000 = 0.05$.
  • General Rule for Marginal Probability via Joint Probability:
    - To find the probability of $A$, sum its joint probabilities with each event in a set of mutually exclusive and collectively exhaustive events $B_1, B_2, \dots, B_k$.
    - Formula: $P(A) = P(A \cap B_1) + P(A \cap B_2) + \dots + P(A \cap B_k)$.
    - Summing over every possible outcome of $B$ removes $B$ from consideration, leaving the marginal probability of $A$ alone. For example, $P(\text{Planned}) = P(\text{Planned} \cap \text{Purchased}) + P(\text{Planned} \cap \text{Did Not Purchase}) = 0.20 + 0.05 = 0.25$.
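The joint and marginal calculations for the TV table can be checked in a short Python sketch. It assumes the four cell counts listed above make up the whole table. Each marginal is computed the way the general rule describes, by summing joint probabilities over every outcome of the other event. `COUNTS`, `joint`, `p_planned`, and `p_purchased` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Joint counts from the TV example: (planned, purchased) -> count.
COUNTS = {
    (True, True): 200,    # planned and purchased
    (True, False): 50,    # planned, did not purchase
    (False, True): 100,   # did not plan, purchased
    (False, False): 650,  # did not plan, did not purchase
}
N = sum(COUNTS.values())  # 1000


def joint(planned, purchased):
    """Joint probability: one cell in the body of the table."""
    return Fraction(COUNTS[(planned, purchased)], N)


def p_planned(planned):
    """Marginal P(planned): sum of joints over both purchase outcomes."""
    return sum(joint(planned, purchased) for purchased in (True, False))


def p_purchased(purchased):
    """Marginal P(purchased): sum of joints over both planning outcomes."""
    return sum(joint(planned, purchased) for planned in (True, False))


print(float(p_planned(True)), float(p_purchased(True)))     # 0.25 0.3
print(float(joint(True, True)), float(joint(True, False)))  # 0.2 0.05
# The four cells are mutually exclusive and exhaustive, so they total 1.
print(sum(joint(a, b) for a in (True, False) for b in (True, False)))  # 1
```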

The Fundamental Rules of Probability

  • Range: Every probability lies between $0$ and $1$ ($0 \leq P \leq 1$).
    - $0$ = Impossible.
    - $1$ = Certain.
  • Summation: The probabilities of a set of mutually exclusive and collectively exhaustive events in a sample space must sum to $1$.
  • General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
    - This rule applies to any two events.
    - If the events are mutually exclusive, $P(A \cap B) = 0$, so the rule simplifies to $P(A \cup B) = P(A) + P(B)$.
  • Example Calculation: Find $P(\text{Planned} \cup \text{Purchased})$ using the raw table data.
    - $P(\text{Planned}) = 250/1000$
    - $P(\text{Purchased}) = 300/1000$
    - $P(\text{Planned} \cap \text{Purchased}) = 200/1000$
    - Calculation: $\frac{250}{1000} + \frac{300}{1000} - \frac{200}{1000} = \frac{350}{1000} = 0.35$.
    - Note: The union probability is at least as large as either individual probability because it covers a larger region: the outcomes where one event, the other, or both occur.
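The addition-rule result can be cross-checked against the table itself. Counting the three cells in which at least one event occurs ($200 + 50 + 100$) should give the same $0.35$. This is a sketch using the counts from the notes; `union` and the variable names are hypothetical.

```python
from fractions import Fraction

N = 1000
planned, purchased, both = 250, 300, 200  # counts from the TV table
only_planned, only_purchased = 50, 100    # cells where exactly one event holds


def union(p_a, p_b, p_a_and_b):
    """General Addition Rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b


p_union = union(Fraction(planned, N), Fraction(purchased, N), Fraction(both, N))

# Cross-check: count the cells where at least one of the events occurs.
p_direct = Fraction(both + only_planned + only_purchased, N)

print(float(p_union), p_union == p_direct)  # 0.35 True
```

Subtracting `both` once is exactly what removes the double count: the $200$ planned-and-purchased people appear in both the $250$ and the $300$.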

Conditional Probability

  • Concept: The probability of an event given that another event has already occurred. This investigates the effect one variable has on another.
  • Notation: $P(A|B)$ (read as "probability of $A$ given $B$").
  • Formula: $P(A|B) = \frac{P(A \cap B)}{P(B)}$, defined only when $P(B) > 0$.
    - The probability of the event being conditioned on (the event that has already happened) always goes in the denominator.
  • Visual Intuition: Conditioning on $B$ narrows the "world" (the sample space) to only the region where $B$ occurs. The numerator is then only the part of $A$ that lies inside this new, smaller world.
  • Example Calculation: Probability of purchasing given that the person planned to purchase.
    - Formula: $P(\text{Purchased}|\text{Planned}) = \frac{P(\text{Purchased} \cap \text{Planned})}{P(\text{Planned})}$.
    - Calculation: $\frac{200/1000}{250/1000} = \frac{200}{250} = 0.8$.
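The formula and the "smaller world" intuition give the same number, because the common factor of $1000$ cancels. The sketch below computes both ways and refuses to condition on a zero-probability event. The helper `conditional` and the constant names are hypothetical.

```python
from fractions import Fraction

N = 1000
P_PLANNED_AND_PURCHASED = Fraction(200, N)  # joint cell from the table
P_PLANNED = Fraction(250, N)                # row total for "planned"


def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return p_a_and_b / p_b


via_formula = conditional(P_PLANNED_AND_PURCHASED, P_PLANNED)

# "Shrinking the world": keep only the 250 people who planned and
# count how many of them purchased.
via_counts = Fraction(200, 250)

print(float(via_formula), via_formula == via_counts)  # 0.8 True
```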

Independence of Events

  • Definition: Two events are independent if the occurrence of one does not affect the probability of the other.
  • Verification Test: For $P(B) > 0$, $A$ and $B$ are independent if and only if $P(A|B) = P(A)$. An equivalent test is $P(A \cap B) = P(A)\,P(B)$.
  • Dependency Test (TV Purchase Example):
    - Does purchasing ($A$) depend on planning ($B$)?
    - From the conditional probability example: $P(\text{Purchase}|\text{Plan}) = 0.8$.
    - From the table margin: $P(\text{Purchase}) = 0.3$.
    - Conclusion: Since $0.8 \neq 0.3$, the events are dependent. Knowing that a person planned raises the probability of a purchase from $0.3$ to $0.8$.
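The dependency test can be run both ways: compare $P(A|B)$ with $P(A)$, and compare the joint probability with the product $P(A)\,P(B)$. For independent events the two checks agree, and here both reject independence. This is a sketch using the TV counts, with exact fractions so that the equality tests are not affected by rounding. The variable names are hypothetical.

```python
from fractions import Fraction

N = 1000
p_purchase = Fraction(300, N)           # P(A), from the table margin
p_plan = Fraction(250, N)               # P(B), from the table margin
p_purchase_and_plan = Fraction(200, N)  # P(A ∩ B), from the table body

p_purchase_given_plan = p_purchase_and_plan / p_plan  # P(A | B)

# Test 1: independence requires P(A | B) == P(A).
independent_by_conditional = p_purchase_given_plan == p_purchase
# Test 2 (equivalent form): independence requires P(A ∩ B) == P(A) * P(B).
independent_by_product = p_purchase_and_plan == p_purchase * p_plan

print(float(p_purchase_given_plan), float(p_purchase))     # 0.8 0.3
print(independent_by_conditional, independent_by_product)  # False False
print(float(p_purchase * p_plan))  # 0.075, versus an observed joint of 0.2
```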