SDS CH4 - Probability Fundamentals: Collectively Exhaustive Events, Contingency Tables, and Conditional Probability
Collectively Exhaustive Events and Sample Spaces
- Definition of Collectively Exhaustive Events: A collectively exhaustive list or set of events is one that defines every single possible outcome in the sample space. Nothing can happen that hasn't been defined in this list.
- Example of Non-Exhaustive Events (Die Toss): If one defines the outcomes of a die toss as the set {1,2,3,4,5}, this is not collectively exhaustive because it is missing the number 6. Since a standard die has six sides, an undefined event (rolling a six) is still possible.
- Venn Diagram Case Study 1 (Undefined Probability):
- P(A)=0.4
- P(B)=0.3
- Total defined probability (assuming A and B do not overlap): 0.4+0.3=0.7
- This is not collectively exhaustive because there is a remaining probability of 0.3 in the sample space that is undefined. There is a 30% chance something outside of A or B could occur.
- Venn Diagram Case Study 2 (Overlapping Events and the Addition Rule):
- P(A)=0.4
- P(B)=0.5
- Suppose the region of the sample space outside both A and B has a probability of 0.2.
- If we simply sum the probabilities (0.4+0.5+0.2), we get 1.1, which exceeds the required total sample space probability of 1.
- This happens because of double counting the intersection (overlap). To find the intersection, we use the General Addition Rule:
P(A∪B)=P(A)+P(B)−P(A∩B)
- If the space outside A and B is 0.2, then the union P(A∪B) must be 1−0.2=0.8.
- Setting up the equation: 0.8=0.4+0.5−P(A∩B).
- 0.8=0.9−P(A∩B), which means P(A∩B)=0.1.
- Visualizing Exhaustive Space: In a partitioned Venn diagram, a completely exhaustive space has no sections left undefined. In real-world scenarios, establishing a completely exhaustive space is difficult unless working with a "closed world" like coin tosses, die rolls, or a standard pack of cards.
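The arithmetic in Case Study 2 can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name `intersection_from_union` is hypothetical, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

def intersection_from_union(p_a, p_b, p_outside):
    """Solve the General Addition Rule for P(A ∩ B).

    p_outside is the probability of the region outside both A and B.
    """
    p_union = 1 - p_outside        # P(A ∪ B) is everything not outside A and B
    return p_a + p_b - p_union     # rearranged: P(A ∩ B) = P(A) + P(B) − P(A ∪ B)

# Case Study 2 values: P(A)=0.4, P(B)=0.5, outside region=0.2
p_overlap = intersection_from_union(Fraction(4, 10), Fraction(5, 10), Fraction(2, 10))
print(float(p_overlap))  # 0.1
```

The same function returns 0 when the two circles do not overlap, because P(A)+P(B) then equals the union exactly.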
Mutually Exclusive vs. Not Mutually Exclusive Events
- Mutually Exclusive Events:
- Definition: Two events that cannot occur at the same time.
- Mathematical definition: The joint probability of the intersection is zero: P(A∩B)=0.
- Visual representation: There is no overlap between circles in a Venn diagram.
- Not Mutually Exclusive Events:
- Definition: Events that can occur simultaneously.
- Mathematical definition: The joint probability is greater than zero: P(A∩B)>0.
- Visual representation: There is a physical overlap on a Venn diagram representing the shared probability.
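One way to check these definitions is to model events as sets of outcomes. The sketch below uses a single die toss; the events `even`, `odd`, and `greater_than_four` are hypothetical examples chosen for illustration, and the helper names are assumptions rather than anything defined in these notes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

# Hypothetical events on one die toss
even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_four = {5, 6}

def mutually_exclusive(a, b):
    """True when the events share no outcome, i.e. P(A ∩ B) = 0."""
    return a.isdisjoint(b)

def joint_probability(a, b, space):
    """P(A ∩ B) for equally likely outcomes."""
    return Fraction(len(a & b), len(space))

def collectively_exhaustive(events, space):
    """True when the events together cover every outcome in the space."""
    return set().union(*events) == space

print(mutually_exclusive(even, odd))                 # True
print(mutually_exclusive(even, greater_than_four))   # False (both contain 6)
print(collectively_exhaustive([{1, 2, 3, 4, 5}], sample_space))  # False (6 is missing)
```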
Probability Approaches
1. A Priori Probability (Theoretical)
- Calculation: P=X/T, where X is the number of ways the event occurs and T is the total possible outcomes.
- Basis: Prior knowledge. No data gathering is required.
- Examples:
- Coin Toss: The probability of tails for a balanced coin is known to be 0.5 without needing to flip it a thousand times. As the number of flips grows, the observed proportion of tails approaches this theoretical value.
- Calendar Dates: The probability of a day being in January in a non-leap year is 31/365. One does not need a survey to find how many days are in January; it is known a priori.
- March 2020 Example: In the year 2020 (a leap year), the probability of selecting a day in March is 31/366.
2. Empirical Probability (Experimental)
- Calculation: P=X/T.
- Difference from A Priori: Method of obtaining data. Empirical probability requires gathering data through observation or surveys.
- Example Case Study: Finding the probability of a male taking statistics.
- Data must be gathered from a specific population.
- Sample size (T): 439 people.
- Target event (X): Males taking stats = 84.
- Calculation: P=84/439≈0.191.
3. Subjective Probability
- Definition: A non-exact mixture of personal experience, feelings, and intuition/analysis.
- Usage: Only used when no other data or theoretical basis is available. Generally avoided in formal academic statistics due to unreliability.
- Hypothetical Scenario: Probability of a new ad campaign being successful.
- Media Development Team (Young): Assigns a 60% probability of success due to optimism.
- Chief Media Officer (Experienced/Jaded): Assigns a 40% probability of success.
- Both probabilities are filtered through subjective perceptions and intuition rather than quantifiable data.
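The a priori and empirical approaches in this section share the same ratio X/T and differ only in where X and T come from. A minimal sketch follows, assuming Python's standard `calendar` module for the day counts; the function names are hypothetical, and 2019 is used only as an arbitrary non-leap year.

```python
import calendar
from fractions import Fraction

def a_priori_month_probability(year, month):
    """P(day falls in the given month), known without gathering data."""
    days_in_month = calendar.monthrange(year, month)[1]
    days_in_year = 366 if calendar.isleap(year) else 365
    return Fraction(days_in_month, days_in_year)

def empirical_probability(event_count, total_count):
    """P = X/T, where both counts come from observed data."""
    return Fraction(event_count, total_count)

p_january = a_priori_month_probability(2019, 1)   # 2019: an arbitrary non-leap year
p_march_2020 = a_priori_month_probability(2020, 3)
p_male_stats = empirical_probability(84, 439)     # survey counts from the case study

print(p_january, p_march_2020, round(float(p_male_stats), 3))
```

Subjective probability is left out on purpose: it has no counts to compute from.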
Contingency Tables and Marginal Probabilities
- The Setting: Using a table to track two events, for example: Planning to purchase a TV vs. Actually purchasing the TV.
- Raw Data Counts:
- Planned and Purchased: 200
- Planned but Did Not Purchase: 50
- Did Not Plan but Purchased: 100
- Did Not Plan and Did Not Purchase: 650
- Total Sample (n): 1000
- Simple Events (Marginal Probabilities):
- These involve a single event and are found in the margins (sides/bottom) of the table.
- Probability of Planning to Purchase: 250/1000=0.25.
- Probability of Actually Purchasing: 300/1000=0.30.
- Joint Events:
- These involve the intersection of two events and are found in the body of the table.
- Planned and Purchased: 200/1000=0.20.
- Planned and Did Not Purchase: 50/1000=0.05.
- General Rule for Marginal Probability via Joint Probability:
- To find the probability of A, sum all its co-occurrences with various mutually exclusive and collectively exhaustive events (B1,B2,…,Bk).
- Formula: P(A)=P(A∩B1)+P(A∩B2)+⋯+P(A∩Bk).
- Summing over every Bi marginalizes B out, leaving the probability of A on its own. With the table above: P(Planned)=P(Planned∩Purchased)+P(Planned∩Did Not Purchase)=0.20+0.05=0.25.
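The TV purchase table can be sketched as a dictionary of cell counts, with joint probabilities read from the cells and marginal probabilities obtained by summing them. The labels and the `marginal` helper are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

# Cell counts from the table: (planning status, purchase status) -> count
counts = {
    ("planned", "purchased"): 200,
    ("planned", "not_purchased"): 50,
    ("not_planned", "purchased"): 100,
    ("not_planned", "not_purchased"): 650,
}
n = sum(counts.values())  # total sample size

# Joint probabilities: one per cell in the body of the table
joint = {cell: Fraction(count, n) for cell, count in counts.items()}

def marginal(label, axis):
    """Sum the joint probabilities of every cell whose label on `axis` matches.

    axis 0 is the planning status, axis 1 is the purchase status.
    """
    return sum(p for cell, p in joint.items() if cell[axis] == label)

p_planned = marginal("planned", 0)
p_purchased = marginal("purchased", 1)
print(float(p_planned), float(p_purchased))  # 0.25 0.3
```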
The Fundamental Rules of Probability
- Range: All probabilities must be between 0 and 1 (0≤P≤1).
- 0 = Impossible.
- 1 = Certain.
- Summation: The probabilities of all mutually exclusive and collectively exhaustive events in a sample space must sum to 1.
- General Addition Rule: P(A∪B)=P(A)+P(B)−P(A∩B).
- This rule applies to any two events.
- If events are mutually exclusive, P(A∩B)=0, so the rule simplifies to P(A∪B)=P(A)+P(B).
- Example Calculation: Find P(Planned∪Purchased) using raw table data.
- P(Planned)=250/1000
- P(Purchased)=300/1000
- P(Planned∩Purchased)=200/1000
- Calculation: 250/1000+300/1000−200/1000=350/1000=0.35.
- Note: The union probability is at least as large as either individual probability because it covers every outcome in which one event, the other, or both occur.
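The General Addition Rule calculation can be checked in two ways: by applying the formula, and by counting the table cells that fall inside the union. A minimal sketch, with `union_probability` as a hypothetical helper name:

```python
from fractions import Fraction

def union_probability(p_a, p_b, p_a_and_b=Fraction(0)):
    """General Addition Rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

    Leaving p_a_and_b at 0 gives the mutually exclusive special case.
    """
    return p_a + p_b - p_a_and_b

n = 1000
p_planned = Fraction(250, n)
p_purchased = Fraction(300, n)
p_both = Fraction(200, n)

p_union = union_probability(p_planned, p_purchased, p_both)

# Cross-check: count the three cells where the person planned, purchased, or both.
p_union_by_count = Fraction(200 + 50 + 100, n)

print(float(p_union))  # 0.35
```

Subtracting the intersection matters here: adding the two marginals alone would give 0.55, counting the 200 "planned and purchased" respondents twice.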
Conditional Probability
- Concept: The probability of an event given that another event has already occurred. This investigates the effect one variable has on another.
- Notation: P(A∣B) (read as "Probability of A given B").
- Formula: P(A∣B)=P(A∩B)/P(B).
- The variable being conditioned on (the event that already happened) always becomes the denominator.
- Visual Intuition: Conditioning on B narrows the "world" or sample space to ONLY the area where B occurs. The numerator is then only the part of A that exists within that new, smaller world.
- Example Calculation: Probability of purchasing given that the person planned.
- Formula: P(Purchased∣Planned)=P(Purchased∩Planned)/P(Planned).
- Calculation: (200/1000)/(250/1000)=200/250=0.8.
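The "smaller world" intuition can be sketched by computing the conditional probability twice: once from the formula, and once by restricting the counts to the respondents who planned. The helper name `conditional` is hypothetical, and the guard against a zero denominator is an added assumption, since the ratio is undefined when the conditioning event cannot occur.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return p_a_and_b / p_b

n = 1000
p_purchased_and_planned = Fraction(200, n)
p_planned = Fraction(250, n)

p_purchased_given_planned = conditional(p_purchased_and_planned, p_planned)

# Same answer from the narrowed sample space: only the 250 planners remain.
p_from_counts = Fraction(200, 250)

print(float(p_purchased_given_planned))  # 0.8
```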
Independence of Events
- Definition: Two events are independent if the occurrence of one does not affect the probability of the other.
- Verification Test: A and B are independent if and only if:
P(A∣B)=P(A)
- Dependency Test (TV Purchase Example):
- Does purchasing (A) depend on planning (B)?
- From previous math: P(Purchase∣Plan)=0.8.
- From the table margin: P(Purchase)=0.3.
- Conclusion: Since 0.8≠0.3, the events are dependent. Planning significantly changes the probability of a purchase occurring.
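The dependency test can be sketched in code. The function below compares P(A∣B) with P(A), as in the verification test above, and also checks the equivalent product form P(A∩B)=P(A)P(B), which is not stated in these notes but follows from the conditional probability formula. The function name is hypothetical, and exact fractions are used so the equality comparison is not affected by floating-point rounding.

```python
from fractions import Fraction

def is_independent(p_a, p_b, p_a_and_b):
    """Test independence of A and B, assuming P(B) > 0."""
    p_a_given_b = p_a_and_b / p_b
    same_conditional = p_a_given_b == p_a       # P(A | B) = P(A)
    same_product = p_a_and_b == p_a * p_b       # equivalent product form
    assert same_conditional == same_product     # the two tests always agree
    return same_conditional

n = 1000
p_purchase = Fraction(300, n)
p_plan = Fraction(250, n)
p_purchase_and_plan = Fraction(200, n)

tv_independent = is_independent(p_purchase, p_plan, p_purchase_and_plan)
print(tv_independent)  # False: 0.8 differs from 0.3, so the events are dependent
```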