Probability Theory and Event Prediction Notes
Foundations of Probability Theory
Definition of Probability: A probability is a quantitative value, expressed as either a number or a percentage, that ranges between 0 and 1 (equivalently, between 0% and 100%). It measures the likelihood that a specific event occurs.
Backbone of Statistics: Probability theory is considered the fundamental prerequisite and structural basis for the field of statistics.
Methods for Assigning Probabilities:
* Classical Method: Probabilities are assigned based on the theoretical ratio of outcomes, assuming all outcomes are equally likely. The formula is:
  * $P(A) = \frac{\text{number of outcomes in } A}{\text{total number of possible outcomes}}$
* Empirical Probability (Relative Frequency Method): This approach relies on historical data or observed results. The formula is:
  * $P(A) = \frac{\text{number of times } A \text{ occurred}}{\text{total number of observations}}$
* Subjective Probability Method: This method uses individual judgment, experience, and other non-mathematical criteria to determine the likelihood of an event.
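As a minimal sketch, the two formula-based methods can be compared in Python. The fair-die example, the list of observed rolls, and the helper names `classical_probability` and `empirical_probability` are made up for illustration; `fractions.Fraction` keeps the results exact.

```python
from fractions import Fraction


def classical_probability(favorable: int, possible: int) -> Fraction:
    """Classical method: favorable outcomes / equally likely possible outcomes."""
    return Fraction(favorable, possible)


def empirical_probability(observations, event) -> Fraction:
    """Relative frequency method: times the event occurred / total observations."""
    hits = sum(1 for obs in observations if event(obs))
    return Fraction(hits, len(observations))


# Classical: an even number on a fair six-sided die (faces 2, 4, 6 out of 6).
p_even_classical = classical_probability(3, 6)

# Empirical: a hypothetical record of 10 observed rolls (made-up data).
rolls = [2, 5, 6, 1, 4, 4, 3, 6, 2, 5]
p_even_empirical = empirical_probability(rolls, lambda r: r % 2 == 0)

print(p_even_classical)  # 1/2
print(p_even_empirical)  # 3/5
```

The two answers differ because the relative-frequency estimate depends on which rolls happened to be observed. The subjective method has no formula, so it is not sketched.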
Core Definitions:
* Experiment: A defined random process that generates specific results or data points (e.g., a data-collection procedure).
* Sample Space: The set of all possible outcomes of an experiment.
* Event: A specific set of outcomes of an experiment. An event may contain no outcome (the empty set), a single outcome, or multiple outcomes. Probabilities are assigned to events.
Data Structure and Contingency Tables
Multi-dimensional Data: Data collections often involve multiple variables or dimensions. Such data are frequently organized in a contingency table (also known as a cross-tab).
Phone Plan Example Data (Counts):
* The data track two random variables: phone plan choice (29, 49, or 79 dollars) and the day of purchase (Monday-Friday vs. Saturday-Sunday).
* Monday-Friday (Weekdays):
  * 29-dollar plan:
  * 49-dollar plan:
  * 79-dollar plan:
  * Total weekday purchases:
* Saturday-Sunday (Weekends):
  * 29-dollar plan:
  * 49-dollar plan:
  * 79-dollar plan:
  * Total weekend purchases:
* Combined Totals:
  * Total 29-dollar plans:
  * Total 49-dollar plans:
  * Total 79-dollar plans:
  * Grand total of outcomes:
Potential Outcomes: In this example, there are six potential outcomes, one for every combination of plan price (three options) and purchase day (two options).
Sample Space for Example: S = {29 dollars on weekdays, 29 dollars on weekends, 49 dollars on weekdays, 49 dollars on weekends, 79 dollars on weekdays, 79 dollars on weekends}
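One way to represent the contingency table and its sample space, sketched in Python under stated assumptions: the table is a dictionary keyed by (plan price, day), and the cell counts are invented for illustration (they are not the example's real counts). Only the three plan prices and the two day categories come from the notes.

```python
from itertools import product

PLANS = (29, 49, 79)            # plan prices in dollars
DAYS = ("weekday", "weekend")   # Monday-Friday vs. Saturday-Sunday

# HYPOTHETICAL counts for illustration only; these are NOT the lecture's data.
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}

# Sample space: every combination of plan price and purchase day (3 x 2 = 6).
sample_space = set(product(PLANS, DAYS))
assert set(counts) == sample_space  # every cell of the table is an outcome

# Row totals (per day), column totals (per plan), and the grand total.
day_totals = {day: sum(counts[(plan, day)] for plan in PLANS) for day in DAYS}
plan_totals = {plan: sum(counts[(plan, day)] for day in DAYS) for plan in PLANS}
grand_total = sum(counts.values())

print(day_totals)   # {'weekday': 60, 'weekend': 40}
print(plan_totals)  # {29: 40, 49: 40, 79: 20}
print(grand_total)  # 100
```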
Probability Concepts and Visualizations
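Relative Frequency Assignment: Each outcome is assigned a probability equal to its observed frequency divided by the total number of observations.
* $P(\text{outcome}) = \frac{\text{frequency of the outcome}}{\text{total number of observations}}$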
Joint vs. Marginal Probabilities:
* Joint Probability: The relative frequency of an event involving all dimensions simultaneously (e.g., the probability that a customer bought a given plan on a weekday). It describes outcomes associated with more than one random variable.
* Marginal Probability: The relative frequency of an event involving only a single dimension, regardless of the other variables (e.g., the total probability that a customer bought a given plan, ignoring the day of purchase).
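A sketch of relative-frequency assignment, joint probabilities, and marginal probabilities, using hypothetical counts (not the notes' data). The helpers `marginal_plan` and `marginal_day` are illustrative names; each marginal is obtained by summing joint probabilities across the dimension being ignored.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())

# Joint probabilities: each cell's relative frequency (both dimensions fixed).
joint = {cell: Fraction(n, N) for cell, n in counts.items()}


def marginal_plan(plan: int) -> Fraction:
    """Marginal P(plan): sum the joint probabilities over both days."""
    return sum(p for (pl, _), p in joint.items() if pl == plan)


def marginal_day(day: str) -> Fraction:
    """Marginal P(day): sum the joint probabilities over all plans."""
    return sum(p for (_, d), p in joint.items() if d == day)


print(joint[(29, "weekday")])   # 3/10
print(marginal_plan(29))        # 2/5
print(marginal_day("weekday"))  # 3/5
```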
Venn Diagrams: Introduced by John Venn in 1880, these diagrams visualize logical relations between sets.
* External Rectangle: Represents the entire sample space.
* Internal Circle: Represents a specific event, such as event A.
Mathematical Notation:
* Complement ($A'$): Pronounced "A prime," this refers to the event "not A." For example, if A is the event of buying a particular plan, A' is the event of buying any plan except that one.
* Intersection ($A \cap B$): This symbol represents "and," indicating the co-occurrence of events. $A \cap B$ means that both A and B happen. It is visualized as the overlapping section of the two circles in a Venn diagram.
* Union ($A \cup B$): Pronounced "A union B," this symbol represents "or." $A \cup B$ means that event A or event B (or both) occurs.
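The notation maps directly onto Python's built-in set operations (`-`, `&`, `|`) over the example's sample space. Taking A as the 29-dollar plan and B as a weekday purchase is an arbitrary choice for illustration.

```python
from itertools import product

# Sample space of the phone-plan example: (plan price, purchase day).
sample_space = set(product((29, 49, 79), ("weekday", "weekend")))

# Illustrative events (choosing the 29-dollar plan for A is an assumption).
A = {o for o in sample_space if o[0] == 29}         # buy the 29-dollar plan
B = {o for o in sample_space if o[1] == "weekday"}  # buy on a weekday

A_prime = sample_space - A  # complement A': any plan except the 29-dollar one
A_and_B = A & B             # intersection A ∩ B: 29-dollar plan on a weekday
A_or_B = A | B              # union A ∪ B: 29-dollar plan, weekday, or both

print(sorted(A_and_B))  # [(29, 'weekday')]
print(len(A_or_B))      # 4
```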
Essential Rules of Probability
Complement Rule: The probability of an event plus the probability of its complement always equals 1.
* $P(A) + P(A') = 1$
Law of Total Probability (Version 1): The joint probability of A and B plus the joint probability of A and not B equals the marginal probability of A.
* $P(A \cap B) + P(A \cap B') = P(A)$
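A quick numerical check of the complement rule and the first version of the law of total probability, with A as the 29-dollar plan and B as a weekday purchase. The counts and the helper names (`prob`, `is_29`, `is_weekday`) are hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())


def prob(event) -> Fraction:
    """Relative-frequency probability of the cells for which event(cell) is True."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), N)


def is_29(cell):
    return cell[0] == 29         # event A: the 29-dollar plan


def is_weekday(cell):
    return cell[1] == "weekday"  # event B: a weekday purchase


# Complement rule: P(A) + P(A') = 1
complement_sum = prob(is_29) + prob(lambda c: not is_29(c))

# Law of total probability (version 1): P(A ∩ B) + P(A ∩ B') = P(A)
total_prob = (prob(lambda c: is_29(c) and is_weekday(c))
              + prob(lambda c: is_29(c) and not is_weekday(c)))

print(complement_sum)           # 1
print(total_prob, prob(is_29))  # 2/5 2/5
```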
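General Rule of Addition: This calculates the probability of the union of two events.
* $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
* Example from the data: the probability of selling a plan on a weekday OR selling the plan at a given price.
  * Calculation 1: $P(\text{Weekday}) + P(\text{Plan}) - P(\text{Weekday} \cap \text{Plan})$
  * Alternative calculation: add the joint probabilities of every cell in the contingency table that is a weekday purchase or a purchase of that plan, counting each cell once.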
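Under hypothetical counts, a sketch that applies the general rule of addition and cross-checks it by summing the union's outcomes directly, so the overlapping cell is counted only once. Using the 29-dollar plan as the example, the counts, and the helper names are illustrative assumptions.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())


def prob(event) -> Fraction:
    """Relative-frequency probability of the cells for which event(cell) is True."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), N)


def is_weekday(cell):
    return cell[1] == "weekday"


def is_29(cell):
    return cell[0] == 29


# Calculation 1: general rule of addition, subtracting the double-counted overlap.
p_union_rule = (prob(is_weekday) + prob(is_29)
                - prob(lambda c: is_weekday(c) and is_29(c)))

# Cross-check: sum every cell that belongs to the union (each cell once).
p_union_direct = prob(lambda c: is_weekday(c) or is_29(c))

print(p_union_rule, p_union_direct)  # 7/10 7/10
```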
Mutually Exclusive Events: Events that cannot occur at the same time. If A and B are mutually exclusive, then $P(A \cap B) = 0$. They do not intersect in a Venn diagram. Any event and its complement are inherently mutually exclusive.
Collectively Exhaustive Events: A set of events of which at least one must occur; together they cover the entire sample space. If A and B are collectively exhaustive, then $P(A \cup B) = 1$. Any event and its complement are collectively exhaustive.
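Both properties can be checked with sets of outcomes: an event A and its complement have an empty intersection and a union equal to the whole sample space, while two different plans are mutually exclusive but not collectively exhaustive. The counts and the choice of plans are hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())
sample_space = set(counts)


def prob(event_set) -> Fraction:
    """Probability of a set of outcomes, by relative frequency."""
    return Fraction(sum(counts[cell] for cell in event_set), N)


A = {cell for cell in sample_space if cell[0] == 29}  # 29-dollar plan
A_prime = sample_space - A                            # complement of A
B = {cell for cell in sample_space if cell[0] == 49}  # 49-dollar plan

# An event and its complement: mutually exclusive AND collectively exhaustive.
print(prob(A & A_prime) == 0, prob(A | A_prime) == 1)  # True True

# Two different plans: mutually exclusive but NOT collectively exhaustive.
print(prob(A & B) == 0, prob(A | B) == 1)  # True False
```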
Conditional Probabilities and Bayes Rule
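Conditional Probability ($P(A \mid B)$): The probability that event A occurs given that event B has already occurred (defined only when $P(B) > 0$).
* Formula: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
* Formula: $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
* Example: The probability that a client chooses a given plan, conditional on visiting the store on a weekday.
  * Approach 1: $P(\text{Plan} \mid \text{Weekday}) = \frac{P(\text{Plan} \cap \text{Weekday})}{P(\text{Weekday})}$
  * Approach 2: $P(\text{Plan} \mid \text{Weekday}) = \frac{\text{weekday purchases of the plan}}{\text{total weekday purchases}}$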
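A sketch of two ways to compute the conditional probability, taking the 29-dollar plan as the example (an assumption) and using hypothetical counts: one divides a joint probability by a marginal probability, the other restricts attention to the weekday row of the table. Both give the same answer.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())


def prob(event) -> Fraction:
    """Relative-frequency probability of the cells for which event(cell) is True."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), N)


def is_weekday(cell):
    return cell[1] == "weekday"


def is_29(cell):
    return cell[0] == 29


# Ratio of probabilities: P(29 | Weekday) = P(29 ∩ Weekday) / P(Weekday).
p_ratio = prob(lambda c: is_29(c) and is_weekday(c)) / prob(is_weekday)

# Counts within the condition: only the weekday row of the table matters.
weekday_total = sum(n for cell, n in counts.items() if is_weekday(cell))
p_counts = Fraction(counts[(29, "weekday")], weekday_total)

print(p_ratio, p_counts)  # 1/2 1/2
```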
General Law of Multiplication: Derived by rearranging the conditional probability formula.
* $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
Bayes Rule: An algebraic rearrangement of the multiplication law.
* $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
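The multiplication law and Bayes rule can be checked numerically with hypothetical counts: multiplying P(29 | Weekday) by P(Weekday) should recover the joint probability, and Bayes rule (here with the roles of A and B swapped) should give the same P(Weekday | 29) as reading it directly from the 29-dollar column. The counts and helper names are illustrative.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())


def prob(event) -> Fraction:
    """Relative-frequency probability of the cells for which event(cell) is True."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), N)


def is_weekday(cell):
    return cell[1] == "weekday"


def is_29(cell):
    return cell[0] == 29


p_29 = prob(is_29)              # P(A)
p_weekday = prob(is_weekday)    # P(B)
p_29_given_weekday = prob(lambda c: is_29(c) and is_weekday(c)) / p_weekday  # P(A | B)

# General law of multiplication: P(A ∩ B) = P(A | B) P(B)
p_joint = p_29_given_weekday * p_weekday

# Bayes rule with A and B swapped: P(B | A) = P(A | B) P(B) / P(A)
p_weekday_given_29 = p_29_given_weekday * p_weekday / p_29

# Direct check from the counts: weekday share of all 29-dollar purchases.
p_weekday_given_29_direct = Fraction(
    counts[(29, "weekday")], counts[(29, "weekday")] + counts[(29, "weekend")])

print(p_joint)                                        # 3/10
print(p_weekday_given_29, p_weekday_given_29_direct)  # 3/4 3/4
```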
Law of Total Probability (Version 2): Uses conditional probabilities to find a marginal probability.
* $P(A) = P(A \mid B)\,P(B) + P(A \mid B')\,P(B')$
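A sketch of the second version of the law of total probability: weight each conditional probability by the probability of its condition, then add. The counts and the helper names (`prob`, `conditional`, `is_weekend`, etc.) are hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())


def prob(event) -> Fraction:
    """Relative-frequency probability of the cells for which event(cell) is True."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), N)


def conditional(a, b) -> Fraction:
    """P(a | b) = P(a ∩ b) / P(b); assumes P(b) > 0."""
    return prob(lambda c: a(c) and b(c)) / prob(b)


def is_29(cell):
    return cell[0] == 29          # event A


def is_weekday(cell):
    return cell[1] == "weekday"   # event B


def is_weekend(cell):
    return cell[1] == "weekend"   # event B' (not a weekday)


# P(A) = P(A | B) P(B) + P(A | B') P(B')
p_29_total = (conditional(is_29, is_weekday) * prob(is_weekday)
              + conditional(is_29, is_weekend) * prob(is_weekend))

print(p_29_total, prob(is_29))  # 2/5 2/5
```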
Event Independence
Defining Independence: Two events are independent if the occurrence or non-occurrence of one has no effect on the probability of the other.
* Example: Flipping a coin twice. The outcome of the second toss is independent of the first: $P(\text{heads on 2nd toss} \mid \text{heads on 1st toss}) = P(\text{heads on 2nd toss})$, regardless of the first result.
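Independence of the two tosses can be confirmed by enumerating the four equally likely outcomes, assuming a fair coin. The helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: 4 equally likely outcomes (fairness is an assumption).
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')


def prob(event) -> Fraction:
    """Classical probability: favorable outcomes / equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_first_heads = prob(lambda o: o[0] == "H")
p_second_heads = prob(lambda o: o[1] == "H")
p_both_heads = prob(lambda o: o == ("H", "H"))

# Conditional probability of heads on the second toss given heads on the first.
p_second_given_first = p_both_heads / p_first_heads

print(p_second_given_first, p_second_heads)            # 1/2 1/2
print(p_both_heads == p_first_heads * p_second_heads)  # True
```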
Mathematical Tests for Independence:
* Version 1: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
* Version 2: $P(A \cap B) = P(A)\,P(B)$
Testing for Dependence: If $P(A \mid B) \neq P(A)$ or if $P(A \cap B) \neq P(A)\,P(B)$, the events are considered dependent (not independent).
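Independence Test Example: Plan choice vs. day of purchase.
* Compute the marginal probabilities $P(\text{Plan})$ and $P(\text{Weekday})$ from the table totals.
* Multiply them: $P(\text{Plan}) \times P(\text{Weekday})$.
* Compare the product with the observed joint probability $P(\text{Plan} \cap \text{Weekday})$.
* Since $P(\text{Plan} \cap \text{Weekday}) \neq P(\text{Plan}) \times P(\text{Weekday})$, the events may not be independent.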
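A sketch that applies both independence tests to the plan/day example. The counts are invented for illustration, so the specific numbers say nothing about the real data; `independence_checks` is a hypothetical helper that assumes both events have nonzero probability.

```python
from fractions import Fraction

# HYPOTHETICAL counts for illustration only (not the lecture's data),
# keyed by (plan price in dollars, purchase day).
counts = {
    (29, "weekday"): 30, (49, "weekday"): 20, (79, "weekday"): 10,
    (29, "weekend"): 10, (49, "weekend"): 20, (79, "weekend"): 10,
}
N = sum(counts.values())


def prob(event) -> Fraction:
    """Relative-frequency probability of the cells for which event(cell) is True."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), N)


def is_29(cell):
    return cell[0] == 29


def is_weekday(cell):
    return cell[1] == "weekday"


def independence_checks(a, b):
    """Return (version_1, version_2); assumes P(a) > 0 and P(b) > 0."""
    p_a, p_b = prob(a), prob(b)
    p_ab = prob(lambda c: a(c) and b(c))
    version_1 = (p_ab / p_b == p_a) and (p_ab / p_a == p_b)  # P(A|B)=P(A), P(B|A)=P(B)
    version_2 = p_ab == p_a * p_b                            # P(A ∩ B) = P(A) P(B)
    return version_1, version_2


print(prob(is_29) * prob(is_weekday))              # 6/25
print(prob(lambda c: is_29(c) and is_weekday(c)))  # 3/10
print(independence_checks(is_29, is_weekday))      # (False, False)
```

Exact equality is meaningful here only because the inputs are exact fractions. With sample data, the observed joint probability rarely matches the product exactly even when the events are independent in the population, so a formal hypothesis test is needed before drawing a firm conclusion.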
Case Study: Titanic Sinking Data
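Data Set Summary (a 2 × 2 table of gender vs. survival):
* Survivors: number of females and number of males, plus the total number of survivors.
* Deceased: number of females and number of males, plus the total number of deaths.
* Gender Totals: total females and total males, plus the grand total.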
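Probability Calculations:
* Marginal Probability of Surviving: $P(\text{Survived}) = \frac{\text{total survivors}}{\text{grand total}}$
* Joint Probability (Male and Surviving): $P(\text{Male} \cap \text{Survived}) = \frac{\text{male survivors}}{\text{grand total}}$
* Union Probability (Survived OR Male): $P(\text{Survived} \cup \text{Male}) = P(\text{Survived}) + P(\text{Male}) - P(\text{Survived} \cap \text{Male})$
* Conditional Probability (Survived given Male): $P(\text{Survived} \mid \text{Male}) = \frac{P(\text{Survived} \cap \text{Male})}{P(\text{Male})} = \frac{\text{male survivors}}{\text{total males}}$
* Probability of Not Surviving: $P(\text{Survived}') = 1 - P(\text{Survived})$. Note that surviving and dying are complements: they are mutually exclusive and collectively exhaustive.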
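Testing Independence (Gender and Survival):
* Independence would require $P(\text{Survived} \mid \text{Male}) = P(\text{Survived})$.
* Compute $P(\text{Survived} \mid \text{Male})$ from the male counts.
* Compute $P(\text{Survived})$ from the overall totals.
* Since $P(\text{Survived} \mid \text{Male}) \neq P(\text{Survived})$, survival may have been dependent on gender.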
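The Titanic calculations follow the same pattern as the phone-plan example. This sketch uses invented 2 × 2 counts that are NOT the real Titanic figures, so only the structure of the calculations carries over; the helper names are illustrative.

```python
from fractions import Fraction

# HYPOTHETICAL 2 x 2 counts for illustration only; NOT the real Titanic figures.
counts = {
    ("female", "survived"): 300, ("male", "survived"): 200,
    ("female", "died"): 100, ("male", "died"): 400,
}
N = sum(counts.values())


def prob(event) -> Fraction:
    """Relative-frequency probability of the cells for which event(cell) is True."""
    return Fraction(sum(n for cell, n in counts.items() if event(cell)), N)


def survived(cell):
    return cell[1] == "survived"


def male(cell):
    return cell[0] == "male"


p_survived = prob(survived)                                    # marginal
p_male_and_survived = prob(lambda c: male(c) and survived(c))  # joint
p_survived_or_male = p_survived + prob(male) - p_male_and_survived  # union
p_survived_given_male = p_male_and_survived / prob(male)       # conditional
p_died = 1 - p_survived                                        # complement

# Independence would require P(Survived | Male) == P(Survived).
independent = p_survived_given_male == p_survived

print(p_survived, p_survived_given_male, independent)  # 1/2 1/3 False
```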
Questions & Discussion
Q: Why use the word "may" when concluding a lack of independence from the calculation?
* A: We cannot be sure from the raw sample data alone without performing a formal statistical hypothesis test. Such tests determine whether the difference observed in the sample is large enough to conclude that it also exists in the population. These tests will be studied later in the semester.
Q: If we collect data to determine whether plan choice and day of purchase are independent, is that a sample or a population?
* A: It is typically a sample. A different set of data would likely produce different counts, and therefore different probabilities, in the contingency table.
Weekly Summary of Formulae
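a) $P(A) + P(A') = 1$
b) $P(A \cap B) + P(A \cap B') = P(A)$
c) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
d) $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$; $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
e) $P(A \cap B) = P(A \mid B)\,P(B)$; $P(A \cap B) = P(B \mid A)\,P(A)$
f) $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
g) $P(A) = P(A \mid B)\,P(B) + P(A \mid B')\,P(B')$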