Week 3: Predicting events
Probability Theory
A probability is a number between 0 and 1 that tells us how likely a certain event is.
Classical method of assigning probabilities:
\frac{\text{Number of outcomes in which the event occurs}}{\text{Number of possible outcomes}}
The empirical probability (relative frequency of occurrence) method:
\frac{\text{Number of outcomes in which the event has occurred in the past}}{\text{Number of opportunities for the event to occur}}
Subjective probability method:
Using judgment and other subjective criteria to determine the probability an event occurs.
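As a quick sketch, the classical and empirical methods can be computed directly; the die example below is hypothetical, used only to illustrate the two formulas:

```python
# Classical method: count outcomes in the event over equally likely outcomes.
outcomes = [1, 2, 3, 4, 5, 6]           # a fair six-sided die (hypothetical example)
event = [2, 4, 6]                       # event: "roll an even number"
p_classical = len(event) / len(outcomes)
print(p_classical)                      # 0.5

# Empirical method: count past occurrences over past opportunities.
past_rolls = [1, 2, 2, 6, 4, 3, 2, 5, 6, 6]   # made-up past data
p_empirical = sum(1 for r in past_rolls if r % 2 == 0) / len(past_rolls)
print(p_empirical)                      # 0.7
```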
Outline
Probability theory
Law of addition
Conditional probability
Application of Events: Probability trees and Binomial probability (more on this next week)
Introduction to Probability Theory
Probability theory is a prerequisite to statistics.
Experiment: A random process that creates outcomes (e.g., the data-collection procedure).
Sample space: The set of all possible outcomes.
Event: A set of outcomes (can contain no outcome, single outcome, or multiple outcomes) of an experiment to which probability is assigned.
Multi-dimensional Data
Example (Contingency Table or Cross-Tab):
Two random variables:
Phone plan chosen by a client ($29, $49, $79)
Day (weekday vs weekend) the client bought the plan
6 potential outcomes: every combination of the phone plan chosen and whether the purchase was on a weekday or a weekend.
| Cost of phone plan | $29 | $49 | $79 | Total |
|---|---|---|---|---|
| Mon-Fri | 10 | 120 | 250 | 380 |
| Sat-Sun | 40 | 30 | 350 | 420 |
| Total | 50 | 150 | 600 | 800 |
Sample space: {$29 on weekdays, $29 on weekends, $49 on weekdays, $49 on weekends, $79 on weekdays, $79 on weekends}.
Example Events: {$29 on weekdays}, {customers bought $49 plan (at some day)}, {customers bought any plan on a weekday}, {$49 on weekdays, $79 on weekends}
Assigning Probabilities to Observed Outcomes
Relative frequency: outcomes receive probability corresponding to their number of occurrences
P(\text{outcome}_i) = \frac{\text{number of occurrences of outcome } i}{\text{total number of occurrences of all outcomes}}
| Cost of phone plan | $29 | $49 | $79 | Total |
|---|---|---|---|---|
| Mon-Fri | 0.0125 | 0.15 | 0.3125 | 0.475 |
| Sat-Sun | 0.05 | 0.0375 | 0.4375 | 0.525 |
| Total | 0.0625 | 0.1875 | 0.75 | 1 |
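A minimal sketch (plain Python, no external libraries) turning the count table above into the relative-frequency table; the dictionary layout is just one possible representation:

```python
# Counts from the contingency table: (day, plan price) -> number of customers.
counts = {
    ("Mon-Fri", 29): 10, ("Mon-Fri", 49): 120, ("Mon-Fri", 79): 250,
    ("Sat-Sun", 29): 40, ("Sat-Sun", 49): 30,  ("Sat-Sun", 79): 350,
}
total = sum(counts.values())            # 800 customers in total

# Relative frequency: occurrences of each outcome over all occurrences.
probs = {cell: n / total for cell, n in counts.items()}
print(probs[("Mon-Fri", 29)])           # 0.0125

# A marginal probability sums the joint probabilities over the other dimension.
p_weekday = sum(p for (day, _), p in probs.items() if day == "Mon-Fri")
print(round(p_weekday, 4))              # 0.475
```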
Law of Addition
Joint vs Marginal Probabilities
Joint probability: denotes relative frequency when asking about all dimensions (e.g., what is the relative frequency that a customer bought a $49 plan on a weekday?)
Marginal probability: displays relative frequency when asking about only a single dimension. E.g., what is the relative frequency that a customer bought a $49 plan? (It is a single dimension because here we do not care whether they bought it on a weekday or on a weekend.)
Venn Diagram: Visualization of Probabilities
The Venn diagram (John Venn, 1880) shows logical relations among sets.
The external rectangle indicates the whole sample space.
The internal circle indicates some event A.
Joint Events
The joint probability is represented by the overlapping part in the Venn Diagram.
The symbol \cap means ‘intersection’. A \cap B can be thought of as A & B.
Law of Addition
Joint Probability such as P(\text{weekdays and } $29) and P(\text{weekend and } $49). Joint probability describes the probability of outcomes associated with more than one random variable, e.g., Day and Price.
Marginal Probability such as P(weekdays) and P($29). Marginal probability describes the probability of outcomes associated with only one random variable.
| Cost of phone plan | $29 | $49 | $79 | Total |
|---|---|---|---|---|
| Mon-Fri | 0.0125 | 0.15 | 0.3125 | 0.475 |
| Sat-Sun | 0.05 | 0.0375 | 0.4375 | 0.525 |
| Total | 0.0625 | 0.1875 | 0.75 | 1 |
Some Notation: Event, Complement, Intersection of Events
Mathematically, we can define an event by A, and define the complement of the event by A' (pronounced as A prime) which means “not A”.
Example:
Let A denote the event “$29”, and B denote “weekdays”.
A' then indicates “not $29”, which in the previous example is “$49 or $79”.
B' then indicates “Sat-Sun”.
| | A | A' | Total |
|---|---|---|---|
| B | P(A \cap B) | P(A' \cap B) | P(B) |
| B' | P(A \cap B') | P(A' \cap B') | P(B') |
| Total | P(A) | P(A') | 1 |
| | $29 | Not $29 | Total |
|---|---|---|---|
| Weekdays | 0.0125 | 0.4625 | 0.475 |
| Not Weekdays | 0.05 | 0.4750 | 0.525 |
| Total | 0.0625 | 0.9375 | 1 |
Complement Rule of Probability
Complement Rule of Probability
P(A)+P(A') = 1
Law of Total Probability, Version 1
Law of total probability, version 1
P(A \cap B) + P(A \cap B') = P(A)
Union of Events
The event that A or B happens is denoted by A \cup B, pronounced "the union of A and B" or "A union B". So P(A \cup B) indicates the probability that A or B (or both) occurs.
General Rule of Addition
What is the probability of the event that a plan is sold on a weekday or is sold for $29?
Answer: \frac{10}{800} + \frac{40}{800} + \frac{370}{800} = 0.0125 + 0.05 + 0.4625 = 0.525
Alternative: \frac{50}{800} + \frac{380}{800} - \frac{10}{800} = 0.0625 + 0.475 - 0.0125 = 0.525
General rule of addition
P(A \cup B) = P(A) + P(B) - P(A \cap B)
It states that the probability that A or B occurs equals the probability that A occurs plus the probability that B occurs minus the probability that both A and B occur at the same time.
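The two computations from the example above can be sketched and checked against each other; the numbers come from the phone-plan table:

```python
n = 800
p_weekday = 380 / n                     # P(A), marginal probability of a weekday sale
p_29 = 50 / n                           # P(B), marginal probability of a $29 plan
p_weekday_and_29 = 10 / n               # P(A ∩ B), joint probability

# General rule of addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
p_union = p_weekday + p_29 - p_weekday_and_29
print(round(p_union, 4))                # 0.525
```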
Mutually Exclusive Events
If events A and B cannot occur at the same time (A occurs only if B does not occur), we say A and B are mutually exclusive (events).
For mutually exclusive events, P(A \cap B) = 0. In the Venn diagram, these events do not intersect.
Any event and its complement are mutually exclusive. Either “A occurs” or “A does not occur”. So P(A \cap A') = 0 always.
Collectively Exhaustive Events
If the occurrence of events A and B covers the whole sample space, we say A and B are collectively exhaustive (events).
For collectively exhaustive events, P(A \cup B) = 1
Any event and its complement are collectively exhaustive. “A occurs” and “A does not occur” make up all possible outcomes. So P(A \cup A') = 1 always.
Conditional Probabilities and Independence
Conditional Probabilities
In many cases, we are interested in conditional probabilities.
What is the probability of achieving growth in the next quarter conditional on the success of our advertisement campaign?
What’s the probability of passing this subject conditional on not attending lectures?
P(A|B) denotes the probability that event A occurs, conditional on that B occurs. The symbol P(X = x | Y = y) denotes the probability of random variable X taking value x, conditional on the random variable Y taking value y.
In our phone-plan example, P($29 | \text{weekdays}) or P(\text{Price} = $29 | \text{Day} = \text{weekdays}) means the probability of a client choosing the $29 plan, conditional on her visiting the store on a weekday.
Conditional Probability Example
What is the probability of selling a $29 plan, conditional on a weekday?
Approach 1: \frac{10}{380} = 0.026
Approach 2: \frac{P($29 \cap Weekday)}{P(Weekday)} = \frac{0.0125}{0.475} = 0.026
Conditional Prob.: Formula
Conditional probability can be computed as follows:
P(A|B) = \frac{P(A \cap B)}{P(B)}; \quad P(B|A) = \frac{P(A \cap B)}{P(A)}
So the probability of A conditional on B equals the joint probability of A and B, divided (weighted) by the probability of B (which is a marginal probability).
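Both approaches from the example can be sketched in a few lines, using the counts from the phone-plan table:

```python
# Approach 1: restrict attention to the weekday row of the count table.
p_approach1 = 10 / 380                  # 10 weekday $29 sales out of 380 weekday sales

# Approach 2: ratio of joint probability to marginal probability.
p_joint = 10 / 800                      # P($29 ∩ weekday)
p_weekday = 380 / 800                   # P(weekday)
p_approach2 = p_joint / p_weekday

print(round(p_approach1, 4), round(p_approach2, 4))  # 0.0263 0.0263
```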
Bayes Rule
From the conditional probability formula, we can write the general law of multiplication:
P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)
We then rearrange this further to derive Bayes rule:
P(B|A) = \frac{P(A|B)P(B)}{P(A)}
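A sketch applying Bayes rule to the phone-plan numbers, flipping P($29 | weekday) into P(weekday | $29):

```python
p_29_given_weekday = 10 / 380           # P(A|B), from the conditional-probability example
p_weekday = 380 / 800                   # P(B)
p_29 = 50 / 800                         # P(A)

# Bayes rule: P(B|A) = P(A|B) P(B) / P(A).
p_weekday_given_29 = p_29_given_weekday * p_weekday / p_29
print(round(p_weekday_given_29, 2))     # 0.2, matching the direct count 10/50
```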
Law of total probability, version 2
P(A \cap B) = P(A|B)P(B); \quad P(A \cap B) = P(B|A)P(A)
So joint prob equals conditional prob multiplied by marginal prob. This leads to Law of total probability, version 2
P(A|B)P(B) + P(A|B')P(B') = P(A)
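Checking version 2 of the law of total probability on the phone-plan data, with A = "$29 plan" and B = "weekday":

```python
p_29_given_weekday = 10 / 380           # P(A|B)
p_29_given_weekend = 40 / 420           # P(A|B')
p_weekday = 380 / 800                   # P(B)
p_weekend = 420 / 800                   # P(B')

# P(A|B)P(B) + P(A|B')P(B') should recover the marginal P(A).
p_29 = p_29_given_weekday * p_weekday + p_29_given_weekend * p_weekend
print(round(p_29, 4))                   # 0.0625, the marginal P($29) from the table
```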
Independent Events
Events are said to be independent if the occurrence or non-occurrence of an event does not affect the occurrence or non-occurrence of another event(s).
For example, flipping a coin twice. We know that the probability of heads is 50%.
Now, regardless of what happens on the first toss, the probability of heads on the second toss is always 50%. That is, the outcome in the second toss is independent of the outcome of the first toss.
Mathematically, independence implies: P(Y|X) = P(Y)
Does the customer’s purchasing behavior depend on whether they buy the plan on a weekday or on weekends?
If a customer behaves on weekdays the same way she behaves on weekends, we say purchasing behavior is independent of the day. In other words, independence implies
P($29 | Weekdays) = P($29 | Not Weekdays) = P($29)
P($29 | Weekdays) = \frac{0.0125}{0.475} = 0.026
P($29 | Not Weekdays) = \frac{0.05}{0.525} = 0.095
P($29) = 0.0625
Since these probabilities all differ, the events may not be independent.
Independent Events: Formula
If A and B are independent (events), whether or not B occurs should not affect the probability that A occurs; also, whether or not A occurs should not affect the probability that B occurs. This means
Independent events, version 1:
P(A|B) = P(A); \quad P(B|A) = P(B)
Based on Bayes rule, this also means
Independent events, version 2:
P(A \cap B) = P(A) \times P(B)
Implications of Formulas
P(A|B) = P(A); \quad P(B|A) = P(B) \implies P(A \cap B) = P(A) \times P(B) \implies \text{independent}
P(A|B) \neq P(A); \quad P(B|A) \neq P(B) \implies P(A \cap B) \neq P(A) \times P(B) \implies \text{not independent, so dependent}
Example
Are the events “client choosing $29 plan” and “client purchasing during weekdays” independent?
We have P($29) = 0.0625, and P(weekdays) = 0.475,
So P($29) \times P(weekdays) = 0.0297, which is different from
P($29 \cap weekdays) = 0.0125.
So these two events may not be independent.
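The independence check in this example reduces to comparing a joint probability with the product of the marginals, sketched here:

```python
p_29 = 50 / 800                         # P($29), marginal
p_weekday = 380 / 800                   # P(weekday), marginal
p_joint = 10 / 800                      # P($29 ∩ weekday), joint

# Under independence these two numbers would match.
print(round(p_29 * p_weekday, 4))       # 0.0297
print(round(p_joint, 4))                # 0.0125 -- they differ, so in this sample
                                        # the events appear not to be independent
```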
Open Questions
Q: Why are we not 100% sure, saying only "may"?
Q: If we collect data to answer the question "Are choice of plan and day of purchase independent?", is the data a sample or a population? If we choose another set of data, will the table look the same as the one above?
To see if such a hypothesis holds true, we need to do statistical tests. We will study those tests later in the semester.
Binomial Experiments
Suppose you toss a coin 3 times in a row and you are interested in how likely it is that you get exactly two heads.
This is an instance of a binomial experiment.
A binomial experiment assesses the number of a certain outcome from repeated independent trials.
Each trial has two possible outcomes (e.g., heads or tails, success or failure, …)
Research question is usually: What is the prob of x successes out of n trials? For example: what is the prob of detecting 2 defects out of 3 products? What is the prob of 1 heads out of 3 coin tosses?
Binomial Tree
When the binary outcomes of different trials are independent (i.e., P(A|B) = P(A) for outcomes A and B of different trials), we can draw a probability tree called a binomial tree.
Suppose we have three products, each of which can be defective (D) with probability p or functional (F) with probability q = 1 − p.
Example
What is the prob. of x successes out of n trials? For example: what is the prob of detecting 2 defects out of 3 products (2D,1F)?
Define the random variable X as the number of successes (defects) out of n trials (3 products). We are looking for P(X = 2; n = 3, p), where the success (defect) rate is p.
Case 1: D,D,F; prob: p \times p \times (1 − p) = p^2(1 − p)
Case 2: D,F,D; prob: p \times (1 − p) \times p = p^2(1 − p)
Case 3: F,D,D; prob: (1 − p) \times p \times p = p^2(1 − p)
Given outcomes across trials are independent, what is the probability of x successes out of n trials with probability of each success being p?
P(X = x; n, p) = \text{number of cases or combinations} \times p^x(1 − p)^{n−x}
Total prob. is the sum of all cases. Thus,
P(2; 3, p) = 3p^2(1 − p)
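The tree enumeration above can be sketched by brute force: list every branch of the tree, keep those with exactly 2 defects, and sum their probabilities. The defect rate p = 0.2 is an assumed value for illustration:

```python
from itertools import product

p = 0.2                                 # assumed defect rate, for illustration only
total = 0.0
for branch in product("DF", repeat=3):  # all 2**3 = 8 branches of the tree
    prob = 1.0
    for outcome in branch:              # multiply probabilities along the branch
        prob *= p if outcome == "D" else (1 - p)
    if branch.count("D") == 2:          # keep branches with exactly 2 defects
        total += prob

print(round(total, 4))                  # 0.096, which equals 3 * p**2 * (1 - p)
```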
Binomial Distribution
A random variable X taking value in 0, 1, …, n is said to follow the binomial distribution, denoted by X \sim Bin(n, p) if it describes the (random) number of successes out of n trials in a binomial experiment (meaning that successes in different trials are independent).
We can calculate the probability of X successes from n trials using the general formula:
P(X = x; n, p) = \binom{n}{x} p^x (1 − p)^{n−x}
p^x: The probability of x successes.
(1 − p)^{n−x}: The probability of n − x failures. So in total we have n trials.
The factor (combinatorial operator) \binom{n}{x} = \frac{n!}{x!(n − x)!} computes the number of cases or combinations of choosing x objects from the set of n objects. Remember the factorial operator m! = 1 \times 2 \times 3 \times … \times (m − 1) \times m.
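The general formula translates directly into a short function; this is a sketch using only the standard library (`math.comb` computes \binom{n}{x}):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): comb(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(binomial_pmf(2, 3, 0.2), 4))  # 0.096

# Sanity check: the probabilities over all possible x sum to 1.
print(round(sum(binomial_pmf(x, 3, 0.2) for x in range(4)), 4))  # 1.0
```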
Binomial Distribution: Example
Say the defect rate is 0.2, there are 3 trials, and the random variable X is the number of defects out of 3 products (in mathematical notation, X \sim Bin(3, 0.2)).
P(X = 2; 3, 0.2) = \binom{3}{2} (0.2)^2 (1 − 0.2)^{3−2} = 0.096
This calculation is straightforward in Excel using the BINOM.DIST function
Binomial Distribution
Properties of binomial distribution:
Almost all distributions have expectation (i.e., mean) and variance (and thus standard deviation). We learn some other important distributions in weeks 4 and 5.
Every distribution is characterized by some parameters.
The binomial distribution has two parameters, n (the number of trials) and p (the success probability or success rate).
The mean (or expectation) and variance of X \sim Bin(n, p) are given by
\mu_X = E(X) = np
\sigma_X^2 = Var(X) = np(1 − p), so \sigma_X = std(X) = \sqrt{np(1 − p)}
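These two formulas can be checked against the pmf directly for the running example, X ~ Bin(3, 0.2):

```python
from math import comb

def pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 3, 0.2
# Mean and variance computed from the definition of expectation...
mean = sum(x * pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean)**2 * pmf(x, n, p) for x in range(n + 1))

# ...match the closed forms np and np(1-p).
print(round(mean, 4), round(var, 4))    # 0.6 0.48
```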
Summary: Probability
Summary for week 3:
Probability theory
a) P(A) + P(A') = 1
b) P(A \cap B) + P(A \cap B') = P(A)
c) P(A \cup B) = P(A) + P(B) - P(A \cap B)
d) P(A|B) = \frac{P(A \cap B)}{P(B)}; \quad P(B|A) = \frac{P(A \cap B)}{P(A)}
e) P(A|B)P(B) + P(A|B')P(B') = P(A); P(B|A)P(A) + P(B|A')P(A') = P(B)
f) independence \iff P(A|B) = P(A); \quad P(B|A) = P(B)
g) independence \iff P(A \cap B) = P(A) \times P(B)
Binomial distribution: prob of x success in n trials, with p success rate for each trial. X \sim Bin(n, p)
P(X = x; n, p) = \binom{n}{x} p^x (1 − p)^{n−x}