Week 3: Predicting Events

Probability Theory

  • A probability is a number between 0 and 1 that tells us how likely a certain event is.

  • Classical method of assigning probabilities:
    \frac{\text{Number of outcomes in which an event occurs}}{\text{Number of possible outcomes}}

  • The empirical probability or relative frequency of occurrence method:
    \frac{\text{Number of outcomes in which an event has occurred in the past}}{\text{Number of opportunities for an event to occur}}

  • Subjective probability method:

    • Using judgment and other subjective criteria to determine the probability an event occurs.

Outline

  • Probability theory

  • Law of addition

  • Conditional probability

  • Application of Events: Probability trees and Binomial probability (more on this next week)

Introduction to Probability Theory

  • Probability theory is a prerequisite to statistics.

  • Experiment: A random process that creates outcomes (e.g., the data-collection procedure).

  • Sample space: The set of all possible outcomes.

  • Event: A set of outcomes (can contain no outcome, single outcome, or multiple outcomes) of an experiment to which probability is assigned.

Multi-dimensional Data

  • Example (Contingency Table or Cross-Tab):

    • Two random variables:

      1. Phone plan chosen by a client ($29, $49, $79)

      2. Day (weekday vs weekend) the client bought the plan

    • 6 potential outcomes: every combination of the phone plan chosen and whether the purchase was on a weekday or a weekend.

Cost of phone plan

              $29    $49    $79   Total
Mon-Fri        10    120    250     380
Sat-Sun        40     30    350     420
Total          50    150    600     800

  • Sample space: {$29 on weekdays, $29 on weekends, $49 on weekdays, $49 on weekends, $79 on weekdays, $79 on weekends}.
    Example Events: {$29 on weekdays}, {customers bought $49 plan (at some day)}, {customers bought any plan on a weekday}, {$49 on weekdays, $79 on weekends}

Assigning Probabilities to Observed Outcomes

  • Relative frequency: outcomes receive probability corresponding to their number of occurrences
    P(\text{outcome}_i) = \frac{\text{number of occurrences of outcome } i}{\text{total number of occurrences of all outcomes}}

Cost of phone plan

              $29      $49      $79    Total
Mon-Fri     0.0125   0.15     0.3125   0.475
Sat-Sun     0.05     0.0375   0.4375   0.525
Total       0.0625   0.1875   0.75     1
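As a short sketch (variable names are ours; the counts come from the contingency table above), the relative frequencies can be computed by dividing each cell by the total:

```python
# Sketch: relative frequencies from the raw counts in the contingency table.
counts = {
    ("Mon-Fri", "$29"): 10, ("Mon-Fri", "$49"): 120, ("Mon-Fri", "$79"): 250,
    ("Sat-Sun", "$29"): 40, ("Sat-Sun", "$49"): 30, ("Sat-Sun", "$79"): 350,
}
total = sum(counts.values())  # 800 customers observed in total
rel_freq = {cell: n / total for cell, n in counts.items()}
print(rel_freq[("Mon-Fri", "$49")])  # 120/800 = 0.15
```

By construction, the six relative frequencies sum to 1, matching the bottom-right cell of the table.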

Law of Addition

Joint vs Marginal Probabilities
  • Joint probability: the relative frequency when all dimensions are specified (e.g., what is the relative frequency that a customer bought a $49 plan on a weekday?)

  • Marginal probability: the relative frequency when only a single dimension is specified. E.g., what is the relative frequency that a customer bought a $49 plan? (a single dimension, because here we do not care whether they bought it on a weekday or on the weekend)

Venn Diagram: Visualization of Probabilities
  • Venn Diagram (John Venn, 1880) shows logic relations across sets.

    • The external rectangle indicates the whole sample space.

    • The internal circle indicates some event A.

Joint Events
  • The joint probability is represented by the overlapping part in the Venn Diagram.

  • The symbol \cap means ‘intersection’. A \cap B can be thought of as A & B.

Law of Addition

Joint Probability such as P(\text{weekdays and } $29) and P(\text{weekend and } $49). Joint probability describes the probability of outcomes associated with more than one random variable, e.g., Day and Price.

Marginal Probability such as P(weekdays) and P($29). Marginal probability describes the probability of outcomes associated with only one random variable.

Cost of phone plan

              $29      $49      $79    Total
Mon-Fri     0.0125   0.15     0.3125   0.475
Sat-Sun     0.05     0.0375   0.4375   0.525
Total       0.0625   0.1875   0.75     1

Some Notation: Event, Complement, Intersection of Events
  • Mathematically, we can define an event by A, and define the complement of the event by A' (pronounced as A prime) which means “not A”.

  • Example:

    • Let A denote the event “$29”, and B denote “weekdays”.

    • A' then indicates “not $29”, which in the previous example is “$49 or $79”.

    • B' then indicates “Sat-Sun”.

                 A               A'            Total
B           P(A \cap B)     P(A' \cap B)       P(B)
B'          P(A \cap B')    P(A' \cap B')      P(B')
Total          P(A)            P(A')             1

                  $29     Not $29   Total
Weekdays         0.0125   0.4625    0.475
Not Weekdays     0.05     0.4750    0.525
Total            0.0625   0.9375    1

Complement Rule of Probability
  • Complement Rule of Probability
    P(A)+P(A') = 1

Law of Total Probability, Version 1
  • Law of total probability, version 1
    P(A \cap B) + P(A \cap B') = P(A)
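A minimal numeric check of both rules (a sketch; the numbers are the table's relative frequencies, with A = "$29" and B = "weekdays"):

```python
# Check the complement rule and the law of total probability (version 1)
# with A = "$29" and B = "weekdays", using the table's relative frequencies.
p_A = 0.0625            # P($29)
p_A_and_B = 0.0125      # P($29 and weekday)
p_A_and_notB = 0.05     # P($29 and weekend)

print(p_A + (1 - p_A))           # complement rule: P(A) + P(A') = 1
print(p_A_and_B + p_A_and_notB)  # law of total probability: equals P(A) = 0.0625
```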

Union of Events
  • A \cup B denotes the event that A or B happens, pronounced as “the union of A and B” or “A union B”. So P(A \cup B) indicates the probability that A or B occurs.

General Rule of Addition
  • What is the probability of the event that a plan is sold on a weekday or is sold for $29?

    • Answer: \frac{10}{800} + \frac{40}{800} + \frac{370}{800} = 0.0125 + 0.05 + 0.4625 = 0.525

    • Alternative: \frac{50}{800} + \frac{380}{800} - \frac{10}{800} = 0.0625 + 0.475 - 0.0125 = 0.525

General rule of addition

P(A \cup B) = P(A) + P(B) - P(A \cap B)

It states that the probability that A or B occurs equals the probability that A occurs plus the probability that B occurs minus the probability that both A and B occur at the same time.
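The rule can be checked numerically (a sketch, with A = "weekday" and B = "$29", using the table's relative frequencies):

```python
# General rule of addition, with A = "weekday" and B = "$29".
p_A, p_B, p_A_and_B = 0.475, 0.0625, 0.0125
p_A_or_B = p_A + p_B - p_A_and_B
print(p_A_or_B)  # 0.525, the same as summing the three disjoint cells directly
```

Subtracting P(A \cap B) corrects for the $29-on-a-weekday cell being counted twice, once in each marginal.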

Mutually Exclusive Events
  • If events A and B cannot occur at the same time (A occurs only if B does not occur), we say A and B are mutually exclusive (events).

    • Mutually exclusive events P(A \cap B) = 0. In the Venn diagram, these events do not intersect.

  • Any event and its complement are mutually exclusive. Either “A occurs” or “A does not occur”. So P(A \cap A') = 0 always.

Collectively Exhaustive Events
  • If the occurrence of events A and B covers the whole sample space, we say A and B are collectively exhaustive (events).

    • Collectively exhaustive events P(A \cup B) = 1

  • Any event and its complement are collectively exhaustive. “A occurs” and “A does not occur” make up all possible outcomes. So P(A \cup A') = 1 always.

Conditional Probabilities and Independence

Conditional Probabilities
  • In many cases, we are interested in conditional probabilities.

    • What is the probability of achieving growth in the next quarter conditional on the success of our advertisement campaign?

    • What’s the probability of passing this subject conditional on not attending lectures?

  • P(A|B) denotes the probability that event A occurs, conditional on B occurring. The symbol P(X = x | Y = y) denotes the probability of random variable X taking value x, conditional on the random variable Y taking value y.

  • In our example of costs for phone plans, P($29 | \text{weekdays}) or P(\text{Price} = $29 | \text{Day} = \text{weekdays}) means the probability of a client choosing the $29 plan, conditional on her visiting the store on a weekday.

Conditional Probability Example
  • What is the probability of selling a $29 plan, conditional on a weekday?

    • Approach 1: \frac{10}{380} = 0.026

    • Approach 2: \frac{P($29 \cap Weekday)}{P(Weekday)} = \frac{0.0125}{0.475} = 0.026
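Both approaches can be sketched in Python (variable names are ours; the counts are from the contingency table):

```python
# P($29 | weekday) computed both ways, from the counts in the table.
n_29_and_wd, n_wd, n_total = 10, 380, 800

approach1 = n_29_and_wd / n_wd                              # restrict to the weekday row
approach2 = (n_29_and_wd / n_total) / (n_wd / n_total)      # joint prob / marginal prob
print(round(approach1, 3), round(approach2, 3))  # both 0.026
```

The factor n_total cancels in approach 2, which is why both routes give the same answer.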

Conditional Prob.: Formula
  • Conditional probability can be computed as follows:

P(A|B) = \frac{P(A \cap B)}{P(B)}; \quad P(B|A) = \frac{P(A \cap B)}{P(A)}

So the probability of A conditional on B equals the joint probability of A and B, divided by the probability of B (a marginal probability).

Bayes Rule
  • From the conditional probability formula, we can write the general law of multiplication:
    P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)

  • We then rearrange this further to derive Bayes rule:
    P(B|A) = \frac{P(A|B)P(B)}{P(A)}
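As a sketch with the table's numbers (A = "$29", B = "weekday"), Bayes rule recovers P(B|A) from P(A|B):

```python
# Bayes rule with A = "$29" and B = "weekday": recover P(B|A) from P(A|B).
p_A, p_B = 0.0625, 0.475       # marginals from the table
p_A_given_B = 10 / 380         # P($29 | weekday)

p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)             # ≈ 0.2, matching the direct count 10/50
```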

Law of total probability, version 2

P(A \cap B) = P(A|B)P(B); \quad P(A \cap B') = P(A|B')P(B')

So the joint probability equals the conditional probability multiplied by the marginal probability. Summing the two cases leads to the Law of total probability, version 2

P(A|B)P(B) + P(A|B')P(B') = P(A)
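A quick numeric check (a sketch, with A = "$29" and B = "weekday", counts from the table):

```python
# Law of total probability, version 2, with A = "$29" and B = "weekday".
p_B, p_notB = 0.475, 0.525
p_A_given_B = 10 / 380       # P($29 | weekday)
p_A_given_notB = 40 / 420    # P($29 | weekend)

p_A = p_A_given_B * p_B + p_A_given_notB * p_notB
print(p_A)  # ≈ 0.0625 = P($29), as the law states
```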

Independent Events
  • Events are said to be independent if the occurrence or non-occurrence of an event does not affect the occurrence or non-occurrence of another event(s).

    • For example, flipping a coin twice. We know that the probability of heads is 50%.

    • Now, regardless of what happens on the first toss, the probability of heads on the second toss is always 50%. That is, the outcome in the second toss is independent of the outcome of the first toss.

  • Mathematically, independence implies: P(Y|X) = P(Y)

  • Does the customer’s purchasing behavior depend on whether they buy the plan on a weekday or on weekends?

    • If a customer behaves the same way on weekdays as on weekends, we say purchasing behavior is independent of the day. In other words, independence implies

P($29 | Weekdays) = P($29 | Not Weekdays) = P($29)

P($29 | Weekdays) = \frac{0.0125}{0.475} = 0.026

P($29 | Not Weekdays) = \frac{0.05}{0.525} = 0.095

P($29) = 0.0625

Since these probabilities are all different, the events may not be independent.

Independent Events: Formula

  • If A and B are independent (events), whether or not B occurs should not affect the probability that A occurs; also, whether or not A occurs should not affect the probability that B occurs. This means

    • Independent events, version 1:
      P(A|B) = P(A); \quad P(B|A) = P(B)

    • Based on Bayes rule this also means

      • Independent events, version 2:
        P(A \cap B) = P(A) \times P(B)

Implications of Formulas

P(A|B) = P(A); \quad P(B|A) = P(B) \implies P(A \cap B) = P(A) \times P(B) \implies \text{independent}

P(A|B) \neq P(A); \quad P(B|A) \neq P(B) \implies P(A \cap B) \neq P(A) \times P(B) \implies \text{not independent, so dependent}

Example

Are the events “client choosing $29 plan” and “client purchasing during weekdays” independent?

We have P($29) = 0.0625, and P(weekdays) = 0.475,

So P($29) \times P(weekdays) = 0.0297, which is different from

P($29 \cap weekdays) = 0.0125.

So these two events may not be independent.
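The product-rule check can be sketched directly (numbers from the table; A = "$29", B = "weekday"):

```python
# Independence check for A = "$29" and B = "weekday":
# compare the joint probability with the product of the marginals.
p_A, p_B, p_A_and_B = 0.0625, 0.475, 0.0125

product = p_A * p_B
print(product)                           # 0.0296875 (≈ 0.0297)
print(abs(product - p_A_and_B) < 1e-9)   # False: the product differs from the joint
```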

Open Questions

  • Q: Why are we not 100% sure, and instead say “may”?

  • Q: If we collect data to answer the question “Are choices of plan and day of purchase independent?”, is the data a sample or a population? If we choose another set of data, will the table look the same as the one above?

  • To see if such a hypothesis holds true, we need to do statistical tests. We will study those tests later in the semester.

Binomial Experiments

  • Suppose you toss a coin 3 times in a row and you are interested in how likely it is that you get exactly two heads.

  • This is an instance of a binomial experiment.

  • A binomial experiment assesses the number of a certain outcome from repeated independent trials.

  • Each trial has two possible outcomes (e.g., heads or tails, success or failure, …).

  • The research question is usually: what is the probability of x successes out of n trials? For example: what is the probability of detecting 2 defects out of 3 products? What is the probability of 1 head out of 3 coin tosses?

Binomial Tree

  • When trials with two possible outcomes (e.g., success or failure, or binary outcomes) are independent, so that P(A|B) = P(A), we can draw a probability tree called a binomial tree.

  • Suppose we have three products, each of which can be defective (D) with probability p or functional (F) with probability q = 1 − p.

Example

  • What is the prob. of x successes out of n trials? For example: what is the prob of detecting 2 defects out of 3 products (2D,1F)?

    • Define the random variable X as the number of successes (defects) out of n trials (3 products). We are looking for P(X = 2; n = 3, \text{success (defect) rate} = p).

      • Case 1: D,D,F; prob: p \times p \times (1 − p) = p^2(1 − p)

      • Case 2: D,F,D; prob: p \times (1 − p) \times p = p^2(1 − p)

      • Case 3: F,D,D; prob: (1 − p) \times p \times p = p^2(1 − p)

  • Given outcomes across trials are independent, what is the probability of x successes out of n trials with probability of each success being p?

P(X = x; n, p) = \text{number of cases or combinations} \times p^x(1 − p)^{n−x}

Total prob. is the sum of all cases. Thus,

P(2; 3, p) = 3p^2(1 − p)
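The three cases can be enumerated mechanically; a sketch with an assumed illustrative defect rate p = 0.2:

```python
from itertools import product

# Enumerate the binomial tree for n = 3 products; keep sequences with exactly
# 2 defects (D). The defect rate p = 0.2 is an assumed illustrative value.
p = 0.2
cases = [s for s in product("DF", repeat=3) if s.count("D") == 2]
print(len(cases))  # 3 cases: (D,D,F), (D,F,D), (F,D,D)

total_prob = sum(p ** s.count("D") * (1 - p) ** s.count("F") for s in cases)
print(total_prob)  # 3 * p^2 * (1 - p) ≈ 0.096
```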

Binomial Distribution

  • A random variable X taking values in 0, 1, …, n is said to follow the binomial distribution, denoted by X \sim Bin(n, p), if it describes the (random) number of successes out of n trials in a binomial experiment (meaning that successes in different trials are independent).

  • We can calculate the probability of X successes from n trials using the general formula:

P(X = x; n, p) = \binom{n}{x} p^x (1 − p)^{n−x}

  • p^x: The probability of x successes.

  • (1 − p)^{n−x}: The probability of n − x failures. So in total we have n trials.

  • The factor (combinatorial operator) \binom{n}{x} = \frac{n!}{x!(n − x)!} computes the number of cases or combinations of choosing x objects from the set of n objects. Remember the factorial operator m! = 1 \times 2 \times 3 \times … \times (m − 1) \times m.
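The general formula translates directly into code; a sketch (the function name is ours), using Python's built-in combinatorial operator:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): comb(n, x) orderings, each with prob p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binom_pmf(2, 3, 0.2))                         # ≈ 0.096
print(sum(binom_pmf(x, 3, 0.2) for x in range(4)))  # ≈ 1: the pmf sums to one
```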

Binomial Distribution: Example

  • Say the defect rate is 0.2, there are 3 trials, and the random variable X is the number of defects out of 3 products (in mathematical notation we can write X \sim Bin(3, 0.2)).

P(X = 2; 3, 0.2) = \binom{3}{2} (0.2)^2 (1 − 0.2)^{3−2} = 3 \times 0.04 \times 0.8 = 0.096

This calculation is straightforward in Excel using the BINOM.DIST function.

Binomial Distribution

  • Properties of binomial distribution:

    • Almost all distributions have expectation (i.e., mean) and variance (and thus standard deviation). We learn some other important distributions in weeks 4 and 5.

    • Every distribution is characterized by some parameters.

      1. The binomial distribution has two parameters, n (the number of trials) and p (the success probability or success rate).

      2. The mean (or expectation) and variance of X \sim Bin(n, p) are given by

\mu_X = E(X) = np

\sigma_X^2 = Var(X) = np(1 − p), so \sigma_X = std(X) = \sqrt{np(1 − p)}
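The formulas np and np(1 − p) can be verified by brute force against the pmf; a sketch for the Bin(3, 0.2) example above:

```python
from math import comb

# Verify E(X) = np and Var(X) = np(1 - p) by brute force for X ~ Bin(3, 0.2).
n, p = 3, 0.2
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))
print(mean, var)  # ≈ 0.6 and 0.48, i.e., np and np(1 - p)
```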

Summary: Probability

  • Summary for week 3:

    1. Probability theory

      • a) P(A) + P(A') = 1

      • b) P(A \cap B) + P(A \cap B') = P(A)

      • c) P(A \cup B) = P(A) + P(B) - P(A \cap B)

      • d) P(A|B) = \frac{P(A \cap B)}{P(B)}; \quad P(B|A) = \frac{P(A \cap B)}{P(A)}

      • e) P(A|B)P(B) + P(A|B')P(B') = P(A); P(B|A)P(A) + P(B|A')P(A') = P(B)

      • f) independence \iff P(A|B) = P(A); \quad P(B|A) = P(B)

      • g) independence \iff P(A \cap B) = P(A) \times P(B)

    2. Binomial distribution: prob of x success in n trials, with p success rate for each trial. X \sim Bin(n, p)

P(X = x; n, p) = \binom{n}{x} p^x (1 − p)^{n−x}