Week 3: Predicting Events

Probability Theory

  • A probability is a number between 0 and 1 that tells us how likely a certain event is.

  • Classical method of assigning probabilities:
    \frac{\text{Number of outcomes in which an event occurs}}{\text{Number of possible outcomes}}

  • The empirical probability or relative frequency of occurrence method:
    \frac{\text{Number of outcomes in which an event has occurred in the past}}{\text{Number of opportunities for an event to occur}}

  • Subjective probability method:

    • Using judgment and other subjective criteria to determine the probability an event occurs.

Outline

  • Probability theory

  • Law of addition

  • Conditional probability

  • Application of Events: Probability trees and Binomial probability (more on this next week)

Introduction to Probability Theory

  • Probability theory is a prerequisite to statistics.

  • Experiment: A random process that creates outcomes (e.g., the data-collection procedure).

  • Sample space: The set of all possible outcomes.

  • Event: A set of outcomes (can contain no outcome, single outcome, or multiple outcomes) of an experiment to which probability is assigned.

Multi-dimensional Data

  • Example (Contingency Table or Cross-Tab):

    • Two random variables:

      1. Phone plan chosen by a client ($29, $49, $79)

      2. Day (weekday vs weekend) the client bought the plan

    • 6 potential outcomes: every combination of the phone plan chosen and whether the purchase was on a weekday or a weekend.

Cost of phone plan

              $29    $49    $79   Total
Mon-Fri        10    120    250     380
Sat-Sun        40     30    350     420
Total          50    150    600     800

  • Sample space: {$29 on weekdays, $29 on weekends, $49 on weekdays, $49 on weekends, $79 on weekdays, $79 on weekends}.
    Example Events: {$29 on weekdays}, {customers bought $49 plan (at some day)}, {customers bought any plan on a weekday}, {$49 on weekdays, $79 on weekends}

Assigning Probabilities to Observed Outcomes

  • Relative frequency: outcomes receive probability corresponding to their number of occurrences
    P(\text{outcome}_i) = \frac{\text{number of occurrences of outcome } i}{\text{total number of occurrences of all outcomes}}

Cost of phone plan

              $29      $49      $79    Total
Mon-Fri     0.0125   0.15     0.3125   0.475
Sat-Sun     0.05     0.0375   0.4375   0.525
Total       0.0625   0.1875   0.75     1
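As a short sketch (variable names are ours; the counts come from the contingency table above), the relative frequencies can be computed by dividing each cell by the total:

```python
# Sketch: relative frequencies from the raw counts in the contingency table.
counts = {
    ("Mon-Fri", "$29"): 10, ("Mon-Fri", "$49"): 120, ("Mon-Fri", "$79"): 250,
    ("Sat-Sun", "$29"): 40, ("Sat-Sun", "$49"): 30, ("Sat-Sun", "$79"): 350,
}
total = sum(counts.values())  # 800 customers observed in total
rel_freq = {cell: n / total for cell, n in counts.items()}
print(rel_freq[("Mon-Fri", "$49")])  # 120/800 = 0.15
```

By construction, the six relative frequencies sum to 1, matching the bottom-right cell of the table.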

Law of Addition

Joint vs Marginal Probabilities
  • Joint probability: the relative frequency when all dimensions are specified (e.g., what is the relative frequency that a customer bought a $49 plan on a weekday?)

  • Marginal probability: the relative frequency when only a single dimension is specified. E.g., what is the relative frequency that a customer bought a $49 plan? (a single dimension, because here we do not care whether they bought it on a weekday or on the weekend)

Venn Diagram: Visualization of Probabilities
  • Venn Diagram (John Venn, 1880) shows logic relations across sets.

    • The external rectangle indicates the whole sample space.

    • The internal circle indicates some event A.

Joint Events
  • The joint probability is represented by the overlapping part in the Venn Diagram.

  • The symbol \cap means ‘intersection’. A \cap B can be thought of as A & B.

Law of Addition

Joint Probability such as P(\text{weekdays and } $29) and P(\text{weekend and } $49). Joint probability describes the probability of outcomes associated with more than one random variable, e.g., Day and Price.

Marginal Probability such as P(weekdays) and P($29). Marginal probability describes the probability of outcomes associated with only one random variable.

Cost of phone plan

              $29      $49      $79    Total
Mon-Fri     0.0125   0.15     0.3125   0.475
Sat-Sun     0.05     0.0375   0.4375   0.525
Total       0.0625   0.1875   0.75     1

Some Notation: Event, Complement, Intersection of Events
  • Mathematically, we can define an event by A, and define the complement of the event by A' (pronounced as A prime) which means “not A”.

  • Example:

    • Let A denote the event “$29”, and B denote “weekdays”.

    • A' then indicates “not $29”, which in the previous example is “$49 or $79”.

    • B' then indicates “Sat-Sun”.

                 A               A'            Total
B           P(A \cap B)     P(A' \cap B)       P(B)
B'          P(A \cap B')    P(A' \cap B')      P(B')
Total          P(A)            P(A')             1

                  $29     Not $29   Total
Weekdays         0.0125   0.4625    0.475
Not Weekdays     0.05     0.4750    0.525
Total            0.0625   0.9375    1

Complement Rule of Probability
  • Complement Rule of Probability
    P(A)+P(A') = 1

Law of Total Probability, Version 1
  • Law of total probability, version 1
    P(A \cap B) + P(A \cap B') = P(A)
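A minimal numeric check of both rules (a sketch; the numbers are the table's relative frequencies, with A = "$29" and B = "weekdays"):

```python
# Check the complement rule and the law of total probability (version 1)
# with A = "$29" and B = "weekdays", using the table's relative frequencies.
p_A = 0.0625            # P($29)
p_A_and_B = 0.0125      # P($29 and weekday)
p_A_and_notB = 0.05     # P($29 and weekend)

print(p_A + (1 - p_A))           # complement rule: P(A) + P(A') = 1
print(p_A_and_B + p_A_and_notB)  # law of total probability: equals P(A) = 0.0625
```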

Union of Events
  • A \cup B denotes the event that A or B happens, pronounced as “the union of A and B” or “A union B”. So P(A \cup B) indicates the probability that A or B occurs.

General Rule of Addition
  • What is the probability of the event that a plan is sold on a weekday or is sold for $29?

    • Answer: \frac{10}{800} + \frac{40}{800} + \frac{370}{800} = 0.0125 + 0.05 + 0.4625 = 0.525

    • Alternative: \frac{50}{800} + \frac{380}{800} - \frac{10}{800} = 0.0625 + 0.475 - 0.0125 = 0.525

General rule of addition

P(A \cup B) = P(A) + P(B) - P(A \cap B)

It states that the probability that A or B occurs equals the probability that A occurs plus the probability that B occurs minus the probability that both A and B occur at the same time.
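The rule can be checked numerically (a sketch, with A = "weekday" and B = "$29", using the table's relative frequencies):

```python
# General rule of addition, with A = "weekday" and B = "$29".
p_A, p_B, p_A_and_B = 0.475, 0.0625, 0.0125
p_A_or_B = p_A + p_B - p_A_and_B
print(p_A_or_B)  # 0.525, the same as summing the three disjoint cells directly
```

Subtracting P(A \cap B) corrects for the $29-on-a-weekday cell being counted twice, once in each marginal.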

Mutually Exclusive Events
  • If events A and B cannot occur at the same time (A occurs only if B does not occur), we say A and B are mutually exclusive (events).

    • Mutually exclusive events P(A \cap B) = 0. In the Venn diagram, these events do not intersect.

  • Any event and its complement are mutually exclusive. Either “A occurs” or “A does not occur”. So P(A \cap A') = 0 always.

Collectively Exhaustive Events
  • If the occurrence of events A and B covers the whole sample space, we say A and B are collectively exhaustive (events).

    • Collectively exhaustive events P(A \cup B) = 1

  • Any event and its complement are collectively exhaustive. “A occurs” and “A does not occur” make up all possible outcomes. So P(A \cup A') = 1 always.

Conditional Probabilities and Independence

Conditional Probabilities
  • In many cases, we are interested in conditional probabilities.

    • What is the probability of achieving growth in the next quarter conditional on the success of our advertisement campaign?

    • What’s the probability of passing this subject conditional on not attending lectures?

  • P(A|B) denotes the probability that event A occurs, conditional on B occurring. The symbol P(X = x | Y = y) denotes the probability of random variable X taking value x, conditional on the random variable Y taking value y.

  • In our example of costs for phone plans, P($29 | \text{weekdays}) or P(\text{Price} = $29 | \text{Day} = \text{weekdays}) means the probability of a client choosing the $29 plan, conditional on her visiting the store on a weekday.

Conditional Probability Example
  • What is the probability of selling a $29 plan, conditional on a weekday?

    • Approach 1: \frac{10}{380} = 0.026

    • Approach 2: \frac{P($29 \cap Weekday)}{P(Weekday)} = \frac{0.0125}{0.475} = 0.026
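Both approaches can be sketched in Python (variable names are ours; the counts are from the contingency table):

```python
# P($29 | weekday) computed both ways, from the counts in the table.
n_29_and_wd, n_wd, n_total = 10, 380, 800

approach1 = n_29_and_wd / n_wd                              # restrict to the weekday row
approach2 = (n_29_and_wd / n_total) / (n_wd / n_total)      # joint prob / marginal prob
print(round(approach1, 3), round(approach2, 3))  # both 0.026
```

The factor n_total cancels in approach 2, which is why both routes give the same answer.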

Conditional Prob.: Formula
  • Conditional probability can be computed as follows:

P(A|B) = \frac{P(A \cap B)}{P(B)}; \quad P(B|A) = \frac{P(A \cap B)}{P(A)}

So the probability of A conditional on B equals the joint probability of A and B, divided by the probability of B (a marginal probability).

Bayes Rule
  • From the conditional probability formula, we can write the general law of multiplication:
    P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)

  • We then rearrange this further to derive Bayes rule:
    P(B|A) = \frac{P(A|B)P(B)}{P(A)}
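As a sketch with the table's numbers (A = "$29", B = "weekday"), Bayes rule recovers P(B|A) from P(A|B):

```python
# Bayes rule with A = "$29" and B = "weekday": recover P(B|A) from P(A|B).
p_A, p_B = 0.0625, 0.475       # marginals from the table
p_A_given_B = 10 / 380         # P($29 | weekday)

p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)             # ≈ 0.2, matching the direct count 10/50
```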

Law of total probability, version 2

P(A \cap B) = P(A|B)P(B); \quad P(A \cap B') = P(A|B')P(B')

So the joint probability equals the conditional probability multiplied by the marginal probability. Summing the two cases leads to the Law of total probability, version 2

P(A|B)P(B) + P(A|B')P(B') = P(A)
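A quick numeric check (a sketch, with A = "$29" and B = "weekday", counts from the table):

```python
# Law of total probability, version 2, with A = "$29" and B = "weekday".
p_B, p_notB = 0.475, 0.525
p_A_given_B = 10 / 380       # P($29 | weekday)
p_A_given_notB = 40 / 420    # P($29 | weekend)

p_A = p_A_given_B * p_B + p_A_given_notB * p_notB
print(p_A)  # ≈ 0.0625 = P($29), as the law states
```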

Independent Events
  • Events are said to be independent if the occurrence or non-occurrence of an event does not affect the occurrence or non-occurrence of another event(s).

    • For example, flipping a coin twice. We know that the probability of heads is 50%.

    • Now, regardless of what happens on the first toss, the probability of heads on the second toss is always 50%. That is, the outcome in the second toss is independent of the outcome of the first toss.

  • Mathematically, independence implies: P(Y|X) = P(Y)

  • Does the customer’s purchasing behavior depend on whether they buy the plan on a weekday or on weekends?

    • If a customer behaves the same way on weekdays as on weekends, we say purchasing behavior is independent of the day. In other words, independence implies

P($29 | Weekdays) = P($29 | Not Weekdays) = P($29)

P($29 | Weekdays) = \frac{0.0125}{0.475} = 0.026

P($29 | Not Weekdays) = \frac{0.05}{0.525} = 0.095

P($29) = 0.0625

Since these probabilities are all different, the events may not be independent.

Independent Events: Formula

  • If A and B are independent (events), whether or not B occurs should not affect the probability that A occurs; also, whether or not A occurs should not affect the probability that B occurs. This means

    • Independent events, version 1:
      P(A|B) = P(A); \quad P(B|A) = P(B)

    • Based on Bayes rule this also means

      • Independent events, version 2:
        P(A \cap B) = P(A) \times P(B)

Implications of Formulas

P(A|B) = P(A); \quad P(B|A) = P(B) \implies P(A \cap B) = P(A) \times P(B) \implies \text{independent}

P(A|B) \neq P(A); \quad P(B|A) \neq P(B) \implies P(A \cap B) \neq P(A) \times P(B) \implies \text{not independent, so dependent}

Example

Are the events “client choosing $29 plan” and “client purchasing during weekdays” independent?

We have P($29) = 0.0625, and P(weekdays) = 0.475,

So P($29) \times P(weekdays) = 0.0297, which is different from

P($29 \cap weekdays) = 0.0125.

So these two events may not be independent.
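The product-rule check can be sketched directly (numbers from the table; A = "$29", B = "weekday"):

```python
# Independence check for A = "$29" and B = "weekday":
# compare the joint probability with the product of the marginals.
p_A, p_B, p_A_and_B = 0.0625, 0.475, 0.0125

product = p_A * p_B
print(product)                           # 0.0296875 (≈ 0.0297)
print(abs(product - p_A_and_B) < 1e-9)   # False: the product differs from the joint
```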

Open Questions

  • Q: Why are we not 100% sure, and instead say “may”?

  • Q: If we collect data to answer the question “Are choices of plan and day of purchase independent?”, is the data a sample or a population? If we choose another set of data, will the table look the same as the one above?

  • To see if such a hypothesis holds true, we need to do statistical tests. We will study those tests later in the semester.

Binomial Experiments

  • Suppose you toss a coin 3 times in a row and you are interested in how likely it is that you get exactly two heads.

  • This is an instance of a binomial experiment.

  • A binomial experiment assesses the number of a certain outcome from repeated independent trials.

  • Each trial has two possible outcomes (e.g., heads or tails, success or failure, …).

  • The research question is usually: what is the probability of x successes out of n trials? For example: what is the probability of detecting 2 defects out of 3 products? What is the probability of 1 head out of 3 coin tosses?

Binomial Tree

  • When trials with two possible outcomes (e.g., success or failure, or binary outcomes) are independent, so that P(A|B) = P(A), we can draw a probability tree called a binomial tree.

  • Suppose we have three products, each of which can be defective (D) with probability p or functional (F) with probability q = 1 − p.

Example

  • What is the prob. of x successes out of n trials? For example: what is the prob of detecting 2 defects out of 3 products (2D,1F)?

    • Define the random variable X as the number of successes (defects) out of n trials (3 products). We are looking for P(X = 2; n = 3, \text{success (defect) rate} = p).

      • Case 1: D,D,F; prob: p \times p \times (1 − p) = p^2(1 − p)

      • Case 2: D,F,D; prob: p \times (1 − p) \times p = p^2(1 − p)

      • Case 3: F,D,D; prob: (1 − p) \times p \times p = p^2(1 − p)

  • Given outcomes across trials are independent, what is the probability of x successes out of n trials with probability of each success being p?

P(X = x; n, p) = \text{number of cases or combinations} \times p^x(1 − p)^{n−x}

Total prob. is the sum of all cases. Thus,

P(2; 3, p) = 3p^2(1 − p)
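The three cases can be enumerated mechanically; a sketch with an assumed illustrative defect rate p = 0.2:

```python
from itertools import product

# Enumerate the binomial tree for n = 3 products; keep sequences with exactly
# 2 defects (D). The defect rate p = 0.2 is an assumed illustrative value.
p = 0.2
cases = [s for s in product("DF", repeat=3) if s.count("D") == 2]
print(len(cases))  # 3 cases: (D,D,F), (D,F,D), (F,D,D)

total_prob = sum(p ** s.count("D") * (1 - p) ** s.count("F") for s in cases)
print(total_prob)  # 3 * p^2 * (1 - p) ≈ 0.096
```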

Binomial Distribution

  • A random variable X taking values in 0, 1, …, n is said to follow the binomial distribution, denoted by X \sim Bin(n, p), if it describes the (random) number of successes out of n trials in a binomial experiment (meaning that successes in different trials are independent).

  • We can calculate the probability of X successes from n trials using the general formula:

P(X = x; n, p) = \binom{n}{x} p^x (1 − p)^{n−x}

  • p^x: The probability of x successes.

  • (1 − p)^{n−x}: The probability of n − x failures. So in total we have n trials.

  • The factor (combinatorial operator) \binom{n}{x} = \frac{n!}{x!(n − x)!} computes the number of cases or combinations of choosing x objects from the set of n objects. Remember the factorial operator m! = 1 \times 2 \times 3 \times … \times (m − 1) \times m.
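The general formula translates directly into code; a sketch (the function name is ours), using Python's built-in combinatorial operator:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): comb(n, x) orderings, each with prob p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binom_pmf(2, 3, 0.2))                         # ≈ 0.096
print(sum(binom_pmf(x, 3, 0.2) for x in range(4)))  # ≈ 1: the pmf sums to one
```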

Binomial Distribution: Example

  • Say the defect rate is 0.2, there are 3 trials, and the random variable X is the number of defects out of 3 products (in mathematical notation we can write X \sim Bin(3, 0.2)).

P(X = 2; 3, 0.2) = \binom{3}{2} (0.2)^2 (1 − 0.2)^{3−2} = 3 \times 0.04 \times 0.8 = 0.096

This calculation is straightforward in Excel using the BINOM.DIST function.

Binomial Distribution

  • Properties of binomial distribution:

    • Almost all distributions have expectation (i.e., mean) and variance (and thus standard deviation). We learn some other important distributions in weeks 4 and 5.

    • Every distribution is characterized by some parameters.

      1. The binomial distribution has two parameters, n (the number of trials) and p (the success probability or success rate).

      2. The mean (or expectation) and variance of X \sim Bin(n, p) are given by

\mu_X = E(X) = np

\sigma_X^2 = Var(X) = np(1 − p), so \sigma_X = std(X) = \sqrt{np(1 − p)}
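The formulas np and np(1 − p) can be verified by brute force against the pmf; a sketch for the Bin(3, 0.2) example above:

```python
from math import comb

# Verify E(X) = np and Var(X) = np(1 - p) by brute force for X ~ Bin(3, 0.2).
n, p = 3, 0.2
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))
print(mean, var)  # ≈ 0.6 and 0.48, i.e., np and np(1 - p)
```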

Summary: Probability

  • Summary for week 3:

    1. Probability theory

      • a) P(A) + P(A') = 1

      • b) P(A \cap B) + P(A \cap B') = P(A)

      • c) P(A \cup B) = P(A) + P(B) - P(A \cap B)

      • d) P(A|B) = \frac{P(A \cap B)}{P(B)}; \quad P(B|A) = \frac{P(A \cap B)}{P(A)}

      • e) P(A|B)P(B) + P(A|B')P(B') = P(A); P(B|A)P(A) + P(B|A')P(A') = P(B)

      • f) independence \iff P(A|B) = P(A); \quad P(B|A) = P(B)

      • g) independence \iff P(A \cap B) = P(A) \times P(B)

    2. Binomial distribution: prob of x success in n trials, with p success rate for each trial. X \sim Bin(n, p)

P(X = x; n, p) = \binom{n}{x} p^x (1 − p)^{n−x}