Probability Notes

Key Concepts

Probability

  • Probability provides a concrete, computable representation for an uncertain world.

  • It involves assigning numbers between 0 and 1 to describe the likelihood of different events.

  • This allows for quantifying uncertainty and making informed decisions based on likelihoods.

Definitions

  • Experiment (or trial): An occurrence with an uncertain outcome. Example: Flipping a coin.

  • Outcome: The result of an experiment. Example: Heads or tails.

  • Sample Space: The set of all possible outcomes for an experiment. Example: {Heads, Tails} for a coin flip.

  • Event: A subset of possible outcomes with some common property. Example: Getting heads in a coin flip.

  • Probability: A real number between 0 and 1 measuring how likely an event is; for equally likely outcomes, the number of outcomes in the event divided by the total number of outcomes in the sample space. Example: Probability of getting heads = 1/2 = 0.5.

  • Probability distribution: A mapping of outcomes to probabilities that sum to 1. Example: For a fair coin: P(Heads) = 0.5, P(Tails) = 0.5.

  • Random variable: A variable representing an unknown value, with a known probability distribution. Example: X, where X = 1 for heads and X = 0 for tails.

  • Probability density/mass function: A function that defines a probability distribution: a mass function maps each discrete outcome to a probability, while a density function spreads probability over a continuous range of outcomes.

  • Observation: An outcome that has been directly observed. Example: Observing 'Heads' after flipping a coin.

  • Likelihood: How likely an observation is given a probability distribution. Example: Likelihood of observing 'Heads' given a fair coin is 0.5.

  • Sample: An outcome that has been simulated according to a probability distribution. Example: Simulating 100 coin flips.

  • Expectation/expected value: The average value of a random variable.

  • A random variable X has a probability distribution P(X), which assigns probabilities 0 \le P(X = x) \le 1 to outcomes x belonging to the sample space.

  • An event is a set of outcomes that are a subset of the sample space.

  • That probability distribution is defined by a probability density/mass function f_X(x), which assigns probabilities to outcomes such that the sum of probabilities over all outcomes is 1: \sum_x f_X(x) = 1.

  • We can observe specific outcomes x_i drawn from a distribution as a result of trials.

  • We can sample (simulate) new outcomes x'_j given a distribution P(X).

  • Assuming outcomes have numerical values, we can evaluate the expected value E[X], the average across infinitely many trials.
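A minimal sketch of these definitions, using a hypothetical fair six-sided die (not an example from the notes):

```python
# Illustrating sample space, event, and probability with a fair die
# (hypothetical example). For equally likely outcomes,
# probability = |event| / |sample space|.
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # all possible outcomes of one roll
event_even = {2, 4, 6}              # event: "roll an even number"

p_even = Fraction(len(event_even), len(sample_space))
print(p_even)  # 1/2
```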

Philosophy of Probability

  • Bayesian/Laplacian: Probability as a calculus of belief; probabilities are measures of degrees of belief. This approach allows incorporating prior knowledge.

  • Frequentist: Probabilities are the long-term behavior of repeated events. This is based on observed frequencies.

  • Bayesian models include priors and consider probability as a degree of belief.

  • Frequentist models do not use priors and define probability as the long-term frequency of events.

Generative Models

  • Involve a generative process with unobserved variables that govern the observations. These models are used to simulate data.

  # Example of a generative model in Python
  import numpy as np

  def coin_flip_generator(bias, num_flips):
      '''
      Generates a sequence of coin flips.
      bias: Probability of heads (0 <= bias <= 1).
      num_flips: Number of coin flips to simulate.
      '''
      outcomes = np.random.choice(["Heads", "Tails"], size=num_flips, p=[bias, 1-bias])
      return outcomes

  # Simulate 10 coin flips with a bias of 0.7 towards heads
  simulated_flips = coin_flip_generator(0.7, 10)
  print(simulated_flips)

  • Forward probability: Questions related to the distribution of the observations. Example: What is the probability of observing a certain sequence of coin flips?

  • Inverse probability: Questions related to unobserved variables that govern the process that generated the observations. Example: Given a sequence of coin flips, what is the probability that the coin is biased?

Axioms of Probability

  • Boundedness: 0 \le P(A) \le 1 for all possible events A. Probabilities lie between 0 and 1 inclusive.

  • Unitarity: \sum P(A) = 1 for the complete set of possible outcomes A in a sample space. Something always happens.

  • Sum rule: P(A \lor B) = P(A) + P(B) - P(A \land B). The probability of either event A or B happening is the sum of the individual probabilities minus the probability of both happening.

  • Conditional probability: P(A|B) = \frac{P(A \land B)}{P(B)}. The probability that event A will happen given that we already know B to have happened.
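The sum rule and conditional probability can be checked by brute-force enumeration; a hypothetical sketch using two fair dice:

```python
# Verifying the sum rule and conditional probability by enumerating
# the sample space of two fair dice (hypothetical example).
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # sample space: 36 outcomes

def P(event):
    """Probability of an event, given as a predicate over outcomes."""
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == 6          # event A: first die shows 6
B = lambda w: w[0] + w[1] >= 10  # event B: total is at least 10

# Sum rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = P(lambda w: A(w) or B(w))
rhs = P(A) + P(B) - P(lambda w: A(w) and B(w))
print(abs(lhs - rhs) < 1e-12)    # True

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = P(lambda w: A(w) and B(w)) / P(B)
print(p_a_given_b)               # 0.5
```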

Random Variables and Distributions

  • A random variable can take on different values and is written with a capital letter, like X.

  • A probability distribution defines how likely different outcomes of a random variable are.

  • P(X = x), the probability of random variable X taking on value x

  • P(X), shorthand for whole probability distribution of X

  • P(x), shorthand for probability of specific value X=x

  • The distribution of a discrete random variable is described with a probability mass function (PMF), f_X(x), where P(X = x) = f_X(x).

  # Example of a PMF for a discrete random variable
  def pmf_example(x):
      '''
      Example PMF for a discrete random variable.
      x: Value of the random variable.
      '''
      if x == 0:
          return 0.3
      elif x == 1:
          return 0.7
      else:
          return 0.0

  # Probability that X = 1
  probability_x_equals_1 = pmf_example(1)
  print(probability_x_equals_1) # Output: 0.7

  • A continuous variable has a probability density function (PDF), f_X(x), which specifies the spread of the probability over outcomes as a continuous function.

  • A probability mass function or probability density function must sum/integrate to exactly 1.
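As a rough numerical check of the integrate-to-1 property, a hypothetical sketch that sums a Gaussian PDF over a fine grid:

```python
# Numerically checking that a PDF integrates to (approximately) 1:
# a Gaussian density summed with a simple Riemann sum (hypothetical sketch).
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian probability density function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

dx = 0.001
# Sum f(x) * dx over a grid from -10 to 10; the tails beyond are negligible
total = sum(normal_pdf(-10 + i * dx) * dx for i in range(int(20 / dx)))
print(abs(total - 1.0) < 1e-6)  # True
```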

Expectation

  • If a random variable takes on numerical values, then we can define the expectation or expected value of a random variable E[X].

  • For a discrete random variable with probability mass function P(X = x) = f_X(x), we would write this as a summation: E[X] = \sum_x x f_X(x)
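A hypothetical example of this summation for a fair die, using exact fractions:

```python
# E[X] = sum over x of x * f_X(x), for a fair six-sided die
# (hypothetical example, computed with exact fractions).
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 7/2
```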

Samples and Sampling

  • Samples are outcomes of an experiment, whether observed directly or simulated from a distribution.

  • For discrete data, the empirical distribution can estimate the probability mass function by counting each outcome seen divided by the total number of trials. This provides an estimate of the underlying probabilities.
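A sketch of the empirical distribution, assuming hypothetical samples drawn with Python's random module:

```python
# Estimating a PMF with the empirical distribution: count each outcome
# seen and divide by the total number of trials (hypothetical true PMF).
import random
from collections import Counter

random.seed(0)  # reproducible demonstration
true_pmf = {"A": 0.2, "B": 0.5, "C": 0.3}
trials = random.choices(list(true_pmf), weights=list(true_pmf.values()), k=10_000)

counts = Counter(trials)
empirical = {outcome: counts[outcome] / len(trials) for outcome in true_pmf}
print(empirical)  # close to {'A': 0.2, 'B': 0.5, 'C': 0.3}
```

With more trials, the empirical estimates concentrate around the true probabilities.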

Random Sampling Procedures

  • There are algorithms which can generate continuous random numbers which are uniformly distributed in an interval, such as from 0.0 to 1.0.

  • For a discrete probability mass function, we can sample outcomes according to any arbitrary PMF by partitioning the unit interval.

  # Example of random sampling from a PMF
  import random

  def sample_from_pmf(pmf_dict):
      '''
      Samples an outcome from a discrete PMF.
      pmf_dict: Dictionary where keys are outcomes and values are probabilities.
      '''
      outcomes = list(pmf_dict.keys())
      probabilities = list(pmf_dict.values())
      return random.choices(outcomes, weights=probabilities, k=1)[0]

  # Example PMF
  pmf = {"A": 0.2, "B": 0.5, "C": 0.3}

  # Sample an outcome
  sampled_outcome = sample_from_pmf(pmf)
  print(sampled_outcome)
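The unit-interval partitioning mentioned above can also be written out by hand; a hypothetical sketch of what `random.choices` does internally with weights:

```python
# Sampling by partitioning the unit interval: draw a uniform number in
# [0, 1) and find which sub-interval it falls into (hypothetical sketch).
import random

def sample_by_partition(pmf_dict, u=None):
    """Map a uniform draw u in [0, 1) to an outcome via cumulative sums."""
    if u is None:
        u = random.random()
    cumulative = 0.0
    for outcome, p in pmf_dict.items():
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome  # guard against rounding at the top of the interval

pmf = {"A": 0.2, "B": 0.5, "C": 0.3}
# A owns [0, 0.2), B owns [0.2, 0.7), C owns [0.7, 1.0)
print(sample_by_partition(pmf, u=0.65))  # B
```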

Joint, Conditional, Marginal Probability

  • The joint probability of two random variables P(X, Y) gives the probability that X and Y take specific values simultaneously.

  • The marginal probability P(X) = \int P(X, Y) dY for a PDF.

  • The marginal probability P(X) = \sum_Y P(X, Y) for a PMF.

  • The conditional probability distribution of a random variable X given a random variable Y is written as P(X|Y) = \frac{P(X, Y)}{P(Y)}.
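A hypothetical example computing marginals and conditionals from a small joint PMF table over two binary variables:

```python
# Marginal and conditional probabilities from a joint PMF table
# (hypothetical example with two binary variables X and Y).
joint = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

# Marginal: P(X = x) = sum over y of P(X = x, Y = y)
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}

# Conditional: P(X = x | Y = 1) = P(X = x, Y = 1) / P(Y = 1)
p_y1 = sum(p for (_, y), p in joint.items() if y == 1)
p_x_given_y1 = {x: joint[(x, 1)] / p_y1 for x in (0, 1)}

print(p_x)           # marginal distribution of X
print(p_x_given_y1)  # conditional distribution of X given Y = 1
```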

Writing and Manipulation of Probabilities

  • Probabilities can be used to represent belief.

  • The odds against an event with probability p is defined by: odds_{against} = \frac{1 - p}{p}

  • logit(p) = log(\frac{p}{1 - p}), the log-odds of a probability p.

  • prob(g) = \frac{e^g}{1 + e^g}, the logistic function, which maps log-odds g back to a probability (the inverse of the logit).

  • For independent random variables, log P(x_1, \ldots, x_n) = \sum_i log P(x_i); sums of log-probabilities are numerically better behaved than products of small probabilities.
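A hypothetical sketch of the odds, logit, and inverse-logit transforms above:

```python
# Odds against, logit (log-odds), and the logistic (inverse logit)
# transforms from the bullets above (hypothetical example values).
import math

def odds_against(p):
    return (1 - p) / p

def logit(p):
    return math.log(p / (1 - p))

def prob(g):
    """Inverse of logit: map log-odds g back to a probability."""
    return math.exp(g) / (1 + math.exp(g))

p = 0.8
print(round(odds_against(p), 9))   # 0.25 (odds against are 1 to 4)
print(round(prob(logit(p)), 9))    # 0.8, since prob inverts logit
```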

Bayes' Rule

  • Helps invert conditional distributions.

  • P(A|B) = \frac{P(B|A)P(A)}{P(B)}

  • P(A|B) is called the posterior: the updated belief after considering evidence B.

  • P(B|A) is called the likelihood: the probability of observing evidence B given A is true.

  • P(A) is called the prior: the initial belief about A before the evidence is considered.
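A hypothetical worked example of Bayes' rule: updating the belief that a coin is biased after observing three heads in a row:

```python
# Bayes' rule for a coin that is either fair or biased (hypothetical
# example). A = "coin is biased with P(heads) = 0.9",
# B = "we observe 3 heads in a row".
p_biased = 0.5                      # prior P(A)
p_heads_given_biased = 0.9
p_heads_given_fair = 0.5

likelihood_biased = p_heads_given_biased ** 3   # P(B|A)
likelihood_fair = p_heads_given_fair ** 3       # P(B|not A)

# Evidence P(B), summing over both hypotheses
p_b = likelihood_biased * p_biased + likelihood_fair * (1 - p_biased)

posterior = likelihood_biased * p_biased / p_b  # P(A|B)
print(round(posterior, 4))  # 0.8536
```

Three heads in a row shifts the belief from 0.5 to roughly 0.85 in favour of the biased coin.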