
Probability II - Summary

  • Monte Carlo Approaches:

    • These methods approximate the expected value of a random variable by taking multiple random samples and performing statistical analysis. This is particularly useful when dealing with complex systems where direct analytical solutions or exact calculations are not feasible.

    • They are especially valuable when integrals are intractable, meaning they cannot be solved using standard analytical methods.

    • A classic example involves simulating dart throws onto a dartboard to estimate the expected score or even to approximate the value of π. Each throw is a random sample, and aggregating the results gives an increasingly accurate estimate.
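
A concrete sketch of the dartboard idea (in Python with NumPy; the sampler and throw counts are illustrative choices, not from the notes): darts land uniformly in the unit square, and the fraction falling inside the quarter circle estimates π/4.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_pi(n_throws: int) -> float:
    """Estimate pi by throwing n_throws random darts at the unit square.

    A dart lands inside the quarter circle of radius 1 when x^2 + y^2 <= 1,
    which happens with probability pi/4, so 4 * (hit fraction) estimates pi.
    """
    x = rng.random(n_throws)
    y = rng.random(n_throws)
    hits = np.count_nonzero(x**2 + y**2 <= 1.0)
    return 4.0 * hits / n_throws

for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n))  # the estimate tightens as n grows
```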

  • Continuous Random Variables:

    • Continuous random variables are defined using a Probability Density Function (PDF) rather than a Probability Mass Function (PMF), which is used for discrete variables. The PDF describes the likelihood of the variable falling within a particular range of values.

    • The probability of a continuous random variable taking any single exact value is zero. However, any value within the variable's support (the range of possible values) is possible.

    • Estimating the true PDF from simple observation counts is not straightforward, necessitating the use of smoothing techniques or kernel density estimation to approximate the distribution.
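
A minimal sketch of that last point, assuming SciPy's gaussian_kde as the smoothing tool: raw counts from a small sample give a jagged picture, while a kernel density estimate produces a smooth approximation of the underlying PDF.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=200)   # 200 draws from N(0, 1)

kde = gaussian_kde(sample)            # smooth density estimate built from the sample
grid = np.linspace(-4, 4, 9)

for x in grid:
    # compare the smoothed estimate with the true standard normal density
    print(f"x={x:+.1f}  kde={kde(x)[0]:.3f}  true={norm.pdf(x):.3f}")
```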

  • Probability Distribution Functions:

    • A Probability Density Function (PDF) maps each value to a density at that point, indicating the relative likelihood of that value occurring. The area under the PDF curve represents probability.

    • To find the probability of a variable falling within a specific interval, you integrate the density function over that interval; the resulting area under the curve is the probability of landing in that interval.

    • The probability calculation is expressed as: P(X ∈ (a, b)) = P(a < X < b) = ∫_a^b f_X(x) dx, where f_X(x) is the PDF of the random variable X.
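
To make the interval formula concrete, the sketch below (my own example, using SciPy's standard normal as the density) integrates f_X numerically over (a, b) and checks the result against the difference of CDF values.

```python
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 2.0

# P(a < X < b) = integral of the PDF from a to b
prob_by_integration, _ = quad(norm.pdf, a, b)

# The same probability via the CDF: F(b) - F(a)
prob_by_cdf = norm.cdf(b) - norm.cdf(a)

print(prob_by_integration, prob_by_cdf)  # both are roughly 0.8186
```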

  • Normal Distribution:

    • The PDF of the normal distribution is given by: f_X(x) = (1/Z) e^(−(x − μ)² / (2σ²)), where Z = √(2πσ²). Here, μ is the mean, σ² is the variance, and Z is a normalization constant.

    • A shorthand notation to represent a normally distributed random variable is: X ∼ N(μ, σ²).
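
The formula can be checked directly; the short sketch below evaluates it with NumPy and compares it against scipy.stats.norm.pdf (my own check, not part of the notes).

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 4.0          # mean and variance
sigma = np.sqrt(sigma2)

def normal_pdf(x):
    """f_X(x) = (1/Z) * exp(-(x - mu)^2 / (2 sigma^2)) with Z = sqrt(2 pi sigma^2)."""
    Z = np.sqrt(2.0 * np.pi * sigma2)
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / Z

x = np.array([-2.0, 0.0, 1.5, 4.0])
print(normal_pdf(x))
print(norm.pdf(x, loc=mu, scale=sigma))   # matches the formula above
```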

  • Central Limit Theorem:

    • The Central Limit Theorem (CLT) states that the sum (or average) of a large number of independent, identically distributed random variables with finite variance will approximately follow a normal distribution, regardless of the shape of the original distribution.

    • Mathematically, if Y = X_1 + X_2 + X_3 + … + X_n, then Y approaches a normal distribution: Y ∼ N(μ, σ²), where μ and σ² depend on the means and variances of the X_i variables.
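
A quick simulation makes the CLT tangible; here the summands are uniform on (0, 1) (my choice of example), so Y = X_1 + … + X_n should look approximately normal with mean n/2 and variance n/12.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 30, 100_000

# Each row is one realization of Y = X_1 + ... + X_n with X_i ~ Uniform(0, 1)
Y = rng.random((trials, n)).sum(axis=1)

mu_theory = n * 0.5              # sum of the individual means
sigma_theory = np.sqrt(n / 12)   # sqrt of the sum of the individual variances

print("empirical mean/std:", Y.mean(), Y.std())
print("CLT prediction:    ", mu_theory, sigma_theory)

# Roughly 68% of samples should fall within one sigma of the mean if Y is ~normal
within_1sd = np.mean(np.abs(Y - mu_theory) <= sigma_theory)
print("fraction within 1 sigma:", within_1sd)
```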

  • Multivariate Distributions:

    • Multivariate distributions generalize the concept of PDFs to vector spaces, dealing with multiple random variables at once. They describe the probability of a set of variables taking on specific values simultaneously.

    • A multivariate uniform distribution, for example, assigns equal density to all points within a defined multi-dimensional box or region.

  • Multivariate Normal Distribution:

    • The multivariate normal distribution is characterized by a mean vector μ (which specifies the expected value of each variable) and a covariance matrix Σ (which describes the relationships between the variables).

    • Notation: X ∼ N(μ, Σ).
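
A small sketch of the notation in practice (using NumPy's multivariate normal sampler; the particular μ and Σ are made up for illustration): the sample mean and covariance of many draws should recover μ and Σ.

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, -2.0])                    # mean vector
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                # covariance matrix (symmetric, PSD)

X = rng.multivariate_normal(mu, Sigma, size=50_000)    # X ~ N(mu, Sigma)

print("sample mean:      ", X.mean(axis=0))             # close to mu
print("sample covariance:\n", np.cov(X, rowvar=False))  # close to Sigma
```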

  • Joint and Marginal Distributions:

    • The joint PDF describes the density over all dimensions of a multivariate random variable, giving the probability of all variables taking on specific values simultaneously.

    • The marginal PDF describes the density over a subset of dimensions, effectively integrating out the other variables. It tells you the probability distribution of one or more variables without regard to the others.
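
The sketch below illustrates the joint/marginal relationship numerically for a 2-D normal (my own example): integrating the joint density over the second dimension on a grid recovers the marginal density of the first, which for a multivariate normal is simply N(μ₁, Σ₁₁).

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
joint = multivariate_normal(mean=mu, cov=Sigma)

# Evaluate the joint PDF on a grid and integrate out the second variable
xs = np.linspace(-5, 5, 401)
ys = np.linspace(-6, 8, 561)
X, Y = np.meshgrid(xs, ys, indexing="ij")
density = joint.pdf(np.dstack([X, Y]))          # shape (len(xs), len(ys))

dy = ys[1] - ys[0]
marginal_numeric = density.sum(axis=1) * dy     # crude Riemann sum over y

# For a multivariate normal, the marginal of X1 is N(mu[0], Sigma[0, 0])
marginal_exact = norm.pdf(xs, loc=mu[0], scale=np.sqrt(Sigma[0, 0]))

print(np.max(np.abs(marginal_numeric - marginal_exact)))  # small discrepancy
```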

  • Inference:

    • Inference is the process of estimating properties of an unobserved population based on information from a limited sample. It involves making educated guesses about the characteristics of the entire group based on a smaller subset.

    • Population: The entire set of outcomes or individuals of interest, which is often unknown.

    • Parameter: A descriptive measure of a population, such as the mean or variance.

    • Sample: An observed subset of the population, used to make inferences about the population.

    • Statistic: A function of the sample data, such as the sample mean, used to estimate population parameters.

  • Bayesian vs. Frequentist Inference:

    • Bayesian: In Bayesian inference, parameters are treated as random variables with associated probability distributions, while the observed data is considered fixed. This approach allows for incorporating prior beliefs about the parameters.

    • Frequentist: In frequentist inference, parameters are considered fixed but unknown constants, and the data is viewed as random. Conclusions are based on the frequency of observing certain data outcomes under repeated sampling.

  • Approaches to Inference:

    • Direct estimation: Involves using direct formulas or functions to estimate parameters from sample data.

    • Maximum likelihood estimation (MLE): Optimizes parameters by finding the values that maximize the likelihood of observing the given data.

    • Bayesian: Employs probability distributions to encode beliefs about parameters, updating these beliefs based on observed data.

  • Direct Estimation:

    • Arithmetic mean: The most common measure of central tendency, calculated as: μ̂ = (1/N) ∑_{i=1}^{N} x_i

    • Sample variance: A measure of the spread of data around the mean, calculated as: σ̂² = (1/N) ∑_{i=1}^{N} (x_i − μ̂)² (dividing by N − 1 instead gives the unbiased version).
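
A one-line check of both estimators with NumPy (np.var uses the same 1/N convention by default; pass ddof=1 for the N − 1 version):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
N = len(x)

mu_hat = x.sum() / N                      # arithmetic mean
var_hat = ((x - mu_hat) ** 2).sum() / N   # 1/N sample variance, as in the formula

print(mu_hat, var_hat)                    # 5.0 4.0
print(x.mean(), x.var())                  # same results via NumPy
```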

  • Maximum Likelihood Estimation (MLE):

    • MLE aims to find the parameter values that maximize the likelihood function, thereby providing the best fit to the observed data. The likelihood function represents the probability of observing the data given the parameters.

    • The likelihood function is defined as: L(θ | x) = ∏_i f_X(x_i; θ), where θ represents the parameters and f_X(x_i; θ) is the probability density function of the data. In practice, the log-likelihood is usually maximized instead, since a sum of logarithms is numerically better behaved than a long product.
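
As a sketch of MLE in practice (my own example, fitting a normal model), one minimizes the negative log-likelihood numerically; for the normal the optimum should match the direct estimates μ̂ and σ̂² from above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
data = rng.normal(loc=3.0, scale=2.0, size=500)

def neg_log_likelihood(params):
    """-log L(theta | x) = -sum_i log f_X(x_i; theta) for a normal model."""
    mu, log_sigma = params                 # optimize log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_mle, sigma_mle = result.x[0], np.exp(result.x[1])

print(mu_mle, sigma_mle**2)        # MLE of mean and variance
print(data.mean(), data.var())     # matches the direct (1/N) estimates
```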

  • Bayesian Inference:

    • In Bayesian inference, parameters are treated as random variables characterized by probability distributions. Prior beliefs about the parameters are combined with observed data to obtain a posterior distribution.

    • Bayes' Rule is the foundation of Bayesian inference: P(θ | D) = P(D | θ) P(θ) / P(D), where P(θ | D) is the posterior probability, P(D | θ) is the likelihood, P(θ) is the prior probability, and P(D) is the evidence.
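
A minimal worked example of Bayes' Rule (not from the notes): estimating a coin's heads probability θ with a Beta prior. The Beta prior is conjugate to the binomial likelihood, so the posterior has a closed form and no MCMC is needed yet.

```python
from scipy.stats import beta

# Prior belief about theta (probability of heads): Beta(a, b)
a_prior, b_prior = 2, 2          # mildly favours fair coins

# Observed data D: 10 coin flips, 7 heads
heads, tails = 7, 3

# Posterior P(theta | D) is Beta(a + heads, b + tails) by conjugacy,
# which is exactly Bayes' Rule with the normalizing evidence folded in.
a_post, b_post = a_prior + heads, b_prior + tails

posterior = beta(a_post, b_post)
print("posterior mean:", posterior.mean())               # (2+7)/(4+10) = 9/14
print("95% credible interval:", posterior.interval(0.95))
```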

  • Markov Chain Monte Carlo (MCMC):

    • MCMC methods are used to sample from the posterior distribution when direct sampling is not possible. These methods construct a Markov chain that has the desired posterior distribution as its equilibrium distribution.

    • Implementing MCMC requires evaluating the prior and likelihood for specific values of θ, which can be computationally intensive; the evidence P(D) is not needed, since it cancels in the acceptance ratio.

  • Metropolis-Hastings Algorithm:

    • Uses a proposal distribution to explore the parameter space.

    • Accepts or rejects each proposed jump based on how the target (posterior) density changes between the current point and the proposal.

    • Ensures convergence to the target distribution over time, provided the Markov chain is irreducible and aperiodic. The algorithm is commonly used in Bayesian inference and can handle complex posterior distributions that are difficult to sample from directly.

    • The Metropolis-Hastings algorithm is a specific MCMC method that samples from complex probability distributions by generating candidates from a proposal distribution and accepting or rejecting them according to a defined acceptance criterion. Candidates are typically generated by a random walk around the current sample, which keeps each step simple while still ensuring the chain approaches the correct target distribution over many iterations.
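
Below is a bare-bones random-walk Metropolis sketch (a symmetric Gaussian proposal, so the Hastings correction cancels), targeting an unnormalized log density; the specific target and step size are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_target(theta):
    """Unnormalized log density of the target; in Bayesian use this would be
    log prior + log likelihood. For illustration: a standard normal, -theta^2/2."""
    return -0.5 * theta**2

def metropolis_hastings(n_samples, step=1.0, theta0=0.0):
    samples = np.empty(n_samples)
    theta = theta0
    log_p = log_target(theta)
    accepted = 0
    for i in range(n_samples):
        proposal = theta + step * rng.normal()       # symmetric random-walk proposal
        log_p_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current))
        if np.log(rng.random()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
            accepted += 1
        samples[i] = theta                           # on rejection, repeat current value
    print("acceptance rate:", accepted / n_samples)
    return samples

chain = metropolis_hastings(50_000)
print("target mean/std estimate:", chain.mean(), chain.std())  # roughly 0 and 1
```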
