Bayesian Inference with Sampling — Data, Inference, and Decisions

Introduction to Bayesian Inference with Sampling

Contents

  • Approximating a Known Distribution with Samples

  • Rejection Sampling

  • Markov Chain Monte Carlo (MCMC)

  • Implementing Models in PyMC

Approximating a Known Distribution with Samples

  • An empirical distribution computed from sample data lets researchers make inferences about the underlying population without knowing its exact distribution.

  • The approximation is demonstrated with the Beta distribution, Beta(α, β), where α and β are shape parameters. It is widely used in Bayesian statistics because it is flexible and is supported on [0, 1], which makes it a natural model for unknown probabilities.

  • The Python code for the approximation relies on NumPy, pandas, and SciPy for data manipulation and statistical calculations.

Sample Representation of Distributions

  • Empirical Representation

    • With enough samples, the empirical distribution closely represents the underlying distribution, so population parameters can be estimated reliably.

    • The mean of samples serves as an effective estimator of the population mean, which can be expressed as ( \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i ), where ( X_i ) are the sample values and ( n ) is the sample size.

    • Example using Beta(3, 4) distribution:

      • The exact probability density function (PDF) is ( f(x) = \frac{x^{3-1} (1-x)^{4-1}}{B(3, 4)} ) for ( x \in [0, 1] ), where ( B(3, 4) ) is the beta function. The exact PDF is the reference against which empirical results are checked.

      • A normalized histogram of the samples (e.g., drawn with matplotlib) can be overlaid on the exact PDF to compare the samples with the theoretical density.
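The comparison between samples and the exact density can be sketched as follows. This is a minimal illustration, not the original notebook code: the sample size, bin count, and variable names are assumptions, and the PDF is evaluated directly from the formula above (SciPy's `scipy.stats.beta.pdf` should give the same values).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 3, 4
samples = rng.beta(alpha, beta, size=10_000)  # sample size is an assumption

# Beta function B(3, 4) expressed through gamma functions.
B = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def exact_pdf(x):
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B

# density=True rescales bar heights so the histogram integrates to 1,
# which makes it directly comparable to a PDF.
heights, edges = np.histogram(samples, bins=20, range=(0, 1), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

max_gap = np.max(np.abs(heights - exact_pdf(centers)))
print(f"largest gap between histogram and exact PDF: {max_gap:.3f}")
```

Increasing the sample size should shrink the gap, which is the sense in which the samples approximate the distribution.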

Variance Estimation

  • Variance of Samples

    • The variance of the samples is estimated using ( s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 ), providing insight into the variability and spread of the data.

    • This section discusses methods to compute both the true mean (population mean) and the approximate mean derived from sample data, emphasizing the role of sample size in statistical reliability.

    • More complex distributions often lack analytical solutions, so practical inference for them requires numerical or simulation-based methods.
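As a rough check of how sample size affects the estimates, the sketch below compares the sample mean and the 1/(n-1) sample variance with the closed-form moments of Beta(3, 4). The sample sizes and seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 3, 4

# Closed-form mean and variance of Beta(alpha, beta).
true_mean = alpha / (alpha + beta)
true_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

errors = {}
for n in (100, 10_000, 1_000_000):
    x = rng.beta(alpha, beta, size=n)
    sample_mean = x.mean()
    sample_var = x.var(ddof=1)  # ddof=1 selects the 1/(n-1) estimator
    errors[n] = (abs(sample_mean - true_mean), abs(sample_var - true_var))
    print(f"n={n:>9}: mean={sample_mean:.4f}, variance={sample_var:.4f}")

print(f"exact     : mean={true_mean:.4f}, variance={true_var:.4f}")
```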

Rejection Sampling Introduction

  • Sampling from the Unit Circle

    • This section introduces the principles of rejection sampling by sampling points uniformly from inside the unit circle. The technique matters for Bayesian inference when direct sampling from a distribution is infeasible.

    • Points are drawn uniformly from the square ( [-1, 1]^2 ) that encloses the circle, and only the points that fall inside the circle are kept. By geometric probability, the accepted fraction is the ratio of the two areas: the unit circle has area ( A = \pi r^2 = \pi(1^2) = \pi ) and the square has area 4, so about ( \pi / 4 \approx 0.785 ) of the points are accepted.
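A minimal sketch of this geometric example, assuming the proposal is the enclosing square [-1, 1]^2 and that a point is accepted when its squared distance from the origin is at most 1 (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Propose points uniformly in the square [-1, 1] x [-1, 1].
points = rng.uniform(-1.0, 1.0, size=(n, 2))

# Keep only the points that land inside the unit circle.
inside = (points ** 2).sum(axis=1) <= 1.0
circle_samples = points[inside]

acceptance_rate = inside.mean()
print(f"accepted fraction: {acceptance_rate:.3f}, pi/4 = {np.pi / 4:.3f}")
```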

Target Distribution for Rejection Sampling

  • Complicated Density Examples

    • The running example is a density known only up to a normalizing constant: ( p(\theta | x) \propto \theta \cdot (1.5 - \theta) \cdot \sin(\theta),\; \theta \in [0, 1.5] ).

    • This sets the groundwork for implementing rejection sampling when dealing with distributions that do not conform to standard forms.
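To make the target concrete, the sketch below codes the unnormalized density and locates its peak on a grid. The function name `target_unnorm` and the grid resolution are assumptions made for illustration; the peak matters because any flat envelope used for rejection sampling has to sit at or above it.

```python
import numpy as np

def target_unnorm(theta):
    """Unnormalized target: theta * (1.5 - theta) * sin(theta) on [0, 1.5], else 0."""
    theta = np.asarray(theta, dtype=float)
    in_support = (theta >= 0.0) & (theta <= 1.5)
    return np.where(in_support, theta * (1.5 - theta) * np.sin(theta), 0.0)

# Evaluate on a fine grid to find the approximate location and height of the peak.
grid = np.linspace(0.0, 1.5, 10_001)
values = target_unnorm(grid)
peak = values.max()
peak_location = grid[values.argmax()]
print(f"peak of about {peak:.4f} near theta = {peak_location:.3f}")
```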

Understanding Rejection Sampling

  • Steps of Rejection Sampling

    • Draw candidate samples from a proposal distribution ( q ) that covers the support of the target, for example a uniform distribution on ( [0, 1.5] ) for the target above.

    • Graphical illustrations facilitate understanding of the relationship between target and proposal distributions.

    • The crucial step is rejecting candidates according to the target density. For each candidate ( \theta_i ), draw ( u \sim U[0, 1] ) and accept the candidate when ( u < \frac{p(\theta_i)}{M q(\theta_i)} ), where ( M ) is a constant chosen so that ( p(\theta) \le M q(\theta) ) for every ( \theta ). The smallest valid choice is the maximum ratio of the target density to the proposal density.

Two-Dimensional Sampling Problem

  • Visualizing Rejection Sampling

    • This part addresses the transition from one-dimensional to two-dimensional sampling challenges.

    • Each candidate ( \theta_i ) is paired with a height ( z_i = u_i \cdot M q(\theta_i) ), where ( u_i \sim U[0, 1] ), so the point ( (\theta_i, z_i) ) is uniform under the scaled proposal curve. The candidate is kept when ( z_i ) falls below the target curve ( p(\theta_i) ).
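The two-dimensional picture can be sketched as below, assuming a uniform proposal on [0, 1.5] so that the scaled proposal is a flat envelope. The envelope height 0.45 is an assumed value chosen to sit just above the target's peak (about 0.425); it is not taken from the source.

```python
import numpy as np

def target_unnorm(theta):
    return theta * (1.5 - theta) * np.sin(theta)

rng = np.random.default_rng(3)
n = 50_000
envelope = 0.45  # assumed value of M * q(theta); must be >= the target's peak

theta = rng.uniform(0.0, 1.5, size=n)         # horizontal coordinate from the proposal
height = rng.uniform(0.0, envelope, size=n)   # vertical coordinate under the envelope

# Points under the target curve are kept; their theta values are the samples.
under_curve = height < target_unnorm(theta)
accepted = theta[under_curve]
print(f"kept {accepted.size} of {n} points ({under_curve.mean():.3f})")
```

Plotting `theta` against `height` and coloring by `under_curve` gives the usual picture of accepted points filling the area under the target.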

Implementing Rejection Sampling Algorithm

  • Steps Recap

    • A structured recap of the algorithm is presented:

      1. Generate samples from a proposal distribution.

      2. Calculate each sample's acceptance probability as the ratio ( r_{accept} = p(\theta) / (M q(\theta)) ).

      3. Accept each sample at random with probability ( r_{accept} ), for example by drawing ( u \sim U[0, 1] ) and accepting when ( u < r_{accept} ).

      4. Discard the rejected samples; the accepted samples are the final output.

    • Code snippets will illustrate the implementation of these steps, further connecting theory with practical applications.
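The four steps can be written as one small function. This is a sketch under stated assumptions rather than the original implementation: the function signature, the helper names, and the bound M = 0.675 (which makes M * q = 0.45 for the Uniform[0, 1.5] proposal) are all illustrative choices.

```python
import numpy as np

def rejection_sample(target_unnorm, proposal_sample, proposal_pdf, M, n, rng):
    """Return the accepted samples and the observed acceptance rate."""
    theta = proposal_sample(n, rng)                              # step 1: propose
    r_accept = target_unnorm(theta) / (M * proposal_pdf(theta))  # step 2: ratio
    u = rng.uniform(0.0, 1.0, size=n)                            # step 3: accept at random
    keep = u < r_accept
    return theta[keep], keep.mean()                              # step 4: drop rejected samples

# Usage with the target from the text and a Uniform[0, 1.5] proposal.
def target_unnorm(theta):
    return theta * (1.5 - theta) * np.sin(theta)

def uniform_sample(n, rng):
    return rng.uniform(0.0, 1.5, size=n)

def uniform_pdf(theta):
    return np.full_like(theta, 1.0 / 1.5)

rng = np.random.default_rng(4)
# Assumed bound: M * q = 0.45, just above the target's peak of about 0.425.
samples, rate = rejection_sample(
    target_unnorm, uniform_sample, uniform_pdf, M=0.675, n=100_000, rng=rng
)
print(f"kept {samples.size} samples, acceptance rate {rate:.3f}")
```

Passing the proposal as a pair of functions keeps the sampler independent of any particular target, so the same function can be reused with a different proposal.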

Efficiency of Rejection Sampling

  • Results of Sampling

    • The acceptance rate, the fraction of proposed samples that are kept, measures how efficient a rejection sampler is.

    • Visual comparisons (e.g., histograms) between the accepted samples and the target density show how well the samples match the target.

    • Sample efficiency is a trade-off between computational cost and statistical validity: a loose bound ( M ) wastes computation on rejected samples, but ( M ) must still satisfy ( p(\theta) \le M q(\theta) ) everywhere for the samples to follow the target.
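For a normalized proposal q and an unnormalized target with normalizing constant Z, the expected acceptance rate is Z / M. The sketch below estimates Z for the target used earlier with a midpoint Riemann sum and shows how the rate falls as M grows; the three values of M are arbitrary illustrations.

```python
import numpy as np

def target_unnorm(theta):
    return theta * (1.5 - theta) * np.sin(theta)

# Normalizing constant Z via a midpoint Riemann sum over [0, 1.5].
m = 100_000
width = 1.5 / m
midpoints = (np.arange(m) + 0.5) * width
Z = target_unnorm(midpoints).sum() * width

# Expected acceptance rate for several bounds M (all satisfy target <= M * q
# for the Uniform[0, 1.5] proposal, whose density is q = 1/1.5).
rates = {M: Z / M for M in (0.675, 1.5, 15.0)}
for M, rate in rates.items():
    print(f"M = {M:>6}: expected acceptance rate = {rate:.3f}")
```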

Challenges in High Dimensions

  • Acceptance Probability

    • The scaled proposal ( M q ) must lie above the target everywhere, and the resulting acceptance probability typically drops as the number of dimensions increases.

    • In high-dimensional problems this inefficiency is magnified, so the proposal distribution must be chosen with care.

    • A new target distribution that requires a different proposal distribution illustrates the issue.
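One way to see the effect of dimension, offered here as an illustrative assumption rather than the example from the source, is to extend the unit-circle problem: propose uniformly in the cube [-1, 1]^d and accept points inside the unit ball. The acceptance probability is the ratio of the two volumes, which can be computed exactly.

```python
import math

def ball_in_cube_acceptance(d):
    """Probability that a uniform point in [-1, 1]^d lies inside the unit ball."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume

for d in (2, 5, 10, 20):
    print(f"d = {d:>2}: acceptance probability = {ball_in_cube_acceptance(d):.2e}")
```

In two dimensions the value is pi/4, and it shrinks rapidly from there, so almost every proposal is rejected once d reaches the tens.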

Another Example of Rejection Sampling

  • Decaying Target Distribution

    • This example uses an unnormalized target distribution that decays over an unbounded range, paired with an exponential proposal distribution.

    • Code examples implement this approach.
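The source does not give the decaying target here, so the sketch below uses a hypothetical one: exp(-theta) * sin(theta)^2 for theta >= 0. With an Exponential(1) proposal, whose density is exp(-theta), the ratio target / proposal is sin(theta)^2 <= 1, so M = 1 is a valid bound.

```python
import numpy as np

def target_unnorm(theta):
    # Hypothetical decaying target, chosen only for illustration.
    return np.exp(-theta) * np.sin(theta) ** 2

def proposal_pdf(theta):
    # Density of the Exponential(1) proposal.
    return np.exp(-theta)

rng = np.random.default_rng(5)
n = 100_000
M = 1.0  # valid because sin(theta)^2 <= 1

theta = rng.exponential(scale=1.0, size=n)
u = rng.uniform(0.0, 1.0, size=n)
keep = u < target_unnorm(theta) / (M * proposal_pdf(theta))

samples = theta[keep]
acceptance_rate = keep.mean()
print(f"acceptance rate: {acceptance_rate:.3f}")
```

For this particular target the integral of exp(-theta) * sin(theta)^2 over [0, infinity) is 0.4, so the observed rate should be close to that value. A uniform proposal would not work here because the support is unbounded.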

Summary of Rejection Sampling

  • Key Takeaways

    • Rejection sampling draws samples from a target distribution even when it is known only up to a normalizing constant.

    • The choice of proposal distribution strongly affects sampling efficiency.

    • Efficiency degrades in high dimensions, which limits how far rejection sampling can be pushed in Bayesian applications.

Introduction to Markov Chain Monte Carlo (MCMC)

  • Transition from Rejection Sampling

    • This section discusses MCMC methods and their role as a solution to the inefficiencies often encountered in rejection sampling.

    • The core idea is to construct a Markov chain whose stationary distribution is the target, so that the states the chain visits serve as (correlated) samples from it.
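The source does not name a specific chain at this point. As one concrete instance, the sketch below uses random-walk Metropolis, a standard MCMC algorithm chosen here as an assumption, on the one-dimensional target from the rejection sampling example. The step size, chain length, and starting point are illustrative.

```python
import numpy as np

def target_unnorm(theta):
    if theta < 0.0 or theta > 1.5:
        return 0.0
    return theta * (1.5 - theta) * np.sin(theta)

def metropolis(n_steps, step_size, start, rng):
    """Random-walk Metropolis: each state depends only on the previous one."""
    chain = np.empty(n_steps)
    current = start
    for i in range(n_steps):
        proposal = current + rng.normal(0.0, step_size)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.uniform() * target_unnorm(current) < target_unnorm(proposal):
            current = proposal
        chain[i] = current  # a rejected move repeats the current state
    return chain

rng = np.random.default_rng(6)
chain = metropolis(n_steps=20_000, step_size=0.5, start=0.75, rng=rng)
print(f"chain mean: {chain.mean():.3f}")
```

Unlike rejection sampling, no bound M is needed and no proposal is thrown away outright, at the cost of correlation between successive samples.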

Gibbs Sampling Overview

  • Gibbs Sampling Methodology

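    • Gibbs sampling is an MCMC method for high-dimensional posterior distributions that updates one variable (or block of variables) at a time by drawing it from its conditional distribution given the others.

    • It applies whenever these conditional distributions can be sampled, which makes it useful across a wide range of statistical models.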
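A minimal Gibbs sampler can be sketched on a standard textbook target that is not from the source: a bivariate normal with zero means, unit variances, and correlation rho. Each conditional is itself normal, x | y ~ N(rho * y, 1 - rho^2) and likewise for y | x, so the two updates are easy to draw.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_steps, rng):
    """Alternate draws from x | y and y | x for a standard bivariate normal."""
    cond_sd = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    chain = np.empty((n_steps, 2))
    for i in range(n_steps):
        x = rng.normal(rho * y, cond_sd)  # draw x given the current y
        y = rng.normal(rho * x, cond_sd)  # draw y given the new x
        chain[i] = (x, y)
    return chain

rng = np.random.default_rng(7)
chain = gibbs_bivariate_normal(rho=0.8, n_steps=20_000, rng=rng)

# Drop an initial stretch (length is an arbitrary choice) before summarizing.
kept = chain[1_000:]
sample_corr = np.corrcoef(kept.T)[0, 1]
print(f"sample correlation: {sample_corr:.3f}")
```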

Implementing Models in PyMC

  • Overview of PyMC

    • PyMC is a Python library for computational Bayesian inference, used to build and evaluate complex statistical models.

    • An example setup for a product review model serves to illustrate practical applications.
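The source does not spell out the product review model, so the sketch below assumes a simple version: a product has n reviews of which k are positive, the positive-review rate has a Beta(1, 1) prior, and the count is binomial. It assumes PyMC version 4 or later (`import pymc as pm`); the function and variable names are hypothetical.

```python
def sample_review_model(n_reviews, n_positive, draws=1000):
    """Sample the posterior of the positive-review rate with PyMC."""
    # Imported here so the closed-form check below runs even without PyMC installed.
    import pymc as pm

    with pm.Model():
        rate = pm.Beta("rate", alpha=1.0, beta=1.0)  # uniform prior on [0, 1]
        pm.Binomial("positives", n=n_reviews, p=rate, observed=n_positive)
        return pm.sample(draws=draws, chains=4, random_seed=0)

def conjugate_posterior_mean(n_reviews, n_positive):
    # A Beta(1, 1) prior with a binomial likelihood gives a
    # Beta(1 + k, 1 + n - k) posterior, whose mean is (1 + k) / (2 + n).
    return (1 + n_positive) / (2 + n_reviews)

# Hypothetical data: 18 positive reviews out of 20.
exact_mean = conjugate_posterior_mean(20, 18)
print(f"closed-form posterior mean: {exact_mean:.3f}")
```

Calling `sample_review_model(20, 18)` should return an object whose `posterior["rate"]` draws average close to the closed-form value, which gives a quick sanity check on the sampler for this simple model.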

Output from PyMC

  • Sample Output

    • The section discusses types of outputs generated by PyMC modeling, emphasizing the importance of visual and statistical outputs for inference.

Exoplanet Model in PyMC

  • Mixture Model Implementation

    • The use of a mixture model for modeling exoplanets highlights the complexities of the data involved and corresponding statistical implications.

Sampling Results from Exoplanet Model

  • Multi-Chain Sampling

    • Outputs of the exoplanet model evaluation within the PyMC framework demonstrate practical applications of Bayesian inference in astrophysics and cosmology.

Final Outputs

  • Final Results

    • The sampling results from the exoplanet model close the discussion and illustrate how the methods covered here apply in practice.