Bayesian Inference
Remember that using Bayes’ Theorem doesn’t make you a Bayesian. Quantifying uncertainty with probability makes you a Bayesian. (Michael Betancourt)
Overview
- Books
- Bayesian Workflow (2020) Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, Martin Modrák
Markov chain Monte Carlo (MCMC)
Stochastic Gradient MCMC (SG-MCMC)
- SGLD Stochastic Gradient Langevin Dynamics
- Bayesian Learning via Stochastic Gradient Langevin Dynamics (2011) Max Welling, Yee Whye Teh — Shows that adding calibrated noise to stochastic gradient descent produces asymptotically exact posterior samples, enabling Bayesian inference to scale to large datasets for the first time without full-batch MCMC.
- SGHMC Stochastic Gradient Hamiltonian Monte Carlo
- Stochastic Gradient Hamiltonian Monte Carlo (2014) Tianqi Chen, Emily B. Fox, Carlos Guestrin — Extends SGLD by incorporating momentum (as in HMC above), adding a friction term to correct for gradient noise and improving mixing over the random-walk behavior of SGLD.
- A Complete Recipe for Stochastic Gradient MCMC (2015) Yi-An Ma, Tianqi Chen, Emily B. Fox — Provides a unifying framework showing that SGLD, SGHMC, and other SG-MCMC variants are all special cases of continuous Markov processes parameterized by two matrices, and introduces new samplers like SGRHMC within this framework.
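The SGLD update is simple enough to sketch in a few lines. Below is a minimal, illustrative implementation on a toy Gaussian-mean model (the model, constants, and function names are our own, not from the papers): a minibatch gradient of the log-posterior, a step of size ε/2, plus injected N(0, ε) noise.

```python
import math
import random

random.seed(0)

# toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 10)
N = 1000
data = [random.gauss(2.0, 1.0) for _ in range(N)]

def grad_log_prior(theta, var0=10.0):
    return -theta / var0                 # d/dtheta log N(theta | 0, var0)

def grad_log_lik(theta, y):
    return y - theta                     # d/dtheta log N(y | theta, 1)

def sgld(n_iters=20000, batch=32, eps=1e-4):
    """SGLD: Langevin steps with minibatch gradients.  The injected
    N(0, eps) noise is calibrated to the eps/2 gradient step, so the
    chain targets the posterior as eps -> 0 (Welling & Teh, 2011)."""
    theta, samples = 0.0, []
    for t in range(n_iters):
        mb = random.sample(data, batch)
        grad = grad_log_prior(theta) + (N / batch) * sum(
            grad_log_lik(theta, y) for y in mb)
        theta += 0.5 * eps * grad + random.gauss(0.0, math.sqrt(eps))
        if t >= n_iters // 2:            # discard burn-in
            samples.append(theta)
    return samples

samples = sgld()
post_mean = sum(samples) / len(samples)  # should sit near the data mean (~2)
```

Note the N/batch rescaling: the minibatch gradient is an unbiased estimate of the full-data gradient, which is what makes the small-step limit exact.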
Sequential Monte Carlo (SMC)
- Sequential Monte Carlo Methods in Practice (2001) Arnaud Doucet, Nando de Freitas, Neil Gordon (Editors) — The foundational reference on particle filters: propagates weighted samples through a sequence of distributions, enabling online inference in state-space models where MCMC would require re-running from scratch.
- Sequential Monte Carlo Samplers (2006) Pierre Del Moral, Arnaud Doucet, Ajay Jasra — Generalizes SMC beyond filtering to sample from arbitrary sequences of static distributions, making it applicable to Bayesian model comparison and tempered posteriors — the key theoretical bridge between particle filters and general Bayesian computation.
- An Introduction to Sequential Monte Carlo (2020) Nicolas Chopin, Omiros Papaspiliopoulos — Modern textbook treatment covering both the theory (Feynman-Kac formalism) and practice of SMC, including waste-free SMC and connections to tempering strategies used in modern samplers.
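The propagate/weight/resample loop of the bootstrap particle filter can be sketched directly. The linear-Gaussian state-space model and all constants below are illustrative:

```python
import math
import random

random.seed(1)
T, n_part = 50, 500
a, q, r = 0.9, 0.5, 0.5          # transition coef, process / obs noise std

# simulate a latent trajectory and noisy observations
x, true_xs, ys = 0.0, [], []
for _ in range(T):
    x = a * x + random.gauss(0, q)
    true_xs.append(x)
    ys.append(x + random.gauss(0, r))

def bootstrap_filter(ys):
    particles = [random.gauss(0, 1) for _ in range(n_part)]
    means = []
    for y in ys:
        # 1. propagate particles through the transition kernel
        particles = [a * p + random.gauss(0, q) for p in particles]
        # 2. weight by the observation likelihood N(y | x, r^2)
        w = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in particles]
        tot = sum(w)
        w = [wi / tot for wi in w]
        means.append(sum(wi * p for wi, p in zip(w, particles)))
        # 3. multinomial resampling to fight weight degeneracy
        particles = random.choices(particles, weights=w, k=n_part)
    return means

filt_means = bootstrap_filter(ys)
rmse = math.sqrt(sum((m - x) ** 2 for m, x in zip(filt_means, true_xs)) / T)
```

Each observation is absorbed online, in contrast to MCMC, which would have to be re-run from scratch as data arrive.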
Approximate Bayesian Computation (ABC)
- Approximate Bayesian Computation in Population Genetics (2002) Mark A. Beaumont, Wenyang Zhang, David J. Balding
- Markov chain Monte Carlo without likelihoods (2003) Paul Marjoram, John Molitor, Vincent Plagnol, Simon Tavaré
- Sequential Monte Carlo without likelihoods (2007) S. A. Sisson, Y. Fan, Mark M. Tanaka
- Non-linear regression models for Approximate Bayesian Computation (2009) Michael G.B. Blum, Olivier François
- Likelihood-free Markov chain Monte Carlo (2010) Scott A. Sisson, Yanan Fan
- Approximate Bayesian Computation (ABC) in practice (2010) Katalin Csilléry, Michael G.B. Blum, Oscar E. Gaggiotti, Olivier François
- Hamiltonian ABC (2015) Edward Meeds, Robbert Leenders, Max Welling
- Reliable ABC model choice via random forests (2016) Pierre Pudlo, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, Christian P. Robert
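The simplest member of this family, rejection ABC, fits in a dozen lines. A toy sketch with an illustrative Gaussian-mean "simulator" and the sample mean as summary statistic:

```python
import random
import statistics

random.seed(2)

# observed data from an unknown Gaussian mean; the "simulator" is the
# same sampling process, and the sample mean is the summary statistic
n_obs = 50
obs = [random.gauss(1.5, 1.0) for _ in range(n_obs)]
s_obs = statistics.mean(obs)

def rejection_abc(n_prop=10000, tol=0.1):
    """Keep a prior draw whenever simulated and observed summaries
    agree to within the tolerance; no likelihood is ever evaluated."""
    accepted = []
    for _ in range(n_prop):
        theta = random.uniform(-5, 5)            # draw from the prior
        sim = [random.gauss(theta, 1.0) for _ in range(n_obs)]
        if abs(statistics.mean(sim) - s_obs) < tol:
            accepted.append(theta)
    return accepted

post = rejection_abc()
abc_mean = statistics.mean(post)   # close to the sample mean of obs
```

The MCMC and SMC variants above replace the blind prior draws with smarter proposals, and the regression-adjustment papers correct for the nonzero tolerance.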
Variational Inference (VI)
- Bayesian parameter estimation via variational methods (1999) T.S. Jaakkola, M.I. Jordan
- The Variational Gaussian Approximation Revisited (2009) Manfred Opper, Cedric Archambeau
- Doubly Stochastic Variational Bayes for non-Conjugate Inference (2014) Michalis K. Titsias, Miguel Lázaro-Gredilla
- Variational Inference: A Review for Statisticians (2017) David M. Blei, Alp Kucukelbir, Jon D. McAuliffe
- Advances in Variational Inference (2018) Cheng Zhang, Judith Bütepage, Hedvig Kjellström, Stephan Mandt — Comprehensive review organizing the VI landscape into four threads: scalable VI (stochastic optimization), generic VI (non-conjugate models), accurate VI (beyond mean-field, including normalizing flows), and amortized VI (inference networks).
- SVGD Stein Variational Gradient Descent
- Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm (2016) Qiang Liu, Dilin Wang — Deterministic particle-based alternative to both MCMC and parametric VI: iteratively transports a set of particles along a kernelized functional gradient of the KL divergence, so the particle set collectively approximates the posterior.
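A minimal sketch of doubly stochastic VI with reparameterization gradients, on a conjugate Gaussian-mean model where the exact posterior is known (model, constants, and variable names are illustrative): the variational family is q(θ) = N(m, e^{2ρ}), and each step uses a single reparameterized sample θ = m + e^ρ ε.

```python
import math
import random

random.seed(3)
n = 50
data = [random.gauss(1.0, 1.0) for _ in range(n)]

def grad_log_joint(theta):
    # d/dtheta [ log N(theta | 0, 1) + sum_i log N(y_i | theta, 1) ]
    return -theta + sum(y - theta for y in data)

# single-sample reparameterization gradients of the ELBO
m, rho, lr = 0.0, 0.0, 0.002
for step in range(5000):
    eps = random.gauss(0, 1)
    s = math.exp(rho)
    g = grad_log_joint(m + s * eps)
    m += lr * g                      # dELBO/dm
    rho += lr * (s * eps * g + 1.0)  # dELBO/drho (entropy term gives +1)

# exact conjugate posterior, for comparison
post_mean = (n / (n + 1)) * (sum(data) / n)
post_var = 1.0 / (n + 1)
```

Because this model is conjugate, the Gaussian family contains the true posterior and (m, e^{2ρ}) should converge to (post_mean, post_var); in non-conjugate models the same code runs unchanged, which is the point of the "doubly stochastic" construction.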
Normalizing Flows for Inference
- Variational Inference with Normalizing Flows (2015) Danilo Rezende, Shakir Mohamed — Introduces the idea of transforming a simple variational posterior through a chain of invertible mappings, breaking free of the mean-field assumption that limits standard VI and enabling arbitrarily complex approximate posteriors.
- Normalizing Flows for Probabilistic Modeling and Inference (2021) George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, Balaji Lakshminarayanan — Definitive review of flow architectures (coupling, autoregressive, residual), their expressive power, and applications spanning density estimation, variational inference, and simulation-based inference.
- Model-Informed Flows for Bayesian Inference (2025) Joohwan Ko, Justin Domke — Proves that Variationally Inferred Parameters (VIP) can be represented exactly as autoregressive flows augmented with the model’s prior, then exploits this connection to design Model-Informed Flows that deliver tighter posteriors for hierarchical Bayesian models.
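The mechanism shared by all of these papers is the change-of-variables formula: push a base sample through an invertible map and correct the density by the log-Jacobian. A one-layer, one-dimensional sketch (the specific map x = z + a·tanh(z) is our own illustrative choice, invertible for a > -1):

```python
import math
import random

random.seed(4)

def log_normal(z):
    # log density of the standard normal base distribution
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def flow_forward(z, a=0.5):
    """One invertible layer x = z + a*tanh(z) (monotone for a > -1),
    returning x and log|dx/dz| for the change of variables."""
    x = z + a * math.tanh(z)
    logdet = math.log(1 + a * (1 - math.tanh(z) ** 2))
    return x, logdet

# sample from the flow and evaluate the density of each sample:
# log p_x(x) = log p_z(z) - log|dx/dz|
samples = []
for _ in range(10000):
    z = random.gauss(0, 1)
    x, logdet = flow_forward(z)
    samples.append((x, log_normal(z) - logdet))
```

A trained flow stacks many such layers with learned parameters; for variational inference, the tracked log-density is exactly what the ELBO's entropy term needs.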
Expectation Propagation (EP)
- Expectation Propagation for Approximate Bayesian Inference (2001) Thomas P. Minka — Proposes a deterministic alternative to MCMC and VI that iteratively refines local likelihood approximations by moment matching, unifying assumed-density filtering and loopy belief propagation; often more accurate than the Laplace approximation (below) and variational Bayes at comparable cost.
- Expectation Propagation as a Way of Life (2020) Aki Vehtari, Andrew Gelman, Tuomas Sivula, Pasi Jylänki, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, Christian Robert — Reframes EP as a framework for distributed Bayesian inference: data partitions communicate through iteratively refined approximate likelihoods, enabling parallelism while preserving information sharing — addressing scalability limits of both standard EP and MCMC.
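The operation EP iterates is moment-matching projection: multiply the current Gaussian by one site (likelihood factor) and project the result back onto a Gaussian. A single such step with an indicator site 1[θ > 0] (a toy choice; the formulas are the standard truncated-normal moments), checked against Monte Carlo:

```python
import math
import random

def phi(x):   # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf via erf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def match_moments(m, v):
    """Moment-match N(m, v) * 1[theta > 0] with a Gaussian, the
    projection step EP applies to one site at a time."""
    alpha = -m / math.sqrt(v)
    lam = phi(alpha) / (1 - Phi(alpha))
    new_m = m + math.sqrt(v) * lam
    new_v = v * (1 - lam * (lam - alpha))
    return new_m, new_v

m_ep, v_ep = match_moments(0.5, 2.0)

# Monte Carlo moments of the same truncated distribution, as a check
random.seed(5)
draws = [z for z in (random.gauss(0.5, math.sqrt(2.0)) for _ in range(200000))
         if z > 0]
mc_m = sum(draws) / len(draws)
```

Full EP cycles this projection over all sites, each time dividing out the old site approximation first; VI instead minimizes the reverse KL divergence, which is why EP tends to cover the posterior mass rather than lock onto one mode.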
Laplace Approximation
- A Practical Bayesian Framework for Backpropagation Networks (1992) David J.C. MacKay — Pioneering work applying a second-order Taylor expansion (Laplace approximation) around the MAP estimate to approximate the posterior over neural network weights, enabling model comparison via the Bayesian evidence — the simplest deterministic approach to Bayesian neural networks.
- Laplace Redux — Effortless Bayesian Deep Learning (2021) Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, Philipp Hennig — Revives MacKay’s Laplace approach for modern deep networks with scalable Kronecker-factored and last-layer approximations; shows it is competitive with MC Dropout and ensembles (see Bayesian Deep Learning below) at a fraction of the cost, and provides the laplace-torch library.
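MacKay's recipe in one dimension is just: find the MAP with Newton's method, then use the inverse negative Hessian of the log posterior as the variance. A sketch on a toy one-parameter logistic model (the data and model here are illustrative):

```python
import math

# toy model: y_i ~ Bernoulli(sigmoid(theta * x_i)), prior theta ~ N(0, 1)
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def grad_hess(theta):
    g, h = -theta, -1.0           # contributions of the N(0, 1) prior
    for x, y in zip(xs, ys):
        p = sigmoid(theta * x)
        g += (y - p) * x          # d log-lik / d theta
        h -= p * (1 - p) * x * x  # d^2 log-lik / d theta^2
    return g, h

# Newton iterations to the MAP; the log posterior is concave here
theta = 0.0
for _ in range(50):
    g, h = grad_hess(theta)
    theta -= g / h

map_est = theta
_, h = grad_hess(map_est)
lap_var = -1.0 / h                # Laplace approximation: N(map_est, lap_var)
```

For a network with millions of weights the Hessian is intractable, which is exactly what the Kronecker-factored and last-layer approximations in Laplace Redux address.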
Simulation-Based Inference (SBI)
- The Frontier of Simulation-Based Inference (2020) Kyle Cranmer, Johann Brehmer, Gilles Louppe — Landmark review of the shift from classical ABC methods (above) to neural network-based likelihood-free inference; surveys how neural density estimators, classifiers, and ratio estimators replace the rejection/tolerance mechanisms of ABC with learned surrogates.
- NPE Neural Posterior Estimation
- Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation (2016) George Papamakarios, Iain Murray — Trains a conditional neural density estimator (a mixture density network) to map simulated data directly to posterior parameters, removing the tolerance and rejection steps of ABC; the basis of later sequential NPE variants.
- NLE Neural Likelihood Estimation
- Sequential Neural Likelihood (2019) George Papamakarios, David Sterratt, Iain Murray — Instead of learning the posterior directly (as NPE does), learns a neural surrogate of the likelihood using autoregressive flows, then plugs it into standard MCMC — more robust to model misspecification and composable with different priors without retraining.
- NRE Neural Ratio Estimation
- Approximating Likelihood Ratios with Calibrated Discriminative Classifiers (2015) Kyle Cranmer, Juan Pavez, Gilles Louppe — Trains a classifier to distinguish parameter-data pairs, whose output directly estimates the likelihood ratio — avoids density estimation entirely, requiring only a binary classification objective, and is well-suited to hypothesis testing.
- Benchmarking Simulation-Based Inference (2021) Jan-Matthis Lueckmann, Jan Boelts, David Greenberg, Pedro Goncalves, Jakob Macke — Systematic comparison of NPE, NLE, NRE and classical ABC on standardized tasks; finds that neural methods consistently outperform ABC but no single algorithm dominates, and that sequential variants improve sample efficiency.
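The NRE idea reduces to binary classification: train a classifier to distinguish joint pairs (θ, x) from pairs with x shuffled, and its logit estimates log p(x|θ) − log p(x). A pure-Python sketch with an illustrative Gaussian simulator and a hand-rolled logistic regression on quadratic features:

```python
import math
import random

random.seed(6)

def simulate(n):
    thetas = [random.gauss(0, 1) for _ in range(n)]
    xs = [random.gauss(t, 1) for t in thetas]      # simulator: x ~ N(theta, 1)
    return thetas, xs

def features(t, x):
    return [1.0, t, x, t * x, t * t, x * x]

def sigmoid(z):
    return 1 / (1 + math.exp(-max(-30.0, min(30.0, z))))

# training set: joint pairs (label 1) vs pairs with x shuffled (label 0)
n = 2000
thetas, xs = simulate(n)
shuffled = xs[:]
random.shuffle(shuffled)
rows = [(features(t, x), 1) for t, x in zip(thetas, xs)] + \
       [(features(t, x), 0) for t, x in zip(thetas, shuffled)]

w = [0.0] * 6
lr = 0.01
for epoch in range(150):                           # plain SGD on log loss
    random.shuffle(rows)
    for f, y in rows:
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
        for i in range(6):
            w[i] += lr * (y - p) * f[i]

def log_ratio(t, x):
    # classifier logit approximates log p(x | theta) - log p(x)
    return sum(wi * fi for wi, fi in zip(w, features(t, x)))
```

The learned `log_ratio` can be dropped into MCMC in place of the intractable likelihood, which is what makes NRE attractive when only a simulator is available.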
Diffusion Models for Posterior Sampling
- Score-Based Generative Modeling through Stochastic Differential Equations (2021) Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole — Unifies score matching and diffusion models as continuous-time SDEs that gradually corrupt data into noise and reverse the process via learned score functions; provides the theoretical foundation for using diffusion models as priors in Bayesian inverse problems.
- Diffusion Posterior Sampling for General Noisy Inverse Problems (2023) Hyungjin Chung, Jeongsol Kim, Michael T. McCann, Marc L. Klasky, Jong Chul Ye — Combines a pretrained diffusion prior (from above) with a measurement likelihood to sample from the Bayesian posterior for inverse problems, using manifold-constrained gradients to handle both linear and nonlinear forward models with noise.
- Score-based diffusion models for diffuse optical tomography with uncertainty quantification (2026) Fabian Schneider, Meghdoot Mozumder, Konstantin Tamarov, Leila Taghizadeh, Tanja Tarvainen, Tapio Helin, Duc-Lam Duong — Applies the diffusion posterior sampling framework to medical imaging, introducing a regularization strategy that blends learned and model-based scores to prevent overfitting; demonstrates calibrated uncertainty estimates with lower variance than classical Bayesian methods.
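Stripped to its simplest limit, posterior sampling with scores is Langevin dynamics driven by the sum of a prior score and a likelihood score. In the sketch below both scores are analytic for a toy conjugate problem (in the papers above, a trained score network stands in for `prior_score` and the dynamics are annealed across noise levels); everything here is illustrative:

```python
import math
import random

random.seed(7)
y_obs, sigma_y = 1.0, 0.5        # observation y = theta + N(0, sigma_y^2)

def prior_score(theta):
    # score of the N(0, 1) prior; a learned score model in practice
    return -theta

def lik_score(theta):
    # d/dtheta log N(y_obs | theta, sigma_y^2): the DPS-style guidance term
    return (y_obs - theta) / sigma_y ** 2

def langevin(n_steps=20000, eps=1e-2):
    theta, out = 0.0, []
    for t in range(n_steps):
        score = prior_score(theta) + lik_score(theta)
        theta += 0.5 * eps * score + math.sqrt(eps) * random.gauss(0, 1)
        if t >= n_steps // 2:    # discard burn-in
            out.append(theta)
    return out

samples = langevin()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# exact posterior for this conjugate toy: N(0.8, 0.2)
```

The hard part in real inverse problems is that the likelihood score must be evaluated on noisy intermediate iterates, which is what the manifold-constrained gradients of Chung et al. approximate.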
Bayesian Deep Learning
- Weight Uncertainty in Neural Networks (2015) Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra — Introduces “Bayes by Backprop”: maintains a variational distribution over each weight (rather than a point estimate), optimizing the variational free energy with reparameterized gradients — the first practical VI method (see VI above) for modern deep networks.
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016) Yarin Gal, Zoubin Ghahramani — Reinterprets standard dropout training as approximate inference in a deep Gaussian process, enabling uncertainty estimates from any existing dropout network at test time with zero additional cost — far cheaper than Bayes by Backprop but less flexible.
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles (2017) Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell — Proposes training multiple networks with random initialization as a non-Bayesian alternative for uncertainty; despite its simplicity, deep ensembles empirically match or outperform both MC Dropout and Bayes by Backprop on calibration and out-of-distribution detection.
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization (2020) Andrew Gordon Wilson, Pavel Izmailov — Argues that deep ensembles succeed precisely because they approximate Bayesian model averaging, proposes MultiSWAG for cheaper within-basin marginalization, and shows that Bayesian averaging resolves pathologies like double descent.
- Bayesian Computation in Deep Learning (2025) Wenlong Chen, Bolian Li, Ruqi Zhang, Yingzhen Li — Recent review organizing the Bayesian deep learning toolbox around two computational pillars: SG-MCMC (see above) and VI, covering their challenges (multimodality, cold posteriors) and solutions specific to deep neural networks and deep generative models.
- See also: Bayesian Neural Networks in Neural Networks
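The deep-ensembles recipe is short enough to sketch end to end: train several networks that differ only in their random initialization, then read predictive uncertainty off their disagreement. The tiny tanh networks and all constants below are illustrative, not from the paper:

```python
import math
import random

random.seed(8)
xs = [random.uniform(-3, 3) for _ in range(40)]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]

H = 5  # hidden units

class TinyNet:
    """A 1-H-1 tanh network; random initialization is the only
    source of diversity across ensemble members."""
    def __init__(self):
        self.w1 = [random.gauss(0, 1) for _ in range(H)]
        self.b1 = [random.gauss(0, 1) for _ in range(H)]
        self.w2 = [random.gauss(0, 1) for _ in range(H)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        return sum(v * hi for v, hi in zip(self.w2, h)) + self.b2, h

    def train(self, epochs=1500, lr=0.05):
        for _ in range(epochs):
            for x, y in zip(xs, ys):      # plain SGD on squared error
                pred, h = self.forward(x)
                err = pred - y
                for i in range(H):
                    dh = err * self.w2[i] * (1 - h[i] ** 2)
                    self.w2[i] -= lr * err * h[i]
                    self.w1[i] -= lr * dh * x
                    self.b1[i] -= lr * dh
                self.b2 -= lr * err

ensemble = [TinyNet() for _ in range(5)]
for net in ensemble:
    net.train()

def predict(x):
    preds = [net.forward(x)[0] for net in ensemble]
    mu = sum(preds) / len(preds)
    std = math.sqrt(sum((p - mu) ** 2 for p in preds) / len(preds))
    return mu, std

mu_in, std_in = predict(0.0)    # inside the training range
mu_out, std_out = predict(8.0)  # far outside it: members disagree more
```

Members agree where data pin them down and diverge where they extrapolate, which is the behavior Wilson & Izmailov interpret as approximate Bayesian model averaging over weight-space basins.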
Gaussian processes
Uncertainty calibration
- The Well-Calibrated Bayesian (1982) A.P. Dawid
- Transforming Classifier Scores into Accurate Multiclass Probability Estimates (2002) Bianca Zadrozny, Charles Elkan
- Predicting Good Probabilities With Supervised Learning (2005) Alexandru Niculescu-Mizil, Rich Caruana
- Nearly-Isotonic Regression (2011) Ryan J. Tibshirani, Holger Hoefling, Robert Tibshirani
- Binary classifier calibration using an ensemble of piecewise linear regression models (2012) Mahdi Pakdaman Naeini, Gregory F. Cooper
- Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers (2017) Meelis Kull, Telmo de Menezes e Silva Filho, Peter Flach
- Verified Uncertainty Calibration (2019) Ananya Kumar, Percy Liang, Tengyu Ma
- Improving Regression Uncertainty Estimates with an Empirical Prior (2020) Eric Zelikman, Christopher Healy
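The workhorse diagnostic behind most of these papers is the binned calibration error: group predictions by score and compare each bin's average score with its empirical frequency of positives. A sketch on synthetic data where the true probabilities are known and the "classifier" is deliberately overconfident (all constants illustrative):

```python
import random

random.seed(9)

def simulate(n=5000):
    """Binary task with known true P(y=1) and an overconfident score
    that pushes probabilities away from 0.5."""
    rows = []
    for _ in range(n):
        p_true = random.random()
        y = 1 if random.random() < p_true else 0
        conf = min(1.0, max(0.0, 0.5 + 1.4 * (p_true - 0.5)))
        rows.append((conf, p_true, y))
    return rows

def ece(scores_labels, n_bins=10):
    """Expected calibration error: |mean score - empirical frequency of
    y=1| per equal-width bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for s, y in scores_labels:
        bins[min(n_bins - 1, int(s * n_bins))].append((s, y))
    total, err = len(scores_labels), 0.0
    for b in bins:
        if b:
            avg_s = sum(s for s, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            err += len(b) / total * abs(avg_s - freq)
    return err

rows = simulate()
ece_overconf = ece([(conf, y) for conf, _, y in rows])
ece_oracle = ece([(p, y) for _, p, y in rows])   # perfectly calibrated scores
```

The calibration methods above (isotonic, Platt/beta scaling, binning ensembles) are different ways of learning a monotone map from scores back toward the oracle column; Kumar et al. (2019) warn that this binned estimator itself can understate the true calibration error.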
Software
- Stan
- BUGS
- JAGS
- R
- Python
- Julia
- Javascript
- Web
- Installable