Kullback--Leibler Approximation for Probability Measures on Infinite Dimensional Spaces

Pinski, F. J.; Simpson, Gideon; Stuart, Andrew M.; Weber, Hendrik

doi:10.1137/140962802

Cited by 51 publications

(67 citation statements)

References 21 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Swapping the order of these two measures within the divergence is undesirable for our purposes. This is because minimizing D KL (µ ε ||·) within the set of all Gaussian measures will lead to matching of moments [3]; this is inappropriate for multimodal measures where a more desirable outcome would be the existence of multiple local minimizers at each mode [23,22]. Although the Kullback-Leibler divergence is not a metric, its information theoretic interpretation make it natural for approximate inference.…”

Section: Set-upmentioning

confidence: 99%

Gaussian Approximations for Probability Measures on $R^d$

Lu¹,

Stuart²,

Weber³

2017

SIAM/ASA J. Uncertainty Quantification

View full text Add to dashboard Cite

Abstract. This paper concerns the approximation of probability measures on R d with respect to the Kullback-Leibler divergence. Given an admissible target measure, we show the existence of the best approximation, with respect to this divergence, from certain sets of Gaussian measures and Gaussian mixtures. The asymptotic behavior of such best approximations is then studied in the small parameter limit where the measure concentrates; this asympotic behavior is characterized using Γ-convergence. The theory developed is then applied to understand the frequentist consistency of Bayesian inverse problems in finite dimensions. For a fixed realization of additive observational noise, we show the asymptotic normality of the posterior measure in the small noise limit. Taking into account the randomness of the noise, we prove a Bernstein-Von Mises type result for the posterior measure. 1. Introduction. In this paper, we study the "best" approximation of a general finite dimensional probability measure, which could be non-Gaussian, from a set of simple probability measures, such as a single Gaussian measure or a Gaussian mixture family. We define "best" to mean the measure within the simple class which minimizes the Kullback-Leibler divergence between itself and the target measure. This type of approximation is central to many ideas, especially including the so-called variational inference [30], that are widely used in machine learning [3]. Yet such approximation has not been the subject of any substantial systematic underpinning theory. The purpose of this paper is to develop such a theory in the concrete finite dimensional setting in two ways: (i) by establishing the existence of best approximations and (ii) by studying their asymptotic properties in a measure concentration limit of interest. The abstract theory is then applied to study frequentist consistency [28] of Bayesian inverse problems.

show abstract

Section: Set-upmentioning

confidence: 99%

Gaussian Approximations for Probability Measures on $R^d$

Lu¹,

Stuart²,

Weber³

2017

SIAM/ASA J. Uncertainty Quantification

View full text Add to dashboard Cite

show abstract

“…Contrary to the evidence for minimizing the approximation KL, we find many examples where an approximation is made by minimizing other functionals; for example, minimizing the inference KL (e.g., [7][8][9][10][11][12][13]). For many but not all of them, this is because minimizing the approximation KL is not feasible in practice in their case due to the real distribution p not being accessible.…”

Section: Introductionmentioning

confidence: 66%

“…However, the method only gives rise to maximum a posteriori or maximum likelihood solutions, which corresponds to optimizing the δ-loss of Equation (2). In Reference [11], it is claimed that minimizing the inference KL yields more desirable results since for multi-modal distributions, individual modes can be fitted with a mono-modal distribution such as a Gaussian distribution, whereas the resulting distribution has a very large variance when minimizing the approximation KL to account for all modes. In Figure 1 there is an example of this behavior.…”

Section: Discussionmentioning

confidence: 99%

Optimal Belief Approximation

Leike

Enßlin

2017

Entropy

View full text Add to dashboard Cite

Abstract:In Bayesian statistics probability distributions express beliefs. However, for many problems the beliefs cannot be computed analytically and approximations of beliefs are needed. We seek a loss function that quantifies how "embarrassing" it is to communicate a given approximation. We reproduce and discuss an old proof showing that there is only one ranking under the requirements that (1) the best ranked approximation is the non-approximated belief and (2) that the ranking judges approximations only by their predictions for actual outcomes. The loss function that is obtained in the derivation is equal to the Kullback-Leibler divergence when normalized. This loss function is frequently used in the literature. However, there seems to be confusion about the correct order in which its functional arguments-the approximated and non-approximated beliefs-should be used. The correct order ensures that the recipient of a communication is only deprived of the minimal amount of information. We hope that the elementary derivation settles the apparent confusion. For example when approximating beliefs with Gaussian distributions the optimal approximation is given by moment matching. This is in contrast to many suggested computational schemes.

show abstract

“…In a closely related direction, information metrics provide systematic, practical, and widely used tools to build approximate statistical models of reduced complexity through variational inference methods [64,6,74] for machine learning [98,38,6], and coarse-graining of complex systems at equilibrium [84,10,79,4,5,28]. However, dynamics are of critical importance in reaction networks and such earlier works on equilibrium coarse-graining are not applicable.…”

Section: Introductionmentioning

confidence: 99%