Robust Deep Gaussian Processes

Knoblauch, Jeremias

doi:10.48550/arxiv.1904.02303

Cited by 3 publications

(4 citation statements)

References 12 publications

(25 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Lastly, we present numerical experiments and their results (Section 6.2.4). These findings are also summarized with a higher level of detail in a separate technical report (Knoblauch, 2019b).…”

Section: Deep Gaussian Processesmentioning

confidence: 97%

Generalized Variational Inference: Three arguments for deriving new Posteriors

Knoblauch,

Jewson,

Damoulas

2019

Preprint

Self Cite

View full text Add to dashboard Cite

In this paper we advocate an optimization-centric view on Bayesian statistics and introduce a novel generalization of Bayesian inference. On both counts, our inspiration is the representation of Bayes' rule as an infinite-dimensional optimization problem as shown independently by Csiszár (1975);Donsker and Varadhan (1975);Zellner (1988). First, we use this representation to prove a surprising optimality result of standard Variational Inference (VI) methods: Under the proposed view, the standard Evidence Lower Bound (ELBO) maximizing VI posterior is always preferable to alternative approximations of the Bayesian posterior. Next, we argue for an optimization-centric generalization of standard Bayesian inference. The need for this generalization arises in situations of severe misalignment between reality and three assumptions underlying the standard Bayesian posterior: (1) Well-specified priors, (2) well-specified likelihood models and (3) the availability of infinite computing power. In response to this observation, our generalization is defined by three arguments and named the Rule of Three (RoT). Each of its three arguments relaxes one of the assumptions underlying standard Bayesian inference. We axiomatically derive the RoT and recover existing methods as special cases, including the Bayesian posterior and its approximation by standard Variational Inference (VI). In contrast, alternative approximations to the Bayesian posterior maximizing other ELBO-like objectives violate these axioms. Finally, we introduce a special case of the RoT that we call Generalized Variational Inference (GVI).GVI posteriors are a large and tractable family of belief distributions specified by three arguments: A loss, a divergence and a variational family. GVI posteriors possess appealing theoretical properties, including consistency and an interpretation as an approximate ELBO.

show abstract

Section: Deep Gaussian Processesmentioning

confidence: 97%

Generalized Variational Inference: Three arguments for deriving new Posteriors

Knoblauch,

Jewson,

Damoulas

2019

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

“…Another objective based on ELBO, referred to as γrobust [19], aims at providing robustness to the training of GPs with respect to model misspecification and uncertainty control, where the log-likelihood term in ELBO is replaced with γ-divergence: where γ is a hyperparameter and is set to be equal to 1.03 for the rest of the paper.…”

Section: B Scalable Variational Gaussian Processesmentioning

confidence: 99%

“…In particular, inducing point methods [18] allow for scalable approximations to the exact GPs by introducing the learnable latent points variables that sparsify the GP model. Several scalable approximations of the marginal log-likelihood objectives were proposed recently [19], [16] that rely on bounding the model evidence and allowing to substitute the expensive-to-evaluate exact marginal log-likelihood objective.…”

Section: Introductionmentioning

confidence: 99%

Learning inducing points and uncertainty on molecular data

Tsitsvero¹

2022

Preprint

View full text Add to dashboard Cite

Uncertainty control and scalability to large datasets are the two main issues for the deployment of Gaussian process models into the autonomous material and chemical space exploration pipelines. One way to address both of these issues is by introducing the latent inducing variables and choosing the right approximation for the marginal log-likelihood objective. Here, we show that variational learning of the inducing points in the high-dimensional molecular descriptor space significantly improves both the prediction quality and uncertainty estimates on test configurations from a sample molecular dynamics dataset. Additionally, we show that inducing points can learn to represent the configurations of the molecules of different types that were not present within the initialization set of inducing points. Among several evaluated approximate marginal log-likelihood objectives, we show that the predictive log-likelihood provides both the predictive quality comparable to the exact Gaussian process model and excellent uncertainty control. Finally, we comment on whether a machine learning model makes predictions by interpolating the molecular configurations in highdimensional descriptor space. We show that despite our intuition, and even for densely sampled molecular dynamics datasets, most of the predictions are done in the extrapolation regime.

show abstract

“…2.4; as well as two 2-layer models: (DGP) a variational DGP as described in Sec. 2.3; and (γ-DGP) the robust DGP described in Knoblauch (2019); .…”

Section: Multivariate Regressionmentioning

confidence: 99%

Deep Sigma Point Processes

Jankowiak,

Pleiss,

Gardner

2020

Preprint

View full text Add to dashboard Cite

We introduce Deep Sigma Point Processes, a class of parametric models inspired by the compositional structure of Deep Gaussian Processes (DGPs). Deep Sigma Point Processes (DSPPs) retain many of the attractive features of (variational) DGPs, including mini-batch training and predictive uncertainty that is controlled by kernel basis functions. Importantly, since DSPPs admit a simple maximum likelihood inference procedure, the resulting predictive distributions are not degraded by any posterior approximations. In an extensive empirical comparison on univariate and multivariate regression tasks we find that the resulting predictive distributions are significantly better calibrated than those obtained with other probabilistic methods for scalable regression, including variational DGPsoften by as much as a nat per datapoint.

show abstract

Robust Deep Gaussian Processes

Cited by 3 publications

References 12 publications

Generalized Variational Inference: Three arguments for deriving new Posteriors

Generalized Variational Inference: Three arguments for deriving new Posteriors

Learning inducing points and uncertainty on molecular data

Deep Sigma Point Processes

Contact Info

Product

Resources

About