Projected Wasserstein gradient descent for high-dimensional Bayesian inference

Wang, Yifei; Chen, Peng; Li, Wuchen

doi:10.48550/arxiv.2102.06350

Cited by 2 publications

(4 citation statements)

References 28 publications

(33 reference statements)

“…There are various formulations of particle-based variational inference, depending on how variational approximation and discretization are applied to derive finite-particle update rules. This section introduces a specific form of the inference method used in this study, named Wasserstein gradient descent (WGD) (Liu et al, 2019; Wang et al, 2021). Note that the choice of an inference method is independent of the space in which the inference is performed, and we denote the variables to be inferred as w ∈ W here.…”

Section: Wasserstein Gradient Descent (mentioning)

confidence: 99%

“…In particular, the repulsive term of the update rule (4) involves KDE, which is known to suffer from the curse of dimensionality (Scott, 1991). Thus, inspired by (Wang et al, 2021; Chen & Ghattas, 2020), we consider estimating the density in a low-dimensional subspace in which the likelihood of data changes significantly.…”

Section: WGD on Feature Space (mentioning)

confidence: 99%
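
The KDE-based repulsive term mentioned in the statement above can be illustrated with a minimal sketch. It assumes a Gaussian kernel density estimate and a generic particle update of the form x_i ← x_i + ε(∇log p(x_i) − ∇log ρ̂(x_i)); the citing paper's update rule (4) is not reproduced on this page, so this may differ from it in detail. The function name `kde_wgd_step`, the median bandwidth heuristic, and the standard-normal toy target are all assumptions made for illustration.

```python
import numpy as np

def kde_wgd_step(particles, grad_log_post, step_size=0.05, bandwidth=None):
    """One WGD-style update using a Gaussian KDE of the particle density.

    particles: (n, d) array.
    grad_log_post: callable mapping an (n, d) array to an (n, d) array.
    """
    diffs = particles[:, None, :] - particles[None, :, :]   # (n, n, d): x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)                  # (n, n)
    if bandwidth is None:
        # Median heuristic: an assumption, not prescribed by the source.
        bandwidth = np.sqrt(0.5 * np.median(sq_dists) + 1e-12)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))     # (n, n)
    # -grad log rho_hat(x_i) = sum_j k_ij (x_i - x_j) / (h^2 * sum_j k_ij)
    repulsion = (kernel[:, :, None] * diffs).sum(axis=1)    # (n, d)
    repulsion /= bandwidth ** 2 * kernel.sum(axis=1, keepdims=True)
    # Driving term pulls toward high posterior density; repulsion spreads particles.
    return particles + step_size * (grad_log_post(particles) + repulsion)

# Toy usage: standard-normal target, so grad log p(x) = -x (illustrative only).
rng = np.random.default_rng(0)
particles = rng.normal(loc=3.0, scale=0.5, size=(50, 2))
for _ in range(300):
    particles = kde_wgd_step(particles, lambda x: -x)
```

Because every pairwise difference enters the kernel, the estimate degrades as the dimension d grows, which is the curse-of-dimensionality concern the statement raises.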

“…Roughly speaking, we evaluate the repulsive term only on feature elements that significantly affect the prediction results. Note that in (Wang et al, 2021; Chen & Ghattas, 2020), the entire update rule including the driving term is projected onto a subspace, but we only project the repulsive (KDE) term because we found that yields more stable training on neural networks.…”

Section: WGD on Feature Space (mentioning)

confidence: 99%
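
The distinction drawn in the statement above, projecting only the repulsive (KDE) term while leaving the driving term in the full space, can be sketched as follows. This is an illustration under stated assumptions, not the citing authors' implementation: `subspace_repulsion_step` and `basis` are hypothetical names, `basis` is assumed to have orthonormal columns, and how the subspace is selected is not described on this page.

```python
import numpy as np

def subspace_repulsion_step(particles, grad_log_post, basis,
                            step_size=0.05, bandwidth=1.0):
    """WGD-style update whose KDE repulsion is evaluated only in span(basis).

    particles: (n, d) array.
    basis: (d, r) array with orthonormal columns (hypothetical input).
    """
    coords = particles @ basis                            # (n, r) subspace coordinates
    diffs = coords[:, None, :] - coords[None, :, :]       # (n, n, r)
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))   # Gaussian kernel in r dims
    repulsion_low = (kernel[:, :, None] * diffs).sum(axis=1)
    repulsion_low /= bandwidth ** 2 * kernel.sum(axis=1, keepdims=True)
    repulsion = repulsion_low @ basis.T                   # lift back to (n, d)
    # The driving term stays full-dimensional; only the repulsion is projected.
    return particles + step_size * (grad_log_post(particles) + repulsion)

# Toy usage: 3-D particles, repulsion restricted to the first coordinate axis.
points = np.array([[0.0, 5.0, -2.0], [1.0, 7.0, 3.0]])
axis0 = np.array([[1.0], [0.0], [0.0]])
moved = subspace_repulsion_step(points, lambda x: np.zeros_like(x), axis0)
```

With a zero driving term, as in the toy call, the particles move apart along the first axis only and the other coordinates are left unchanged.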

Feature Space Particle Inference for Neural Network Ensembles

Yashima, Suzuki, Ishikawa, et al. 2022

Preprint

Ensembles of deep neural networks demonstrate improved performance over single models. For enhancing the diversity of ensemble members while keeping their performance, particle-based inference methods offer a promising approach from a Bayesian perspective. However, the best way to apply these methods to neural networks is still unclear: seeking samples from the weight-space posterior suffers from inefficiency due to the overparameterization issues, while seeking samples directly from the function-space posterior often results in serious underfitting. In this study, we propose optimizing particles in the feature space where the activation of a specific intermediate layer lies to address the above-mentioned difficulties. Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness. Extensive evaluation on real-world datasets shows that our model significantly outperforms the gold-standard Deep Ensembles on various metrics, including accuracy, calibration, and robustness.

“…Moreover, our Wasserstein gradient descent using the SGE approximation can also be derived using an alternative formulation as a gradient flow with smoothed test functions [44]. A projected version of WGD has been studied in [65], which could also be readily applied in our framework. Besides particle methods, Bayesian neural networks MacKay [49], Neal [54] have gained popularity recently [69,18,16,32], using modern MCMC [54,69,18,20,17] and variational inference techniques [4,63,14,30].…”

Section: Related Work (mentioning)

confidence: 99%

Repulsive Deep Ensembles are Bayesian

D’Angelo, Fortuin 2021

Preprint

Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as a saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this does not only affect the quality of its predictions, but even more so the uncertainty estimates of the ensemble, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks.
