2019
DOI: 10.1007/s00180-018-00861-z

Neural network gradient Hamiltonian Monte Carlo

Abstract: Hamiltonian Monte Carlo is a widely used algorithm for sampling from posterior distributions of complex Bayesian models. It can efficiently explore high-dimensional parameter spaces guided by simulated Hamiltonian flows. However, the algorithm requires repeated gradient calculations, and these computations become increasingly burdensome as data sets scale. We present a method to substantially reduce the computation burden by using a neural network to approximate the gradient. First, we prove that the proposed …
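A minimal sketch of the idea described in the abstract, assuming a standard Gaussian momentum and using illustrative names (leapfrog, hmc_step, grad_fn) rather than the authors' actual code: the leapfrog integrator only needs a gradient callable, so an exact full-data gradient can be swapped for a cheap neural-network surrogate without touching the rest of the sampler. The Metropolis correction below uses the exact log posterior, which is one common safeguard against surrogate error; the abstract does not confirm that this is exactly the authors' construction.

import numpy as np

def leapfrog(theta, p, grad_fn, step_size, n_steps):
    # Leapfrog integration of Hamiltonian dynamics; grad_fn returns the
    # gradient of the log posterior at theta (exact or neural-network surrogate).
    p = p + 0.5 * step_size * grad_fn(theta)
    for _ in range(n_steps - 1):
        theta = theta + step_size * p
        p = p + step_size * grad_fn(theta)
    theta = theta + step_size * p
    p = p + 0.5 * step_size * grad_fn(theta)
    return theta, p

def hmc_step(theta, log_post, grad_fn, step_size=0.01, n_steps=20, rng=None):
    # One HMC proposal plus accept/reject decision.
    rng = np.random.default_rng() if rng is None else rng
    p0 = rng.standard_normal(theta.shape)
    theta_new, p_new = leapfrog(theta.copy(), p0.copy(), grad_fn, step_size, n_steps)
    # Hamiltonians under a standard Gaussian kinetic energy.
    h_current = -log_post(theta) + 0.5 * p0 @ p0
    h_proposed = -log_post(theta_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_current - h_proposed:
        return theta_new
    return theta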

Cited by 16 publications (9 citation statements)
References 7 publications (12 reference statements)
“…A different kind of question is whether one might make GPU and multi-core SIMD speedups available for a broader class of Bayesian models. Li et al (2019) use neural networks to approximate an arbitrary model's log-posterior gradient and thus avoid expensive HMC gradient computations in a Big Data setting. On the other hand, GPUs greatly accelerate fitting and evaluation of deep neural networks (Bergstra et al, 2011).…”
Section: Discussion (mentioning)
confidence: 99%
“…The joint distribution for the GP W(s) is available in closed form but is cumbersome for large datasets; the joint distribution of the MSP R(s) is available only for a moderate number of spatial locations, and the joint distribution of the mixture model is more complicated than either of its components. An alternative is to build a surrogate likelihood for Bayesian computation (e.g., Rasmussen, 2003; Jabot et al, 2014; Wilkinson, 2014; Gutmann and Corander, 2016; Price et al, 2018; Drovandi et al, 2018; Wang and Li, 2018; Acerbi, 2018; Järvenpää et al, 2019, 2021; Li et al, 2019).…”
Section: Mixture Model (mentioning)
confidence: 99%
“…Rasmussen (2003) used GPs to jointly model the potential energy and its gradients. More recently, Li et al (2019) obtained better performance with a shallow neural network that is trained on gradient observations during early phases of the sampling procedure. With novel gradient inference routines we revisit the idea to replace ∇_x E by a GP gradient model that is trained on spatially diverse evaluations of the gradient during early phases of the sampling.…”
Section: Hamiltonian Monte Carlo (mentioning)
confidence: 99%
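The "train on early-phase gradients" pattern quoted above can be sketched as follows; the helper name fit_gradient_surrogate and the use of scikit-learn's MLPRegressor are assumptions for illustration, not the implementation from any of the cited papers. After a burn-in run with exact gradients, the recorded (position, gradient) pairs are fit by a shallow network whose prediction function then stands in for the exact gradient.

import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_gradient_surrogate(positions, gradients, hidden_units=50):
    # Fit a shallow (single hidden layer) network mapping sampled positions to
    # the log-posterior gradients observed at those positions during burn-in.
    net = MLPRegressor(hidden_layer_sizes=(hidden_units,), max_iter=5000)
    net.fit(np.asarray(positions), np.asarray(gradients))
    # Return a callable with the same signature as the exact gradient, so it can
    # replace grad_fn in an HMC step such as the one sketched after the abstract.
    return lambda theta: net.predict(np.atleast_2d(theta)).ravel()

For example, one would append theta and the exact gradient at theta to the positions and gradients buffers at each burn-in iteration, call fit_gradient_surrogate once enough pairs have accumulated, and continue sampling with the returned surrogate passed as grad_fn.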