Stochastic gradient Hamiltonian Monte Carlo with variance reduction for Bayesian inference

Li, Zhize; Zhang, Tianyi; Cheng, Shuyu; Zhu, Jun; Li, Jian

doi:10.1007/s10994-019-05825-y

Cited by 13 publications

(39 citation statements)

References 8 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Sampling from a Bayesian posterior distribution lies at the core of many modern machine learning tasks, such as topic modelling [Gan et al, 2015], reinforcement learning [Liu et al, 2017], and Bayesian neural networks [Hernández-Lobato and Adams, 2015]. Particle based Variational Inference (ParVI) methods have recently drawn great attention due to their empirical success in approximating the target posterior distribution [Liu and Wang, 2016;Liu et al, 2017;Feng et al, 2017;Liu and Zhu, 2018]. Typically, these methods update a finite set of interacting particles deterministically to approximately simulate infinite-particle gradient flows on the Wasserstein space P 2 (X ).…”

Section: Introductionmentioning

confidence: 99%

“…One representative method of this type is the Stein Variational Gradient Descent (SVGD) method [Liu and Wang, 2016], which updates the particles according to a gradient flow described by the Vlasov equation Braun and Hepp, 1977]. Subsequently, by exploiting the Riemannian structure of the Wasserstein space P 2 (X ), [Liu et al, 2019] proposed a Nesterov's-acceleration variant of SVGD called SVGD Wasserstein Nesterov's method (SVGD-WNes).…”

Section: Introductionmentioning

confidence: 99%

“…However, ParVI methods have an intractable pitfall that particles tend to collapse under certain condition due to the deterministic-update fashion with a limited number of particles [Zhang et al, 2020a;Zhuo et al, 2018], which indicates a large deviation of the particles' empirical distribution to target distribution. To understand the influence of finiteparticle approximation to the infinite-particle gradient flows, Liu et al [2019] provided a unified theory on the approximation property of different ParVI methods. They show that existing finite-particle ParVIs can be regarded as essentially smoothing operations on gradient flows, in the form of either smoothing the density or smoothing functions.…”

Section: Introductionmentioning

confidence: 99%

“…Inspired by the random exploration in the dynamics-based Markov Chain Monte Carlo methods, recent researches relieve the particle-collapsing phenomenon by introducing additional stochasticity into ParVI methods [Zhang et al, 2020a;Zhang et al, 2020b]. Zhang et al [2020a] integrated the gradient flow of the first-order Overdamped Langevin Dynamics (OLD) into the Vlasov equation used in SVGD, and proposed a new method called Stochastic Particle Optimization Sampling (SPOS). Specifically, SPOS updates a set of particles following the same mechanism as in SVGD with an additional drift term and an extra Gaussian random noise induced by OLD.…”

Section: Introductionmentioning

confidence: 99%

“…Compared with the first-order OLD, ULD possesses an auxiliary momentum term, and can be regarded as an accelerated secondorder dynamics of OLD. Typically, ULD based methods have both better theoretical guarantees and more competitive practical performance than their OLD based counterparts [Chen et al, 2014;Cheng et al, 2018;Zou et al, 2018;Zou et al, 2019]. We utilize a variant of the first order exponential integrator scheme to discretize the corresponding stochastic differential equation of the augmented flow, which is obtained by integrating the gradient flow of ULD into the Vlasov flow.…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

SHPOS: A Theoretical Guaranteed Accelerated Particle Optimization Sampling Method

Zhang

Qian

et al. 2021

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

View full text Add to dashboard Cite

Recently, the Stochastic Particle Optimization Sampling (SPOS) method is proposed to solve the particle-collapsing pitfall of deterministic Particle Variational Inference methods by ultilizing the stochastic Overdamped Langevin dynamics to enhance exploration. In this paper, we propose an accelerated particle optimization sampling method called Stochastic Hamiltonian Particle Optimization Sampling (SHPOS). Compared to the first-order dynamics used in SPOS, SHPOS adopts an augmented second-order dynamics, which involves an extra momentum term to achieve acceleration. We establish a non-asymptotic convergence analysis for SHPOS, and show that it enjoys a faster convergence rate than SPOS. Besides, we also propose a variance-reduced stochastic gradient variant of SHPOS for tasks with large-scale datasets and complex models. Experiments on both synthetic and real data validate our theory and demonstrate the superiority of SHPOS over the state-of-the-art.

show abstract

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

SHPOS: A Theoretical Guaranteed Accelerated Particle Optimization Sampling Method

Zhang

Qian

et al. 2021

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

View full text Add to dashboard Cite

show abstract

High‐dimensional limit theorems for SGD: Effective dynamics and critical scaling

Arous,

Gheissari,

Jagannath

2023

Comm Pure Appl Math

View full text Add to dashboard Cite

We study the scaling limits of stochastic gradient descent (SGD) with constant step‐size in the high‐dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite‐dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step‐size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on the former choices. We show a critical scaling regime for the step‐size, below which the effective ballistic dynamics matches gradient flow for the population loss, but at which, a new correction term appears which changes the phase diagram. About the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two‐layer networks for binary and XOR‐type Gaussian mixture models. These examples exhibit surprising phenomena including multimodal timescales to convergence as well as convergence to sub‐optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second layer width grows.

show abstract

Stochastic differential equation approximations of generative adversarial network training and its long-run behavior

Cao,

Guo

2023

J. Appl. Probab.

View full text Add to dashboard Cite

This paper analyzes the training process of generative adversarial networks (GANs) via stochastic differential equations (SDEs). It first establishes SDE approximations for the training of GANs under stochastic gradient algorithms, with precise error bound analysis. It then describes the long-run behavior of GAN training via the invariant measures of its SDE approximations under proper conditions. This work builds a theoretical foundation for GAN training and provides analytical tools to study its evolution and stability.

show abstract

Stochastic gradient Hamiltonian Monte Carlo with variance reduction for Bayesian inference

Cited by 13 publications

References 8 publications

SHPOS: A Theoretical Guaranteed Accelerated Particle Optimization Sampling Method

SHPOS: A Theoretical Guaranteed Accelerated Particle Optimization Sampling Method

High‐dimensional limit theorems for SGD: Effective dynamics and critical scaling

Stochastic differential equation approximations of generative adversarial network training and its long-run behavior

Contact Info

Product

Resources

About