When an unbiased estimator of the likelihood is used within a Metropolis-Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of averages computed under this chain. Many Monte Carlo samples will typically result in Metropolis-Hastings averages with lower asymptotic variances than the corresponding Metropolis-Hastings averages using fewer samples. However, the computing time required to construct the likelihood estimator increases with the number of Monte Carlo samples. Under the assumption that the distribution of the additive noise introduced by the log-likelihood estimator is Gaussian with variance inversely proportional to the number of Monte Carlo samples and independent of the parameter value at which it is evaluated, we provide guidelines on the number of samples to select. We demonstrate our results by considering a stochastic volatility model applied to stock index returns.
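The trade-off described above can be made concrete with a minimal sketch, assuming a toy unit-variance Gaussian model in which the exact log-likelihood is available and the abstract's Gaussian noise is injected directly, with variance `sigma2_1 / n_mc` shrinking in the number of Monte Carlo samples (the `-sigma2/2` mean shift makes the exponentiated estimator unbiased). All names and parameter values (`noisy_loglik`, `n_mc`, `sigma2_1`) are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_loglik(theta, data, sigma2):
    """Exact Gaussian log-likelihood plus additive N(-sigma2/2, sigma2) noise,
    so that exp(noisy_loglik) is an unbiased estimator of the likelihood."""
    exact = -0.5 * np.sum((data - theta) ** 2)  # unit-variance Gaussian model
    return exact + rng.normal(-0.5 * sigma2, np.sqrt(sigma2))

def pseudo_marginal_mh(data, n_iters=5000, step=0.5, n_mc=10, sigma2_1=4.0):
    sigma2 = sigma2_1 / n_mc  # noise variance inversely proportional to N
    theta = 0.0
    ll = noisy_loglik(theta, data, sigma2)
    chain = np.empty(n_iters)
    for t in range(n_iters):
        prop = theta + step * rng.normal()
        ll_prop = noisy_loglik(prop, data, sigma2)
        # flat prior; accept with the usual MH ratio on the *estimated* likelihoods
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop
        chain[t] = theta
    return chain

data = rng.normal(1.0, 1.0, size=50)
chain = pseudo_marginal_mh(data)
```

Increasing `n_mc` lowers the noise variance and hence the asymptotic variance of averages over `chain`, at the cost of more work per iteration; the paper's guidelines concern exactly this balance.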
The pseudo-marginal algorithm is a Metropolis-Hastings-type scheme which samples asymptotically from a target probability density when we can only estimate unbiasedly an unnormalized version of it. In a Bayesian context, it is a state-of-the-art posterior simulation technique when the likelihood function is intractable but can be estimated unbiasedly using Monte Carlo samples. However, for the performance of this scheme not to degrade as the number T of data points increases, it is typically necessary for the number N of Monte Carlo samples to be proportional to T to control the relative variance of the likelihood ratio estimator appearing in the acceptance probability of this algorithm. The correlated pseudo-marginal method is a modification of the pseudo-marginal method using a likelihood ratio estimator computed from two correlated likelihood estimators. For random-effects models, we show under regularity conditions that the parameters of this scheme can be selected such that the relative variance of this likelihood ratio estimator is controlled when N increases sublinearly with T, and we provide guidelines on how to optimize the algorithm on the basis of a non-standard weak convergence analysis. The efficiency of computations for Bayesian inference relative to the pseudo-marginal method empirically increases with T and exceeds two orders of magnitude in some examples.
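The key mechanism, correlating consecutive likelihood estimators, can be sketched on a toy random-effects model y_i | x_i ~ N(x_i, 1), x_i ~ N(theta, 1), whose likelihood is estimated by importance sampling from the prior. The auxiliary normals are moved with a Crank-Nicolson step u' = rho*u + sqrt(1 - rho^2)*eps, which preserves their N(0, 1) law and makes successive estimators highly correlated. The model and all parameter values here are illustrative, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(1)

def loglik_hat(theta, y, u):
    """Importance-sampling estimate of sum_i log p(y_i | theta) for the
    random-effects model y_i | x_i ~ N(x_i, 1), x_i ~ N(theta, 1),
    using N prior samples x = theta + u per data point (u is T x N)."""
    x = theta + u
    w = np.exp(-0.5 * (y[:, None] - x) ** 2) / np.sqrt(2 * np.pi)
    return np.sum(np.log(np.mean(w, axis=1)))

def correlated_pm(y, n_iters=4000, n_mc=20, rho=0.99, step=0.3):
    T = len(y)
    theta = 0.0
    u = rng.normal(size=(T, n_mc))
    ll = loglik_hat(theta, y, u)
    chain = np.empty(n_iters)
    for t in range(n_iters):
        theta_prop = theta + step * rng.normal()
        # Crank-Nicolson move keeps the auxiliary variables N(0,1) marginally
        u_prop = rho * u + np.sqrt(1 - rho ** 2) * rng.normal(size=u.shape)
        ll_prop = loglik_hat(theta_prop, y, u_prop)
        # flat prior; CN kernel is reversible w.r.t. N(0,1), so it cancels
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, u, ll = theta_prop, u_prop, ll_prop
        chain[t] = theta
    return chain

y = rng.normal(0.5, np.sqrt(2.0), size=100)  # marginally y_i ~ N(theta, 2)
chain = correlated_pm(y)
```

Because the two estimators share most of their auxiliary randomness, the noise in the likelihood *ratio* is far smaller than in the standard pseudo-marginal scheme, which is what allows N to grow sublinearly with T.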
Non-reversible Markov chain Monte Carlo schemes based on piecewise deterministic Markov processes have recently been introduced in applied probability, automatic control, physics and statistics. Although these algorithms demonstrate good performance experimentally and are accordingly increasingly used in a wide range of applications, geometric ergodicity results for such schemes have so far been established only under very restrictive assumptions. We give here verifiable conditions on the target distribution under which the Bouncy Particle Sampler algorithm introduced in [29] is geometrically ergodic. This holds whenever the target satisfies a curvature condition and has tails decaying at least as fast as an exponential and at most as fast as a Gaussian distribution. This allows us to provide a central limit theorem for the associated ergodic averages. When the target has tails thinner than a Gaussian distribution, we propose an original modification of this scheme that is geometrically ergodic. For thick-tailed target distributions, such as t-distributions, we extend the idea pioneered in [19] in a random walk Metropolis context. We apply a change of variable to obtain a transformed target satisfying the tail conditions for geometric ergodicity. By sampling the transformed target using the Bouncy Particle Sampler and mapping the Markov process back to the original parameterization, we obtain a geometrically ergodic algorithm.
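For readers unfamiliar with the algorithm, here is a runnable sketch of the Bouncy Particle Sampler for a standard Gaussian target, where U(x) = |x|^2/2, the bounce rate max(0, <v, x + t v>) is piecewise linear in t, and event times can therefore be drawn in closed form by inverting the integrated rate. Positions are recorded at event times only as a crude summary; proper estimators average along the continuous trajectory. Dimensions and rates are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def bps_gaussian(dim=2, n_events=20000, lam_ref=1.0):
    """Bouncy Particle Sampler for a standard Gaussian target (U(x) = |x|^2 / 2),
    for which bounce times can be sampled exactly."""
    x = rng.normal(size=dim)
    v = rng.normal(size=dim)
    samples = []
    for _ in range(n_events):
        # bounce rate along the ray x + t v: max(0, a + b t)
        a, b = v @ x, v @ v
        e = rng.exponential()
        if a < 0:
            # travel to t0 = -a/b where the rate becomes positive, then invert
            t_bounce = -a / b + np.sqrt(2.0 * e / b)
        else:
            t_bounce = (-a + np.sqrt(a * a + 2.0 * b * e)) / b
        t_ref = rng.exponential(1.0 / lam_ref)
        t = min(t_bounce, t_ref)
        x = x + t * v
        if t_ref < t_bounce:
            v = rng.normal(size=dim)            # refreshment keeps the chain ergodic
        else:
            g = x                               # gradient of U at the event point
            v = v - 2.0 * (v @ g) / (g @ g) * g  # specular reflection off grad U
        samples.append(x.copy())
    return np.array(samples)

xs = bps_gaussian()
```

The refreshment events (rate `lam_ref`) are essential: without them the sampler can fail to be ergodic, and the geometric ergodicity results discussed above are stated for the refreshed process.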
Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to sample complex high-dimensional probability distributions. They rely on a collection of N interacting auxiliary chains targeting tempered versions of the target distribution to improve the exploration of the state space. We provide here a new perspective on these highly parallel algorithms and their tuning by identifying and formalizing a sharp divide in the behaviour and performance of reversible versus non-reversible PT schemes. We show theoretically and empirically that a class of non-reversible PT methods dominates its reversible counterparts and identify distinct scaling limits for the non-reversible and reversible schemes, the former being a piecewise-deterministic Markov process and the latter a diffusion. These results are exploited to identify the optimal annealing schedule for non-reversible PT and to develop an iterative scheme approximating this schedule. We provide a wide range of numerical examples supporting our theoretical and methodological contributions. The proposed methodology is applicable to sample from a distribution π with a density L with respect to a reference distribution π₀ and to compute the normalizing constant ∫ L dπ₀. A typical use case is when π₀ is a prior distribution, L a likelihood function and π the corresponding posterior distribution.
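The reversible/non-reversible divide comes down to the swap schedule. A minimal sketch of non-reversible PT with the deterministic even-odd (DEO) swap schedule on a toy bimodal 1-d density is given below; the temperature ladder `betas` is hand-picked for illustration, not the optimized annealing schedule the work above derives.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x, beta):
    """Tempered version of a bimodal density exp(-U(x)), U(x) = (x^2 - 4)^2 / 4."""
    return -beta * (x ** 2 - 4.0) ** 2 / 4.0

def nonreversible_pt(n_iters=20000, betas=(0.05, 0.15, 0.4, 1.0), step=0.8):
    K = len(betas)
    x = rng.normal(size=K)          # one state per temperature; betas[-1] is "cold"
    cold = np.empty(n_iters)
    for t in range(n_iters):
        # local exploration: one random-walk MH step per chain
        for k in range(K):
            prop = x[k] + step * rng.normal()
            if np.log(rng.uniform()) < log_target(prop, betas[k]) - log_target(x[k], betas[k]):
                x[k] = prop
        # deterministic even-odd (DEO) sweep: the non-reversible swap schedule
        for k in range(t % 2, K - 1, 2):
            log_r = (log_target(x[k + 1], betas[k]) + log_target(x[k], betas[k + 1])
                     - log_target(x[k], betas[k]) - log_target(x[k + 1], betas[k + 1]))
            if np.log(rng.uniform()) < log_r:
                x[k], x[k + 1] = x[k + 1], x[k]
        cold[t] = x[-1]
    return cold

cold = nonreversible_pt()
```

Alternating even and odd swap pairs deterministically, rather than picking a random pair each round, induces the persistent index motion whose scaling limit is the piecewise-deterministic process mentioned above.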
Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge. While modeling the trajectories of SGD via stochastic differential equations (SDEs) under heavy-tailed gradient noise has recently shed light on several peculiar characteristics of SGD, a rigorous treatment of the generalization properties of such SDEs in a learning-theoretic framework is still missing. Aiming to bridge this gap, in this paper we prove generalization bounds for SGD under the assumption that its trajectories can be well approximated by a Feller process, which defines a rich class of Markov processes that includes several recent SDE representations (both Brownian and heavy-tailed) as special cases. We show that the generalization error can be controlled by the Hausdorff dimension of the trajectories, which is intimately linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail index of the process can be used as a notion of 'capacity metric'. We support our theory with experiments on deep neural networks illustrating that the proposed capacity metric accurately estimates the generalization error, and it does not necessarily grow with the number of parameters, unlike the existing capacity metrics in the literature.
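To give a feel for the tail index as a measurable quantity, the sketch below simulates symmetric alpha-stable noise (a standard model for heavy-tailed gradient noise) via the Chambers-Mallows-Stuck transform and recovers its tail index with a Hill estimator. This is a generic illustration of tail-index estimation, not the estimator used in the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(4)

def symmetric_stable(alpha, size):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(size=size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))

def hill_estimator(x, k):
    """Hill estimate of the tail index from the k largest magnitudes."""
    a = np.sort(np.abs(x))
    logs = np.log(a[-k:]) - np.log(a[-k - 1])
    return k / logs.sum()

noise = symmetric_stable(1.5, 100_000)
alpha_hat = hill_estimator(noise, 1000)  # should be in the vicinity of 1.5
```

A smaller estimated tail index means heavier tails, which, by the theory above, corresponds to trajectories of lower Hausdorff dimension and hence a tighter generalization bound.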
We present a Darboux-Wiener-type lemma and apply it to obtain exact asymptotics for the variance of the self-intersections of one- and two-dimensional random walks. As a corollary, we obtain a central limit theorem for random walk in random scenery conjectured by Kesten and Spitzer [5].
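The quantity studied above can be checked empirically: the sketch below simulates 1-d simple random walks, counts self-intersections (pairs i < j with S_i = S_j, computed from visit counts), and reports the empirical mean and variance. The walk length and replication count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

def self_intersections(n):
    """Number of pairs i < j with S_i = S_j for an n-step 1-d simple random walk,
    computed as sum over sites of c*(c-1)/2 where c is the visit count."""
    steps = rng.choice((-1, 1), size=n)
    path = np.concatenate(([0], np.cumsum(steps)))
    _, counts = np.unique(path, return_counts=True)
    return int(np.sum(counts * (counts - 1) // 2))

reps = [self_intersections(500) for _ in range(2000)]
mean_si, var_si = np.mean(reps), np.var(reps)
```

Repeating this for several walk lengths n lets one compare the growth of `var_si` against the exact asymptotics established in the paper.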