Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In contrast to the frequentist literature, little is known about the properties of such priors and the convergence and concentration of the corresponding posterior distribution. In this article, we propose a new class of Dirichlet–Laplace priors, which possess optimal posterior concentration and lead to efficient posterior computation. Finite sample performance of Dirichlet–Laplace priors relative to alternatives is assessed in simulated and real data examples.
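The global-local scale-mixture form of the Dirichlet–Laplace prior can be sketched directly as a hierarchical sampler. The parameterization below (local weights φ ~ Dirichlet(a, …, a), scales ψ_j ~ Exp(1/2), global scale τ ~ Gamma(na, 1/2), then θ_j | ψ, φ, τ ~ N(0, ψ_j φ_j² τ²)) follows the standard DL_a construction, but the function name, defaults, and dimensions are illustrative, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dl_prior(n, a=0.5, size=1000, rng=rng):
    """Draw `size` samples from the Dirichlet-Laplace prior DL_a on an
    n-vector theta, via its global-local Gaussian scale-mixture form.
    Illustrative sketch; names and defaults are not from the article."""
    psi = rng.exponential(scale=2.0, size=(size, n))        # psi_j ~ Exp(rate 1/2)
    phi = rng.dirichlet(np.full(n, a), size=size)           # local Dirichlet weights
    tau = rng.gamma(shape=n * a, scale=2.0, size=(size, 1)) # global scale ~ Gamma(na, 1/2)
    # theta_j | psi, phi, tau ~ N(0, psi_j * phi_j^2 * tau^2)
    return rng.normal(size=(size, n)) * np.sqrt(psi) * phi * tau

theta = sample_dl_prior(10)
```

Because the Dirichlet weights concentrate near the simplex corners for small a, most draws place nearly all their mass on a few coordinates, which is the mechanism behind the prior's shrinkage behavior.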
Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models. Grelaud et al. [(2009) Bayesian Anal 3:427-442] advocated the use of ABC for model choice in the specific case of Gibbs random fields, relying on an intermodel sufficiency property to show that the approximation was legitimate. We implemented ABC model choice in a wide range of phylogenetic models in the Do It Yourself-ABC (DIY-ABC) software [Cornuet et al. (2008) Bioinformatics 24:2713-2719]. We now present arguments as to why theoretical justification for ABC model choice is lacking: the algorithm involves an unknown loss of information induced by the use of insufficient summary statistics. The approximation error of the posterior probabilities of the models under comparison may thus be unrelated to the computational effort spent in running an ABC algorithm. We conclude that additional empirical verifications of the performance of the ABC procedure, such as those available in DIY-ABC, are necessary before conducting model choice.

Bayes factor | Bayesian model choice | likelihood-free methods | sufficient statistics | consistent tests

Inference on population genetic models such as coalescent trees is one representative example of cases in which statistical analyses such as Bayesian inference cannot easily operate, because the likelihood function associated with the data cannot be computed in a manageable time (1-3). The fundamental reason for this impossibility is that the model associated with coalescent data has to integrate over trees of high complexity. In such settings, traditional approximation tools such as Monte Carlo simulation (4) from the posterior distribution are unavailable for practical purposes. Indeed, due to the complexity of the latent structures defining the likelihood (such as the coalescent tree), their simulation is too unstable to yield a reliable approximation in a manageable time.
Such complex models call for a practical, if cruder, approximation method: the approximate Bayesian computation (ABC) methodology (1, 5). This rejection technique bypasses the computation of the likelihood via simulations from the corresponding distribution (see refs. 6 and 7 for recent surveys, and ref. 8 for the wide and successful array of applications based on implementations of ABC in genomics and ecology). We argue here that ABC is a generally valid approximation method for doing Bayesian inference in complex models. However, without further justification, ABC methods cannot be trusted to discriminate between two competing models when based on insufficient summary statistics. We exhibit simple examples in which the information loss due to insufficiency leads to inconsistency, i.e., the ABC model selection fails to recover the true model even with infinite amounts of observation and computation. On the one hand, ABC using the entire data leads to a consistent model-choice decision, but this is clearly infeasible in most settings. On the other hand, too much information loss due to insufficiency […]
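A minimal sketch of ABC model choice based on a single summary statistic illustrates the information-loss problem. The toy comparison below (a standard normal model against a unit-variance Laplace model, with the sample mean as summary) is an illustrative assumption of this sketch, not an example taken verbatim from the article; since both models have mean zero and unit variance, the sample mean carries essentially no information about which model generated the data, so the ABC posterior probability hovers near 1/2 no matter how many simulations are run:

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_model_choice(y_obs, summary, n_sims=20000, eps=0.1, rng=rng):
    """Toy ABC rejection sampler for model choice: model 0 is N(0, 1),
    model 1 is Laplace(0, 1/sqrt(2)) (also unit variance). Accepts a
    simulated model index when its simulated summary lies within eps of
    the observed summary. All names here are illustrative."""
    n = len(y_obs)
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(n_sims):
        m = rng.integers(2)                               # uniform prior on models
        if m == 0:
            z = rng.normal(size=n)
        else:
            z = rng.laplace(scale=1 / np.sqrt(2), size=n)
        if abs(summary(z) - s_obs) < eps:
            accepted.append(m)
    accepted = np.asarray(accepted)
    # ABC estimate of P(model 1 | s_obs)
    return accepted.mean() if accepted.size else np.nan

# The sample mean is insufficient for distinguishing the two tail shapes:
y = rng.normal(size=200)
p1 = abc_model_choice(y, summary=lambda x: x.mean())
```

Replacing the mean with a tail-sensitive summary (e.g., the median absolute deviation relative to the standard deviation) restores discrimination in this toy case, which is exactly the sensitivity to the choice of summary statistics that the text warns about.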
We investigate the properties of the Hybrid Monte-Carlo algorithm (HMC) in high dimensions. HMC develops a Markov chain reversible w.r.t. a given target distribution Π by using separable Hamiltonian dynamics with potential − log Π. The additional momentum variables are chosen at random from the Boltzmann distribution and the continuous-time Hamiltonian dynamics are then discretised using the leapfrog scheme. The induced bias is removed via a Metropolis-Hastings accept/reject rule. In the simplified scenario of independent, identically distributed components, we prove that, to obtain an O(1) acceptance probability as the dimension d of the state space tends to ∞, the leapfrog step-size h should be scaled as h = l × d^{−1/4}. Therefore, in high dimensions, HMC requires O(d^{1/4}) steps to traverse the state space. We also identify analytically the asymptotically optimal acceptance probability, which turns out to be 0.651 (to three decimal places). This is the choice which optimally balances the cost of generating a proposal, which decreases as l increases, against the cost related to the average number of proposals required to obtain acceptance, which increases as l increases.
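The leapfrog discretisation, the Boltzmann momentum refreshment, and the h = l × d^{−1/4} step-size scaling can be sketched as follows for the i.i.d. standard Gaussian target; the trajectory length, l = 1, and iteration count are illustrative choices, not the paper's optimal tuning:

```python
import numpy as np

rng = np.random.default_rng(2)

def leapfrog(q, p, grad_U, h, n_steps):
    """Leapfrog discretisation of Hamiltonian dynamics with potential U."""
    p = p - 0.5 * h * grad_U(q)
    for _ in range(n_steps - 1):
        q = q + h * p
        p = p - h * grad_U(q)
    q = q + h * p
    p = p - 0.5 * h * grad_U(q)
    return q, -p  # momentum flip keeps the proposal reversible

def hmc_step(q, U, grad_U, h, n_steps, rng=rng):
    """One HMC transition: Metropolis-Hastings accept/reject removes the
    discretisation bias of the leapfrog scheme."""
    p0 = rng.normal(size=q.shape)  # momentum from the Boltzmann distribution
    q1, p1 = leapfrog(q, p0, grad_U, h, n_steps)
    dH = (U(q1) + 0.5 * p1 @ p1) - (U(q) + 0.5 * p0 @ p0)
    return (q1, True) if np.log(rng.uniform()) < -dH else (q, False)

# i.i.d. standard normal target in dimension d, with h = l * d**(-1/4)
d, l = 100, 1.0
U = lambda q: 0.5 * q @ q
grad_U = lambda q: q
h = l * d ** -0.25
q = rng.normal(size=d)
accepts = 0
for _ in range(500):
    q, ok = hmc_step(q, U, grad_U, h, n_steps=10)
    accepts += ok
```

With this scaling the acceptance rate stays bounded away from zero as d grows, which is the O(1)-acceptance phenomenon the abstract describes; tuning l trades per-proposal cost against rejection frequency.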
In this paper we prove the universality of covariance matrices of the form $H_{N\times N}={X}^{\dagger}X$ where $X$ is an ${M\times N}$ rectangular matrix with independent real valued entries $x_{ij}$ satisfying $\mathbb{E}x_{ij}=0$ and $\mathbb{E}x^2_{ij}={\frac{1}{M}}$, as $N, M\to \infty$. Furthermore, it is assumed that these entries have sub-exponential tails or a sufficiently high number of moments. We study the asymptotics in the regime $N/M=d_N\in(0,\infty)$ with $\lim_{N\to\infty}d_N\neq0,\infty$. Our main result is the edge universality of the sample covariance matrix at both edges of the spectrum. In the case $\lim_{N\to\infty}d_N=1$, we only focus on the largest eigenvalue. Our proof is based on a novel version of the Green function comparison theorem for data matrices with dependent entries. En route to proving edge universality, we establish that the Stieltjes transform of the empirical eigenvalue distribution of $H$ is given by the Marchenko-Pastur law uniformly up to the edges of the spectrum with an error of order $(N\eta)^{-1}$, where $\eta$ is the imaginary part of the spectral parameter in the Stieltjes transform. Combining these results with existing techniques we also show bulk universality of covariance matrices. All our results hold for both real and complex valued entries. Published in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/13-AAP939.
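A quick simulation illustrates the Marchenko-Pastur spectral edges $(1\pm\sqrt{d})^2$ for such a normalized data matrix. The Gaussian entries and the specific dimensions below are illustrative assumptions of this sketch (the theorem covers general entries with sub-exponential tails):

```python
import numpy as np

rng = np.random.default_rng(3)

# Empirical check of the Marchenko-Pastur edges for H = X^T X, where X is
# M x N with E[x_ij] = 0 and E[x_ij^2] = 1/M. Gaussian entries and the
# dimensions are illustrative choices, not from the paper.
M, N = 2000, 1000
d = N / M
X = rng.normal(scale=1 / np.sqrt(M), size=(M, N))
evals = np.linalg.eigvalsh(X.T @ X)

# Marchenko-Pastur support edges (1 - sqrt(d))^2 and (1 + sqrt(d))^2
edge_minus = (1 - np.sqrt(d)) ** 2
edge_plus = (1 + np.sqrt(d)) ** 2
```

At these sizes the extreme eigenvalues land within a few hundredths of the predicted edges; the edge-universality result concerns the finer Tracy-Widom fluctuations around them.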
Summary. This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of parametric densities, with the mixture distribution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP) distributed innovation measure and neighboring random measures. This specification results in a coherent prior for the joint measure, with the marginal random measure at each location being a finite mixture of DP basis measures. Integrating out the infinite-dimensional collection of mixing measures, we obtain a simple expression for the conditional distribution of the subject-specific random variables, which generalizes the Pólya urn scheme. Properties are considered and a simple Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated using simulated data examples and epidemiologic studies.
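For reference, the exchangeable Pólya urn scheme that the article generalizes to the dependent, predictor-indexed setting can be sketched as follows; the concentration parameter α and the base-measure draw are illustrative assumptions, and this sketch covers only the plain DP case, not the article's mixture-of-innovations prior:

```python
import numpy as np

rng = np.random.default_rng(4)

def polya_urn(n, alpha, base_draw, rng=rng):
    """Draw n values from a Dirichlet process DP(alpha, G0) via the
    standard Polya urn scheme: each new value is either a fresh draw from
    G0 (with probability alpha / (alpha + i)) or a copy of an earlier
    value (inducing ties, hence clustering)."""
    values = [base_draw(rng)]
    for i in range(1, n):
        if rng.uniform() < alpha / (alpha + i):
            values.append(base_draw(rng))           # new atom from G0
        else:
            values.append(values[rng.integers(i)])  # reuse an earlier atom
    return np.asarray(values)

draws = polya_urn(500, alpha=2.0, base_draw=lambda r: r.normal())
```

The number of distinct atoms grows only logarithmically in n, which is what makes urn-based Gibbs sampling tractable after the infinite-dimensional mixing measures are integrated out.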