We consider the problem of fitting the parameters of a high-dimensional linear regression model. In the regime where the number of parameters p is comparable to or exceeds the sample size n, a successful approach uses an ℓ1-penalized least squares estimator known as the Lasso. Unfortunately, unlike for linear estimators (e.g., ordinary least squares), no well-established method exists to compute confidence intervals or p-values on the basis of the Lasso estimator. Very recently, a line of work [JM13b, JM13a, vdGBR13] has addressed this problem by constructing a debiased version of the Lasso estimator. In this paper, we study this approach for a random design model, under the assumption that a good estimator exists for the precision matrix of the design. Our analysis improves over the state of the art in that it establishes nearly optimal average testing power whenever the sample size n asymptotically dominates s_0 (log p)^2, with s_0 being the sparsity level (number of non-zero coefficients). Earlier work obtains provable guarantees only for much larger sample sizes, namely it requires n to asymptotically dominate (s_0 log p)^2. In particular, for random designs with a sparse precision matrix, we show that an estimator with the required properties can be computed efficiently. Finally, we evaluate this approach on synthetic data and compare it with earlier proposals.
We consider a class of approximate message passing (AMP) algorithms and characterize their high-dimensional behavior in terms of a suitable state evolution recursion. Our proof applies to Gaussian matrices with independent but not necessarily identically distributed entries. It covers, in particular, the analysis of generalized AMP, introduced by Rangan, and of AMP reconstruction in compressed sensing with spatially coupled sensing matrices. The proof technique builds on that of [BM11], while simplifying and generalizing several steps.
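To make the state evolution recursion mentioned above concrete, the following sketch iterates its scalar form for AMP with a soft-thresholding denoiser. The three-point signal prior, the threshold rule theta*tau, and the Monte Carlo evaluation of the expectation are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding denoiser eta(x; t)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def state_evolution(delta, eps, sigma2, theta, iters=20, mc=200_000, seed=0):
    """Iterate the scalar state-evolution recursion
        tau_{t+1}^2 = sigma^2 + (1/delta) * E[(eta(X0 + tau_t Z; theta*tau_t) - X0)^2]
    for an illustrative Bernoulli(eps) prior with +/-1 nonzeros, Z ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    x0 = rng.choice([0.0, 1.0, -1.0], size=mc, p=[1 - eps, eps / 2, eps / 2])
    z = rng.standard_normal(mc)
    tau2 = sigma2 + np.mean(x0**2) / delta  # initialization
    for _ in range(iters):
        tau = np.sqrt(tau2)
        mse = np.mean((soft(x0 + tau * z, theta * tau) - x0) ** 2)
        tau2 = sigma2 + mse / delta
    return tau2
```

The fixed point of this recursion predicts the asymptotic per-iteration mean-square error of the matching AMP algorithm.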
We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al. [KMS+11], message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of non-zero coordinates. We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach succeeds as soon as the undersampling rate δ exceeds the (upper) Rényi information dimension of the signal, d(p_X). More precisely, for a sequence of signals of diverging dimension n whose empirical distribution converges to p_X, reconstruction is with high probability successful from d(p_X) n + o(n) measurements taken according to a band-diagonal matrix. For sparse signals, i.e., sequences of dimension n with k(n) non-zero entries, this implies reconstruction from k(n) + o(n) measurements. For 'discrete' signals, i.e., signals whose coordinates take values in a fixed finite set, this implies reconstruction from o(n) measurements. The result is robust with respect to noise and is not restricted to random signals, but it requires knowledge of the empirical distribution p_X of the signal.
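The basic AMP iteration (without the spatial coupling and band-diagonal structure analyzed in the paper, which are omitted here for brevity) can be sketched as follows; the soft-thresholding denoiser and the empirical noise-level estimate are standard but illustrative choices.

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding denoiser
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(y, A, theta=1.5, iters=30):
    """AMP with soft thresholding and the Onsager correction term
    (illustrative sketch for a dense i.i.d. sensing matrix)."""
    m, n = A.shape
    delta = m / n
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        r = x + A.T @ z                      # effective observation
        tau = np.sqrt(np.mean(z**2))         # empirical noise-level estimate
        x_new = soft(r, theta * tau)
        # Onsager term: average derivative of the denoiser, scaled by 1/delta
        onsager = np.mean(np.abs(r) > theta * tau) / delta
        z = y - A @ x_new + onsager * z
        x = x_new
    return x
```

The Onsager correction is what distinguishes AMP from plain iterative thresholding and what makes the state evolution analysis exact in the high-dimensional limit.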
In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime, where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for arbitrary training data of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.
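A minimal sketch of the training setup discussed above: gradient descent on a one-hidden-layer network with quadratic activations, f_W(x) = sum_j (w_j^T x)^2 = ||W x||^2. The learning rate, step count, and random initialization below are illustrative assumptions, not the paper's prescriptions.

```python
import numpy as np

def fit_quadratic_net(X, y, k, lr=1e-2, steps=500, seed=0):
    """Gradient descent for a shallow net with quadratic activations,
    f_W(x) = ||W x||^2, where W has k rows (hidden units). Illustrative sketch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((k, d)) / np.sqrt(d)   # small random initialization
    for _ in range(steps):
        pred = np.sum((X @ W.T) ** 2, axis=1)       # ||W x_i||^2 for each sample
        resid = pred - y
        # gradient of (1/2n) * sum_i (pred_i - y_i)^2 with respect to W
        grad = 2.0 * (W @ (X.T * resid) @ X) / n
        W -= lr * grad
    return W
```

Note that f_W depends on W only through W^T W, which is the source of the benign landscape structure in the quadratic-activation case.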
Performing statistical inference in high-dimensional models is an outstanding challenge. A major source of difficulty is the absence of precise information on the distribution of high-dimensional regularized estimators. Here, we consider linear regression in the high-dimensional regime p ≫ n and the Lasso estimator. In this context, we would like to perform inference on a high-dimensional parameter vector θ* ∈ R^p. Important progress has been achieved in computing confidence intervals and p-values for single coordinates θ*_i, i ∈ {1, . . . , p}. A key role in these new inferential methods is played by a certain debiased (or de-sparsified) estimator θ^d that is constructed from the Lasso estimator. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of θ^d are asymptotically Gaussian provided the true parameter vector θ* is s_0-sparse with s_0 = o(√n / log p). The condition s_0 = o(√n / log p) is considerably stronger than the one required for consistent estimation, namely s_0 = o(n / log p). In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition s_0 = o(n / (log p)^2). Note that earlier work was limited to s_0 = o(√n / log p) even for perfectly known covariance. The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well, e.g., under the same sparsity conditions on the inverse covariance as assumed by earlier work. For intermediate regimes, we describe the trade-off between sparsity in the coefficients θ* and sparsity in the inverse covariance of the design. We further discuss several other applications of our results to high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor 1 + o_n(1) for i.i.d. Gaussian designs.
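The debiased estimator discussed above takes the form θ^d = θ̂ + (1/n) M X^T (y − X θ̂), where θ̂ is the Lasso estimator and M approximates the inverse population covariance of the design. The sketch below pairs it with a plain ISTA Lasso solver; the solver, regularization level, and the choice M = I (appropriate only for identity covariance) are illustrative assumptions.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, steps=500):
    """Plain ISTA for the Lasso (1/2n)||y - X theta||^2 + lam ||theta||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    theta = np.zeros(p)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n
        theta = soft(theta - grad / L, lam / L)
    return theta

def debiased_lasso(X, y, theta_hat, M):
    """theta^d = theta_hat + (1/n) M X^T (y - X theta_hat), with M an
    estimate of the inverse population covariance of the design."""
    n = X.shape[0]
    return theta_hat + (M @ X.T @ (y - X @ theta_hat)) / n
```

Each coordinate of θ^d is then approximately Gaussian around θ*_i, which is what enables confidence intervals and p-values for single coordinates.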
Multiple hypothesis testing is a core problem in statistical inference and arises in almost every scientific field. Given a set of null hypotheses H(n) = (H_1, . . . , H_n), Benjamini and Hochberg [BH95] introduced the false discovery rate (FDR), the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assigned significance level. Nowadays FDR is the criterion of choice for large-scale multiple hypothesis testing. In this paper we consider the problem of controlling FDR in an online manner. Concretely, we consider an ordered, possibly infinite, sequence of null hypotheses H = (H_1, H_2, H_3, . . . ) where, at each step i, the statistician must decide whether to reject hypothesis H_i having access only to the previous decisions. This model was introduced by Foster and Stine [FS08]. We study a class of generalized alpha investing procedures, first introduced by Aharoni and Rosset [AR14]. We prove that any rule in this class controls online FDR, provided p-values corresponding to true nulls are independent of the other p-values. Earlier work only established mFDR control. Next, we obtain conditions under which generalized alpha investing controls FDR in the presence of general p-value dependencies. We also develop a modified set of procedures that allow one to control the false discovery exceedance (the tail of the proportion of false discoveries). Finally, we evaluate the performance of online procedures on both synthetic and real data, comparing them with offline approaches, such as adaptive Benjamini-Hochberg.
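A minimal sketch of one member of the alpha-investing family in the online setting above: the tester maintains a wealth budget, spends part of it on each test, and earns a payout on each rejection. The spending schedule wealth/(1+i) and the payout value are illustrative choices, not the specific rules analyzed in the paper.

```python
def alpha_investing(pvals, w0=0.05, payout=0.05):
    """A simple alpha-investing rule: test H_i at level alpha_i taken from the
    current wealth; rejections replenish wealth by `payout`, non-rejections
    cost alpha_i / (1 - alpha_i). Illustrative sketch of the family."""
    wealth = w0
    decisions = []
    for i, p in enumerate(pvals, start=1):
        alpha_i = wealth / (1 + i)           # spend a fraction of current wealth
        reject = p <= alpha_i
        if reject:
            wealth += payout
        else:
            wealth -= alpha_i / (1 - alpha_i)
        decisions.append(reject)
    return decisions
```

Because each α_i depends only on past decisions, the rule satisfies the online constraint: hypothesis H_i is decided before p-value i+1 is seen.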
Statistical inference problems arising within signal processing, data mining, and machine learning naturally give rise to hard combinatorial optimization problems. These problems become intractable when the dimensionality of the data is large, as is often the case for modern datasets. A popular idea is to construct convex relaxations of these combinatorial problems, which can be solved efficiently for large-scale datasets. Semidefinite programming (SDP) relaxations are among the most powerful methods in this family and are surprisingly well suited for a broad range of problems where data take the form of matrices or graphs. It has been observed several times that, when the statistical noise is small enough, SDP relaxations correctly detect the underlying combinatorial structures. In this paper we develop asymptotic predictions for several detection thresholds, as well as for the estimation error above these thresholds. We study some classical SDP relaxations for statistical problems motivated by graph synchronization and community detection in networks. We map these optimization problems to statistical mechanics models with vector spins and use nonrigorous techniques from statistical mechanics to characterize the corresponding phase transitions. Our results clarify the effectiveness of SDP relaxations in solving high-dimensional statistical problems.

Modern datasets pose new challenges to the centuries-old framework of statistical estimation. On one hand, high-dimensional applications require the simultaneous estimation of millions of parameters. Examples span genomics (2), imaging (3), web services (4), and so on. On the other hand, the unknown object to be estimated often has a combinatorial structure: in clustering we aim at estimating a partition of the data points (5). Network analysis tasks usually require identification of a discrete subset of nodes in a graph (6, 7).
Parsimonious data explanations are sought by imposing combinatorial sparsity constraints (8). There is an obvious tension between the above requirements. Although efficient algorithms are needed to estimate a large number of parameters, the maximum likelihood (ML) method often requires the solution of NP-hard (nondeterministic polynomial-time hard) combinatorial problems. A flourishing line of work addresses this conundrum by designing effective convex relaxations of these combinatorial problems (9-11). Unfortunately, the statistical properties of such convex relaxations are well understood only in a few cases [compressed sensing being the most important success story (12-14)]. In this paper we use tools from statistical mechanics to develop a precise picture of the behavior of a class of semidefinite programming relaxations. Relaxations of this type appear to be surprisingly effective in a variety of problems ranging from clustering to graph synchronization. For the sake of concreteness we will focus on three specific problems.

Z2 Synchronization. In the general synchronization problem, we aim at estimating x_{0,1}, x_{0,2}, . . . , x_{0,n}, which are unknown elements of ...
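The vector-spin picture mentioned above has a direct algorithmic counterpart. For Z2 synchronization, the classical SDP relaxation is max ⟨Y, X⟩ subject to X PSD and X_ii = 1; a common heuristic replaces X by S S^T with unit-norm rows S_i (the vector spins) and runs projected gradient ascent. The rank k, step size, and rounding rule below are illustrative assumptions, and this is a heuristic sketch rather than a certified SDP solver.

```python
import numpy as np

def z2_sync_bm(Y, k=2, steps=200, lr=0.1, seed=0):
    """Burer-Monteiro-style heuristic for the Z2-synchronization SDP
        maximize <Y, X>  subject to  X PSD, X_ii = 1,
    via the low-rank parameterization X = S S^T with unit-norm rows
    S_i in R^k (vector spins). Illustrative sketch."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    S = rng.standard_normal((n, k))
    S /= np.linalg.norm(S, axis=1, keepdims=True)
    for _ in range(steps):
        S += lr * (Y @ S)                              # ascent step (gradient 2 Y S; factor 2 absorbed in lr)
        S /= np.linalg.norm(S, axis=1, keepdims=True)  # project rows back onto the unit sphere
    # round: signs of the leading left singular vector of S give +/-1 labels
    u = np.linalg.svd(S, full_matrices=False)[0][:, 0]
    return np.sign(u)
```

When the spins nearly align along a common direction, the rounding step recovers the ±1 labels up to a global sign flip, which is the intrinsic ambiguity of Z2 synchronization.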