We introduce a new estimator for the vector of coefficients β in the linear model y = Xβ + z, where X has dimensions n × p with p possibly larger than n. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to min_{b ∈ ℝ^p} ½‖y − Xb‖²_{ℓ2} + λ1|b|(1) + λ2|b|(2) + ⋯ + λp|b|(p), where λ1 ≥ λ2 ≥ ⋯ ≥ λp ≥ 0 and |b|(1) ≥ |b|(2) ≥ ⋯ ≥ |b|(p) are the decreasing absolute values of the entries of b. This is a convex program and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical ℓ1 procedures such as the Lasso. Here, the regularizer is a sorted ℓ1 norm, which penalizes the regression coefficients according to their rank: the higher the rank, that is, the stronger the signal, the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300] procedure (BH), which compares more significant p-values with more stringent thresholds. One notable choice of the sequence {λi} is given by the BH critical values λBH(i) = z(1 − i·q/(2p)), where q ∈ (0, 1) and z(α) is the α-quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with λBH provably controls FDR at level q. Moreover, it also appears to have appreciable inferential properties under more general designs X while having substantial power, as demonstrated in a series of experiments on both simulated and real data.
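To make the penalty concrete, here is a minimal sketch, assuming NumPy and SciPy are available; the function names (bh_weights, slope_objective) are ours and not from the paper. It computes the BH-style weight sequence λBH(i) and evaluates the SLOPE objective for a candidate coefficient vector; it does not implement the paper's solver.

```python
# Minimal sketch of the SLOPE objective with BH-style weights
# (illustrative names; not the paper's reference implementation).
import numpy as np
from scipy.stats import norm


def bh_weights(p, q):
    """BH critical values lambda_BH(i) = z(1 - i*q/(2p)) for i = 1, ..., p."""
    i = np.arange(1, p + 1)
    return norm.ppf(1 - i * q / (2 * p))


def sorted_l1_norm(b, lam):
    """Sorted l1 norm: each weight lam[i] multiplies the (i+1)th largest |b| entry."""
    return np.dot(lam, np.sort(np.abs(b))[::-1])


def slope_objective(X, y, b, lam):
    """0.5 * ||y - X b||_2^2 plus the sorted l1 penalty."""
    return 0.5 * np.sum((y - X @ b) ** 2) + sorted_l1_norm(b, lam)


# Example: the p = 5 weights at q = 0.1 form a decreasing sequence, as required.
print(bh_weights(5, 0.1))
```

Because the weights are non-increasing, larger-magnitude coefficients are matched with larger λ's, mirroring the way BH compares the most significant p-values with the most stringent thresholds.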
We present results of a joint computational and experimental study for a series of annulated metalloporphyrins in order to establish structure-property relationships. Specifically, we have examined the effects of meso-tetraphenylation, of tetrabenzo and tetranaphtho annulation, and of changing the central metal from zinc (Zn) to palladium (Pd). The photophysical properties of these porphyrins were determined using absorption and emission spectroscopy and laser flash photolysis techniques. Upon the addition of benzo or naphtho groups, we observed an overall red shift of both the B-bands and the Q-bands in the ground-state absorption spectra with increased conjugation, as well as an increase in the Q-band to B-band intensity ratios. Time-dependent density functional theory calculations were performed on both series of porphyrins to identify the effects of phenyl, benzo, and naphtho substituents on the spectra. The benzo and naphtho adducts provide a larger contribution (typically 40-90%) to the observed red shifts due to increased π-conjugation, while there is a smaller contribution (typically 0-25%) from distortion of the porphyrin. Similarly, a red shift of the T1–Tn absorption spectrum and an overall broadening of the spectrum were found with increased conjugation. An increase in the triplet molar extinction coefficient through the near-infrared region with annulation was also found. Varying the metal has an effect on the overall absorption spectra; i.e., the ground state spectra of the Zn porphyrins are red-shifted relative to the Pd porphyrins. For the triplet excited-state spectra, changing the metal had only small effects, whereas the heavy-atom effect of Pd contributed significantly to the kinetic properties.
In the past decade, differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy. This privacy definition and its divergence based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analysing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation of differential privacy, which we term ‘f‐differential privacy’ (f‐DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence based relaxations. First, f‐DP faithfully preserves the hypothesis testing interpretation of differential privacy, thereby making the privacy guarantees easily interpretable. In addition, f‐DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for the original differential privacy definition to f‐DP and, as an application of this technique, obtain a simple and easy‐to‐interpret theorem of privacy amplification by subsampling for f‐DP. In addition to the above findings, we introduce a canonical single‐parameter family of privacy notions within the f‐DP class that is referred to as ‘Gaussian differential privacy’ (GDP), defined based on hypothesis testing of two shifted Gaussian distributions. GDP is the focal privacy definition among the family of f‐DP guarantees due to a central limit theorem for differential privacy that we prove. More precisely, the privacy guarantees of any hypothesis testing based definition of privacy (including the original differential privacy definition) converge to GDP in the limit under composition. We also prove a Berry–Esseen style version of the central limit theorem, which gives a computationally inexpensive tool for tractably analysing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f‐DP a mathematically coherent, analytically tractable and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent.
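As a worked illustration of the two-shifted-Gaussians testing problem that defines GDP, the following sketch, assuming NumPy and SciPy and using our own function name, evaluates the Gaussian trade-off function G_μ(α) = Φ(Φ⁻¹(1 − α) − μ), i.e. the smallest type II error achievable at type I error α when testing N(0, 1) against N(μ, 1).

```python
# Sketch of the Gaussian trade-off function underlying GDP
# (our own illustrative code, not the authors' implementation).
import numpy as np
from scipy.stats import norm


def gaussian_tradeoff(alpha, mu):
    """G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu): the least type II error
    of any test of N(0, 1) vs N(mu, 1) at type I error level alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return norm.cdf(norm.ppf(1 - alpha) - mu)


# mu = 0 gives the perfectly private trade-off 1 - alpha; larger mu means the
# two neighbouring datasets are easier to tell apart, i.e. less privacy.
print(gaussian_tradeoff(np.linspace(0.0, 1.0, 5), mu=1.0))
```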
In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity, meaning that the fraction of variables with a non-vanishing effect tends to a constant, however small, this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to keep the type II error (false negative rate) below a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold, so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
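The trade-off above is phrased in terms of proportions of false and true positives along the Lasso path; the sketch below, which relies on scikit-learn's lasso_path and on a simulated design of our own choosing, shows how these proportions can be traced empirically. It only illustrates the quantities involved, not the paper's asymptotic analysis.

```python
# Sketch: tracing true/false positive proportions along the Lasso path on a
# simulated problem with independent Gaussian design (illustrative only).
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p, k = 400, 1000, 200                       # linear sparsity: k/p held fixed
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta = np.zeros(p)
beta[:k] = 50.0                                # strong, non-vanishing effects
y = X @ beta + rng.standard_normal(n)

alphas, coefs, _ = lasso_path(X, y)            # coefs has shape (p, n_alphas)
support = beta != 0
for j, alpha in enumerate(alphas):
    selected = coefs[:, j] != 0
    tpp = selected[support].mean()                           # true positive proportion
    fdp = selected[~support].sum() / max(selected.sum(), 1)  # false discovery proportion
    print(f"lambda={alpha:.4f}  TPP={tpp:.2f}  FDP={fdp:.2f}")
```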
We consider high-dimensional sparse regression problems in which we observe y = Xβ + z, where X is an n × p design matrix and z is an n-dimensional vector of independent Gaussian errors, each with variance σ². Our focus is on the recently introduced SLOPE estimator [16], which regularizes the least-squares estimates with the rank-dependent penalty ∑_{1≤i≤p} λi|β̂|(i), where |β̂|(i) is the ith largest magnitude of the fitted coefficients. Under Gaussian designs, where the entries of X are i.i.d. N(0, 1/n), we show that SLOPE, with weights λi just about equal to σ·Φ⁻¹(1 − iq/(2p)) (here Φ⁻¹(α) is the αth quantile of a standard normal and q is a fixed number in (0, 1)), achieves a squared error of estimation obeying sup_{‖β‖₀ ≤ k} P(‖β̂ − β‖² > (1 + ε)·2σ²k log(p/k)) → 0 as the dimension p increases to ∞, where ε > 0 is an arbitrarily small constant. This holds under a weak assumption on the ℓ0-sparsity level, namely, k/p → 0 and (k log p)/n → 0, and is sharp in the sense that this is the best possible error any estimator can achieve. A remarkable feature is that SLOPE does not require any knowledge of the degree of sparsity, and yet automatically adapts to yield optimal total squared errors over a wide range of ℓ0-sparsity classes. We are not aware of any other estimator with this property.
Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms, Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method, we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they also allow the identification of a term that we refer to as the "gradient correction", which is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov's accelerated gradient method for (non-strongly) convex functions (NAG-C), uncovering a hitherto unknown result: NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
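To make the distinction concrete, here is a small sketch in our own notation (step size s, momentum coefficient alpha in the usual √(μs) form; none of this code comes from the paper) of one iteration of each method. The only difference is where the gradient is evaluated, and that difference is what the high-resolution ODEs pick up as the gradient-correction term.

```python
# Sketch: one step of Polyak's heavy-ball method vs NAG-SC for a smooth,
# strongly convex objective (illustrative parameter choices, not the paper's code).
import numpy as np


def heavy_ball_step(x, x_prev, grad_f, s, alpha):
    """Heavy ball: gradient evaluated at the current iterate x, plus momentum."""
    return x - s * grad_f(x) + alpha * (x - x_prev)


def nag_sc_step(x, x_prev, grad_f, s, alpha):
    """NAG-SC: gradient evaluated at the extrapolated point y rather than at x;
    this is the discrete-time source of the gradient-correction term."""
    y = x + alpha * (x - x_prev)
    return y - s * grad_f(y)


# Toy example: f(x) = 0.5 * ||x||^2, so grad_f(x) = x and mu = 1.
grad_f = lambda x: x
s = 0.1
alpha = (1 - np.sqrt(s)) / (1 + np.sqrt(s))    # (1 - sqrt(mu*s)) / (1 + sqrt(mu*s))
x_prev, x = np.ones(2), np.full(2, 0.9)
print(heavy_ball_step(x, x_prev, grad_f, s, alpha))
print(nag_sc_step(x, x_prev, grad_f, s, alpha))
```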