We show that control of the false discovery rate (FDR) for a multiple testing procedure is implied by two coupled simple sufficient conditions. The first, which we call the ``self-consistency condition'', concerns the algorithm itself; the second, called the ``dependency control condition'', relates to the dependency assumptions on the $p$-value family. Many standard multiple testing procedures are self-consistent (e.g. step-up, step-down or step-up-down procedures), and we prove that the dependency control condition can be fulfilled by choosing correspondingly appropriate rejection functions, for three classical types of dependency: independence, positive dependency (PRDS) and unspecified dependency. As a consequence, we recover earlier results through simple and unifying proofs while extending their scope in several respects: weighted FDR, $p$-value reweighting, a new family of step-up procedures under unspecified $p$-value dependency, and adaptive step-up procedures. We give additional examples of other possible applications. This framework also allows for defining and studying FDR control for multiple testing procedures over a continuous, uncountable space of hypotheses.

Comment: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/08-EJS180.
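As a concrete instance of a self-consistent procedure, the following sketch implements the classical Benjamini-Hochberg step-up rule (the function name and interface are illustrative, not taken from the paper):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up (Benjamini-Hochberg) procedure: reject the hypotheses
    with the k smallest p-values, where k = max{i : p_(i) <= alpha*i/m}."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = below.nonzero()[0].max()   # largest index passing its threshold
        rejected[order[:k + 1]] = True
    return rejected
```

Note the "step-up" character: all hypotheses up to the largest passing index are rejected, even those whose individual $p$-value exceeds its own threshold.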
How to weigh the Benjamini-Hochberg procedure? In the context of multiple hypothesis testing, we propose a new step-wise procedure that controls the false discovery rate (FDR), and we prove it to be more powerful than any weighted Benjamini-Hochberg procedure. Both finite-sample and asymptotic results are presented. Moreover, we illustrate the good performance of our procedure in simulations and in a genomics application. This work is particularly useful in the case of heterogeneous $p$-value distributions.
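For reference, the weighted Benjamini-Hochberg procedures this abstract compares against apply the step-up rule to the weighted $p$-values $p_i/w_i$ with weights of mean one. A minimal sketch (illustrative only; this is the classical weighted BH baseline, not the paper's new procedure):

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg: step-up rule applied to p_i / w_i,
    with weights normalized to average 1."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = p.size
    w = w * m / w.sum()              # normalize weights to mean 1
    q = p / w                        # weighted p-values
    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = q[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = below.nonzero()[0].max()
        rejected[order[:k + 1]] = True
    return rejected
```

With all weights equal, this reduces to the unweighted BH procedure; an up-weighted hypothesis can be rejected at a $p$-value that unweighted BH would not reject.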
We follow a post-hoc, "user-agnostic" approach to false discovery control in a large-scale multiple testing framework, as introduced by Genovese and Wasserman (2006) and Goeman and Solari (2011): the statistical guarantee on the number of correct rejections must hold for any set of candidate items, possibly selected by the user after having seen the data. To this end, we introduce a novel point of view based on a family of reference rejection sets and a suitable criterion, namely the joint family-wise error rate over that family (JER for short). First, we establish how to derive post hoc bounds from a given JER control and analyze some general properties of this approach. We then develop procedures for controlling the JER in the case where the reference regions are $p$-value level sets. These procedures adapt to dependencies and to the unknown quantity of signal (via a step-down principle). We also show interesting connections to the confidence envelopes of Meinshausen (2006) and Genovese and Wasserman (2006), the closed-testing-based approach of Goeman and Solari (2011), and the higher criticism of Donoho and Jin (2004). Our theoretical statements are supported by numerical experiments.
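To illustrate how a post hoc bound can be derived from control over a family of reference sets, here is a sketch for the special case of Simes-type $p$-value level sets $t_k = \alpha k/m$: under joint control of these reference sets, the number of false positives in any user-chosen set $S$ is bounded as below (a simplified illustration, not the paper's full construction):

```python
import numpy as np

def simes_posthoc_bound(p_S, m, alpha=0.05):
    """Post hoc upper bound on the number of false positives in a
    user-selected set S of hypotheses, from Simes-type reference sets
    t_k = alpha * k / m (valid under the corresponding JER control).
    p_S: p-values of the hypotheses in S; m: total number of hypotheses."""
    p_S = np.asarray(p_S, dtype=float)
    best = p_S.size                      # trivial bound: |S|
    for k in range(1, m + 1):
        t_k = alpha * k / m
        # at most k - 1 false positives among {i in S : p_i < t_k}
        best = min(best, int((p_S >= t_k).sum()) + k - 1)
    return best
```

The guarantee holds simultaneously over all sets $S$, which is what makes the bound valid even for data-driven selection.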
We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of interest in their own right. We also discuss the question of accuracy when using Monte Carlo approximations of the resampled quantities.
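The resampled-quantile approach with Rademacher weights can be sketched as follows for a sup-norm confidence region around the empirical mean: symmetrize the centered observations with random signs and take a quantile of the resulting sup-norm statistic. This is a hedged illustration of the general idea (helper name and details are ours, not the paper's exact construction or constants):

```python
import numpy as np

def rademacher_quantile_radius(Y, delta=0.05, B=1000, seed=None):
    """Estimate a radius r such that ||Ybar - mu||_inf <= r with
    probability about 1 - delta, for data symmetric around its mean,
    by the (1 - delta)-quantile of sign-resampled sup-norm statistics.
    Y: (n, K) array, n observations of a K-dimensional vector."""
    rng = np.random.default_rng(seed)
    n, K = Y.shape
    centered = Y - Y.mean(axis=0)
    stats = np.empty(B)
    for b in range(B):
        eps = rng.choice([-1.0, 1.0], size=n)     # Rademacher weights
        stats[b] = np.abs((eps[:, None] * centered).mean(axis=0)).max()
    return np.quantile(stats, 1 - delta)
```

The point of the symmetrization is that the resampled statistic mimics the unknown distribution of the deviation without any assumption on the dependency between coordinates.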
In a context of multiple hypothesis testing, we provide several new exact calculations related to the false discovery proportion (FDP) of step-up and step-down procedures. For step-up procedures, we show that the number of erroneous rejections conditionally on the rejection number is simply a binomial variable, which leads to explicit computations of the c.d.f., the $s$-th moment and the mean of the FDP, the latter corresponding to the false discovery rate (FDR). For step-down procedures, we derive what is to our knowledge the first explicit formula for the FDR valid for any alternative c.d.f. of the $p$-values. We also derive explicit computations of the power for both step-up and step-down procedures. These formulas are ``explicit'' in the sense that they only involve the parameters of the model and the c.d.f. of the order statistics of i.i.d. uniform variables. The $p$-values are assumed either independent or coming from an equicorrelated multivariate normal model, and an additional mixture model for the true/false hypotheses is used. This new approach is used to investigate new results which are of interest in their own right, related to least/most favorable configurations for the FDR and the variance of the FDP.
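One exact result in this setting is easy to check by simulation: for the BH step-up procedure with independent $p$-values (uniform under the null), the FDR equals $(m_0/m)\,\alpha$ exactly, for any fixed configuration. A Monte Carlo sketch (the Beta alternative and all parameters are illustrative choices of ours):

```python
import numpy as np

def bh_fdp(p, alpha, is_null):
    """FDP of the BH step-up procedure on one p-value vector."""
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return 0.0
    k = below.nonzero()[0].max()
    return is_null[order[:k + 1]].sum() / (k + 1)

def mc_fdr(m=50, m0=40, alpha=0.2, reps=2000, seed=0):
    """Monte Carlo FDR of BH: m0 true nulls (uniform p-values) and
    m - m0 alternatives with stochastically small Beta(0.15, 1) p-values."""
    rng = np.random.default_rng(seed)
    is_null = np.zeros(m, dtype=bool)
    is_null[:m0] = True
    total = 0.0
    for _ in range(reps):
        p = np.empty(m)
        p[:m0] = rng.uniform(size=m0)
        p[m0:] = rng.beta(0.15, 1.0, size=m - m0)
        total += bh_fdp(p, alpha, is_null)
    return total / reps
```

With the defaults above, the estimate should land close to $(m_0/m)\,\alpha = 0.16$, up to Monte Carlo error.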
We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.
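The counting problem itself is simple to state in code: occurrences of a word are counted allowing overlaps, which is precisely what creates the clumps that the compound Poisson approximation accounts for. A small sketch for the i.i.d.-letters special case (the Markovian case replaces the product of letter probabilities by transition probabilities):

```python
def overlapping_count(seq, word):
    """Number of possibly overlapping occurrences of `word` in `seq`."""
    h = len(word)
    return sum(1 for i in range(len(seq) - h + 1) if seq[i:i + h] == word)

def expected_count_iid(n, word, letter_prob):
    """Expected occurrence count in a sequence of n i.i.d. letters:
    (n - h + 1) times the probability of the word at a fixed position."""
    e = 1.0
    for a in word:
        e *= letter_prob[a]
    return (n - len(word) + 1) * e
```

For a self-overlapping word such as "AA", occurrences cluster, so a plain Poisson fit to the count is poor while a compound Poisson (clump-size) model remains accurate.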
We study the properties of false discovery rate (FDR) thresholding, viewed as a classification procedure. The "0"-class (null) is assumed to have a known density while the "1"-class (alternative) is obtained from the "0"-class either by translation or by scaling. Furthermore, the "1"-class is assumed to have a small number of elements w.r.t. the "0"-class (sparsity). We focus on densities of the Subbotin family, including Gaussian and Laplace models. Nonasymptotic oracle inequalities are derived for the excess risk of FDR thresholding. These inequalities lead to explicit rates of convergence of the excess risk to zero, as the number $m$ of items to be classified tends to infinity and in a regime where the power of the Bayes rule is away from 0 and 1. Moreover, these theoretical investigations suggest an explicit choice for the target level $\alpha_m$ of FDR thresholding, as a function of $m$. Our oracle inequalities show theoretically that the resulting FDR thresholding adapts to the unknown sparsity regime contained in the data. This property is illustrated with numerical experiments.
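In the Gaussian translation case, the misclassification risk of a fixed threshold rule can be written in closed form; the excess of this risk over its minimum (the Bayes risk) is the quantity the oracle inequalities control. A one-sided sketch with illustrative parameters:

```python
import math

def gauss_cdf(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def risk(t, mu, eps):
    """Misclassification risk of the rule 'classify as signal if X > t'
    in the sparse mixture X ~ (1 - eps) N(0, 1) + eps N(mu, 1):
    weighted sum of false-alarm and miss probabilities."""
    type1 = 1.0 - gauss_cdf(t)       # null declared signal
    type2 = gauss_cdf(t - mu)        # signal missed
    return (1.0 - eps) * type1 + eps * type2
```

For example, with $\mu = 3$ and $\varepsilon = 0.1$, the Bayes threshold $\mu/2 + \log((1-\varepsilon)/\varepsilon)/\mu \approx 2.23$ yields a lower risk than thresholds that are clearly too low or too high.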