The false discovery rate (FDR) is a multiple hypothesis testing quantity that describes the expected proportion of false positive results among all rejected null hypotheses. Benjamini and Hochberg introduced this quantity and proved that a particular step-up p-value method controls the FDR. Storey introduced a point estimate of the FDR for fixed significance regions. The former approach conservatively controls the FDR at a fixed predetermined level, and the latter provides a conservatively biased estimate of the FDR for a fixed predetermined significance region. In this work, we show in both finite sample and asymptotic settings that the goals of the two approaches are essentially equivalent. In particular, the FDR point estimates can be used to define valid FDR controlling procedures. In the asymptotic setting, we also show that the point estimates can be used to estimate the FDR conservatively over all significance regions simultaneously, which is equivalent to controlling the FDR at all levels simultaneously. Our main tool is the translation of existing FDR methods into procedures involving empirical processes. This simplifies finite sample proofs, provides a framework for asymptotic results and proves that these procedures are valid even under certain forms of dependence. Copyright 2004 Royal Statistical Society.
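The two approaches contrasted in this abstract can be illustrated with a minimal sketch. It assumes the standard textbook forms of the Benjamini-Hochberg step-up rule and of Storey's point estimate with a tuning parameter `lam`. The function names, the default `lam=0.5`, and the helper `reject_by_estimate` are illustrative choices, not taken from the paper. The finite-sample procedure in the paper may include modifications that are not reproduced here.

```python
from typing import List, Sequence


def bh_reject(pvalues: Sequence[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up rule: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= k * alpha / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank  # keep the largest rank that satisfies the bound
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


def storey_fdr_estimate(pvalues: Sequence[float], t: float,
                        lam: float = 0.5) -> float:
    """Point estimate of the FDR for the fixed region [0, t].
    pi0_hat estimates the proportion of true nulls from p-values above lam."""
    m = len(pvalues)
    pi0_hat = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    n_rejected = max(sum(p <= t for p in pvalues), 1)  # avoid division by zero
    return pi0_hat * m * t / n_rejected


def reject_by_estimate(pvalues: Sequence[float], alpha: float,
                       lam: float = 0.0) -> List[bool]:
    """Hypothetical helper: use the largest observed p-value whose estimated
    FDR is at most alpha as the rejection threshold."""
    ok = [t for t in pvalues if storey_fdr_estimate(pvalues, t, lam) <= alpha]
    if not ok:
        return [False] * len(pvalues)
    t_star = max(ok)
    return [p <= t_star for p in pvalues]


pvals = [0.01, 0.02, 0.03, 0.04, 0.5, 0.6, 0.7, 0.8]
print(bh_reject(pvals, alpha=0.1))
print(storey_fdr_estimate(pvals, t=0.05))
print(reject_by_estimate(pvals, alpha=0.1, lam=0.0))
```

With `lam=0.0` and strictly positive p-values, the estimated null proportion is 1, so thresholding the point estimate at `alpha` selects the same hypotheses as the step-up rule on this input. This is one concrete sense in which an FDR estimate can define a controlling procedure.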
In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.
We develop a mixture procedure to monitor parallel streams of data for a
change-point that affects only a subset of them, without assuming a spatial
structure relating the data streams to one another. Observations are assumed
initially to be independent standard normal random variables. After a
change-point the observations in a subset of the streams of data have nonzero
mean values. The subset and the post-change means are unknown. The procedure we
study uses stream specific generalized likelihood ratio statistics, which are
combined to form an overall detection statistic in a mixture model that
hypothesizes an assumed fraction $p_0$ of affected data streams. An analytic
expression is obtained for the average run length (ARL) when there is no change
and is shown by simulations to be very accurate. Similarly, an approximation
for the expected detection delay (EDD) after a change-point is also obtained.
Numerical examples are given to compare the suggested procedure to other
procedures for unstructured problems and in one case where the problem is
assumed to have a well-defined geometric structure. Finally we discuss
sensitivity of the procedure to the assumed value of $p_0$ and suggest a
generalization.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1094 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
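
The detection statistic described in this abstract can be sketched as follows. This is a hedged illustration, not the paper's definitive implementation. It assumes that the stream-specific statistic is the positive part of the standardized post-change sum, $U^+_{n,k,t}$, and that streams are combined as $\max_k \sum_n \log(1 - p_0 + p_0 \exp((U^+_{n,k,t})^2/2))$. The function names, the threshold `b`, and the unrestricted search over candidate change-points `k` are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def mixture_statistic(data: Sequence[Sequence[float]],
                      p0: float) -> Tuple[float, int]:
    """data[n][i] is observation i of stream n (all streams equally long).
    Returns the maximized mixture statistic and the maximizing candidate
    change-point k, meaning the change occurs after observation k."""
    t = len(data[0])
    # Prefix sums give each post-change sum S_t - S_k in constant time.
    prefix: List[List[float]] = []
    for stream in data:
        s = [0.0]
        for x in stream:
            s.append(s[-1] + x)
        prefix.append(s)
    best, best_k = -math.inf, 0
    for k in range(t):
        total = 0.0
        for s in prefix:
            u_plus = max((s[t] - s[k]) / math.sqrt(t - k), 0.0)
            x = u_plus * u_plus / 2.0
            # log(1 - p0 + p0 * exp(x)), rewritten to avoid overflow for large x
            total += x + math.log(p0 + (1.0 - p0) * math.exp(-x))
        if total > best:
            best, best_k = total, k
    return best, best_k


def first_alarm(data: Sequence[Sequence[float]], p0: float,
                b: float) -> Optional[int]:
    """First time t at which the statistic reaches the (assumed) threshold b,
    or None if it never does within the available data."""
    for t in range(1, len(data[0]) + 1):
        stat, _ = mixture_statistic([stream[:t] for stream in data], p0)
        if stat >= b:
            return t
    return None


streams = [[0.0, 0.0, 3.0, 3.0], [0.0, 0.0, 0.0, 0.0]]
print(mixture_statistic(streams, p0=0.5))
print(first_alarm(streams, p0=0.5, b=5.0))
```

In this sketch a stream with little evidence of a change contributes a term near zero to the sum, so unaffected streams add little noise to the overall statistic. The threshold `b` would in practice be chosen from the ARL approximation mentioned in the abstract, which is not reproduced here.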