We consider multiple testing with false discovery rate (FDR) control when p-values have discrete and heterogeneous null distributions. We propose a new estimator of the proportion of true null hypotheses and demonstrate that it is less upwardly biased than Storey's estimator and two other estimators. The new estimator induces two adaptive procedures: an adaptive Benjamini-Hochberg (BH) procedure and an adaptive Benjamini-Hochberg-Heyse (BHH) procedure. We prove that the adaptive BH (aBH) procedure is conservative nonasymptotically. Through simulation studies, we show that these procedures are usually more powerful than their nonadaptive counterparts and that the adaptive BHH procedure is usually more powerful than the aBH procedure and a procedure based on randomized p-values. The adaptive procedures are applied to a study of HIV vaccine efficacy, where they identify more differentially polymorphic positions than the BH procedure at the same FDR level.
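To make the idea of an adaptive procedure concrete, the following is a minimal sketch of the BH step-up procedure and of an adaptive variant that plugs in Storey's estimator of the proportion of true nulls. It does not implement the new estimator or the BHH procedure from the abstract above; the function names, the tuning parameter `lam = 0.5`, and the example p-values are illustrative assumptions.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey's estimator of the proportion of true nulls (finite-sample '+1' form)."""
    m = len(pvalues)
    # Under the null, roughly a fraction (1 - lam) of p-values exceeds lam.
    above = sum(1 for p in pvalues if p > lam)
    # Cap at 1 because a proportion cannot exceed 1.
    return min(1.0, (above + 1) / ((1 - lam) * m))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value falls below its BH threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def adaptive_bh(pvalues, alpha=0.05, lam=0.5):
    """BH run at the inflated level alpha / pi0_hat (illustrative adaptive variant)."""
    pi0_hat = storey_pi0(pvalues, lam)
    return benjamini_hochberg(pvalues, alpha / pi0_hat)


# Illustrative p-values (made up for this sketch, not taken from any study).
example = [0.001, 0.008, 0.012, 0.021, 0.03, 0.2, 0.35, 0.6, 0.74, 0.95]
bh_rejections = benjamini_hochberg(example)
abh_rejections = adaptive_bh(example)
```

On these made-up values the adaptive variant rejects five hypotheses where plain BH rejects three, because the estimated null proportion of 0.8 raises the effective level from 0.05 to 0.0625.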
The objective of this foundational study was to develop and evaluate the efficacy of an affordable hyperspectral imaging (HSI) system to identify single and mixed strains of foodborne pathogens in dairy products. This study used a completely randomized design with three replications. Three strains each of Escherichia coli O157:H7 and Listeria monocytogenes were evaluated either as single or mixed strains with the HSI system in growth media and selected dairy products (whole milk, cottage cheese, and cheddar cheese). Test samples from freshly prepared single or mixed strains of pathogens in growth media or inoculated dairy products were streaked onto selective media (PALCAM and/or Sorbitol MacConkey agar) for isolation. An isolated colony was selected and mixed with 1 ml of HPLC grade water, vortexed for 1 min, and spread over a microscope slide. Images were captured at 2000× magnification on the built HSI system at wavelengths ranging from 400 nm to 1100 nm with 5‐nm band intervals. For each image, three cells were selected as regions of interest (ROIs) to obtain hyperspectral signatures of the respective bacteria. Reference pathogen libraries were created using growth media, and then test pathogenic cells were classified by their hyperspectral signatures as either L. monocytogenes or E. coli O157:H7 using the k‐nearest neighbor (kNN) algorithm and cross‐validation in R. With the implementation of kNN (k = 3), overall classification accuracies of 58.97% and 61.53% were obtained for E. coli O157:H7 and L. monocytogenes, respectively.
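The classification step described above can be sketched as follows. The study used R; this sketch uses Python and entirely synthetic spectra, so it does not reproduce the study's data or accuracies. The Gaussian-shaped signatures, the peak positions, the noise level, and the choice of leave-one-out cross-validation are all assumptions made for illustration; only k = 3 and the 400 nm to 1100 nm grid with 5-nm steps come from the text.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training spectra closest to the query (Euclidean)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Hold out each spectrum in turn and classify it from the remaining ones."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


def synthetic_spectrum(rng, peak_nm):
    """Hypothetical signature: a Gaussian bump plus noise on the 400-1100 nm grid."""
    wavelengths = range(400, 1101, 5)  # 5-nm band intervals, as in the text
    return [math.exp(-((w - peak_nm) / 80.0) ** 2) + rng.gauss(0, 0.05)
            for w in wavelengths]


rng = random.Random(0)
# Peak positions are invented to give two separable classes; they are not measured values.
spectra = ([synthetic_spectrum(rng, 550) for _ in range(10)]
           + [synthetic_spectrum(rng, 800) for _ in range(10)])
labels = ["E. coli O157:H7"] * 10 + ["L. monocytogenes"] * 10
accuracy = leave_one_out_accuracy(spectra, labels, k=3)
```

With an odd k and two classes the vote can never tie, which is one practical reason k = 3 is a convenient choice for a two-pathogen problem.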
Summary
The false discovery rate (FDR) measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, there is additional information available that may be used to more accurately estimate the FDR. We develop a new framework for formulating and estimating FDRs and q-values when an additional piece of information, which we call an “informative variable”, is available. For a given test, the informative variable provides information about the prior probability a null hypothesis is true or the power of that particular test. The FDR is then treated as a function of this informative variable. We consider two applications in genomics. Our first application is a genetics of gene expression (eQTL) experiment in yeast where every genetic marker and gene expression trait pair are tested for associations. The informative variable in this case is the distance between each genetic marker and gene. Our second application is to detect differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.
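As background for the q-values mentioned above, here is a minimal sketch of the standard q-value computation from a list of p-values. It is not the functional FDR framework of the abstract: it ignores the informative variable and takes the null proportion `pi0` as a given input (an assumption of this sketch; with `pi0 = 1` the result coincides with BH-adjusted p-values).

```python
def qvalues(pvalues, pi0=1.0):
    """Standard q-values: q_i = min over t >= p_i of pi0 * m * t / #{p_j <= t}."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of the FDR estimates at its own and all larger thresholds.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        q[idx] = running_min
    return q


# Illustrative p-values (made up for this sketch).
example_q = qvalues([0.01, 0.04, 0.03, 0.5])
```

In the framework described above, the fixed `pi0` would instead vary with the informative variable, so two tests with the same p-value could receive different q-values.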
For multiple testing based on discrete p‐values, we propose a false discovery rate (FDR) procedure “BH+” with proven conservativeness. BH+ is at least as powerful as the BH (i.e., Benjamini‐Hochberg) procedure when they are applied to superuniform p‐values. Further, when applied to mid‐p‐values, BH+ can be more powerful than when it is applied to conventional p‐values. An easily verifiable necessary and sufficient condition for this is provided. BH+ is perhaps the first conservative FDR procedure applicable to mid‐p‐values and to p‐values with general distributions. It is applied to multiple testing based on discrete p‐values in a methylation study, an HIV study and a clinical safety study, where it makes considerably more discoveries than the BH procedure. In addition, we propose an adaptive version of the BH+ procedure, prove its conservativeness under certain conditions, and provide evidence of its excellent performance via simulation studies.
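The difference between conventional and mid-p-values can be illustrated with a one-sided binomial test. The sketch below is only an example of the two definitions and does not implement BH+; the function names, the null success probability of 0.5, and the counts are assumptions chosen for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_pvalues(x, n, p0=0.5):
    """Return (conventional, mid) p-values for H1: success probability > p0."""
    pmf_x = binom_pmf(x, n, p0)
    tail_above = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    conventional = tail_above + pmf_x  # P(X >= x)
    mid = tail_above + 0.5 * pmf_x     # P(X > x) + 0.5 * P(X = x)
    return conventional, mid


# 8 successes in 10 trials: conventional = 56/1024, mid-p = 33.5/1024.
conventional_p, mid_p = upper_tail_pvalues(8, 10)
```

The mid-p-value is never larger than the conventional one, which is the source of the potential power gain; the price is that mid-p-values are not superuniform in general, which is why a conservativeness proof for them is not automatic.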
Multiple testing with false discovery rate (FDR) control has been widely conducted in the "discrete paradigm" where p-values have discrete and heterogeneous null distributions with finitely many discontinuities. However, existing FDR procedures may lose some power when applied to such p-values. We propose a weighted FDR procedure for multiple testing in the discrete paradigm that efficiently adapts to both the heterogeneity and discreteness of p-value distributions. We prove the conservativeness of the weighted FDR procedure and demonstrate that it is more powerful than several other procedures for multiple testing based on p-values from binomial tests or Fisher's exact tests. The weighted FDR procedure is applied to a drug safety study and a differential methylation study based on discrete data, where it makes more discoveries than the Benjamini-Hochberg procedure at the same FDR level.
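What "discrete and heterogeneous null distributions" means in practice can be seen by listing the p-values a one-sided binomial test can attain for different sample sizes. This sketch only illustrates the paradigm and does not implement the weighted FDR procedure; the function name, the null probability of 0.5, and the sample sizes are illustrative assumptions.

```python
from math import comb


def attainable_pvalues(n, p0=0.5):
    """All values P(X >= x), x = 0..n, that a one-sided binomial test can produce."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    return sorted(sum(pmf[x:]) for x in range(n + 1))


support_n3 = attainable_pvalues(3)  # [0.125, 0.5, 0.875, 1.0]
support_n5 = attainable_pvalues(5)  # smallest attainable value is 1/32
```

A test with only three trials can never yield a p-value below 0.125, while one with five trials can reach 0.03125, so the two tests have different null distributions and different chances of ever being rejected at a given threshold. This is the kind of heterogeneity that procedures designed for the discrete paradigm try to exploit.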
The false discovery rate measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, there is additional information available that may be used to more accurately estimate the false discovery rate. We develop a new framework for formulating and estimating false discovery rates and q-values when an additional piece of information, which we call an "informative variable", is available. For a given test, the informative variable provides information about the prior probability a null hypothesis is true or the power of that particular test. The false discovery rate is then treated as a function of this informative variable. We consider two applications in genomics. Our first is a genetics of gene expression (eQTL) experiment in yeast where every genetic marker and gene expression trait pair are tested for associations. The informative variable in this case is the distance between each genetic marker and gene. Our second application is to detect differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.