A clarifying comparison of methods for controlling the false discovery rate

Yin, Yaling; Soteros, C E; Bickis, Mikelis G.

doi:10.1016/j.jspi.2008.10.010

Cited by 4 publications

(6 citation statements)

References 14 publications


“…Specifically, when π0 is large, the empirical FDR for BH remains at level 0.04 regardless of experimental parameters. Similarly using simulations, Yin et al (2009) showed that BH rejects at least as many hypotheses as ST when π0 is sufficiently close to 1. On the contrary, BL and BY are very conservative.…”

Section: Discussion (mentioning)

confidence: 82%


Evaluations of FDR-controlling procedures in multiple hypothesis testing

Hwang

Chu

2010

Stat Comput


Many exploratory experiments, such as DNA microarray or brain imaging studies, require simultaneous comparisons of hundreds or thousands of hypotheses. In such a setting, using the false discovery rate (FDR) as an overall Type I error is recommended (Benjamini and Hochberg in J. R. Stat. Soc. B 57:289-300, 1995). Many FDR-controlling procedures have been proposed. However, when evaluating their performance, researchers often focus on the ability of procedures to control the FDR and to achieve high power. Meanwhile, when testing multiple hypotheses, one may also commit a false nondiscovery or fail to claim a true non-significance. In addition, various experimental parameters such as the number of hypotheses, the proportion of true null hypotheses among all hypotheses, the sample size and the correlation structure may affect the performance of FDR-controlling procedures. The purpose of this paper is to illustrate the performance of some existing FDR-controlling procedures in terms of four indices, i.e., the FDR, the false nondiscovery rate, the sensitivity and the specificity. Analytical results for these indices are derived for the FDR-controlling procedures. Simulations are also performed to evaluate the procedures in terms of these indices under various experimental parameters. The results can be summarized as guidance for practitioners in choosing an FDR-controlling procedure.
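As a rough illustration of the four indices named in this abstract, the sketch below applies the Benjamini-Hochberg step-up rule to a list of p-values and then computes the realized counterparts of the FDR, false nondiscovery rate, sensitivity and specificity for that single run. The function names `bh_reject` and `four_indices` are hypothetical, the truth labels are assumed to be known (as they would be in a simulation), and the indices in the paper are expectations over repeated experiments rather than these one-run proportions.

```python
def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule; returns True where a hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


def four_indices(reject, is_null):
    """One-run proportions; is_null[i] is True when hypothesis i is a true null."""
    pairs = list(zip(reject, is_null))
    v = sum(1 for r, n in pairs if r and n)          # false discoveries
    s = sum(1 for r, n in pairs if r and not n)      # true discoveries
    t = sum(1 for r, n in pairs if not r and not n)  # missed alternatives
    u = sum(1 for r, n in pairs if not r and n)      # true nulls retained

    def ratio(a, b):
        # Convention assumed here: a proportion with an empty denominator is 0.
        return a / b if b else 0.0

    return {
        "fdp": ratio(v, v + s),          # realized false discovery proportion
        "fnp": ratio(t, t + u),          # realized false nondiscovery proportion
        "sensitivity": ratio(s, s + t),
        "specificity": ratio(u, u + v),
    }
```

Averaging these proportions over many simulated datasets would give Monte Carlo estimates of the corresponding rates under a chosen set of experimental parameters.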



“…Nevertheless, this estimator is not sensitive to the value of λ (Yin et al 2009). Yin et al suggested using λ = 0.5.…”

Section: Compute (mentioning)

confidence: 98%
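The excerpt above does not state the estimator's formula. Assuming it is the familiar λ-based estimator of π0 associated with Storey's method, π0(λ) = #{p_i > λ} / (m (1 − λ)), a minimal sketch looks like this. The function name `estimate_pi0` and the cap at 1 are choices made for this illustration, not details taken from the cited papers.

```python
def estimate_pi0(pvals, lam=0.5):
    """Lambda-based estimate of the proportion of true nulls (assumed Storey-type form)."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvals)
    if m == 0:
        raise ValueError("pvals must be non-empty")
    # p-values above lam are assumed to come mostly from true nulls,
    # which are uniform on [0, 1], so their share is about pi0 * (1 - lam).
    above = sum(1 for p in pvals if p > lam)
    # Cap at 1 because a proportion cannot exceed 1 (a choice made for this sketch).
    return min(1.0, above / (m * (1.0 - lam)))
```

Calling the function with several values of `lam` on the same p-values is a simple way to check how much the estimate moves with λ for a given dataset.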

Evaluations of FDR-controlling procedures in multiple hypothesis testing

Hwang

Chu

2010

Stat Comput


“…If the data contain many nucleosomes it may be beneficial to modify this procedure to account for the estimated proportion of true null hypotheses. This is done using the αAFDR approach discussed in [19], where it is shown to be equivalent to the procedure proposed by Storey et al. [20].…”

Section: Results (mentioning)

confidence: 99%

Interaction site prediction by structural similarity to neighboring clusters in protein-protein interaction networks

et al. 2011


Background: Recently, revealing the function of proteins with protein-protein interaction (PPI) networks has been regarded as one of the important issues in bioinformatics. With the development of experimental methods such as the yeast two-hybrid method, protein interaction data have been increasing rapidly. Many databases dealing comprehensively with these data have been constructed and applied to analyzing PPI networks. However, little research has explored predicting interaction sites using PPI networks and 3D protein structures in a complementary way. Results: We propose a method of predicting interaction sites in proteins with unknown function by using both PPI networks and protein structures. For a target protein with unknown function, several clusters are extracted from the neighboring proteins based on their structural similarity. Then, interaction sites are predicted by extracting similar sites from the group formed by a protein cluster and the target protein. Moreover, the proposed method can improve prediction accuracy by introducing a repetitive prediction process. Conclusions: The proposed method has been applied to a small-scale dataset, and its effectiveness has been confirmed. The challenge will now be to apply the method to large-scale datasets.


“…The q-value of the jth test with a p-value p(j) is the FDR if we use p(j) as the cutoff t in feature selection; i.e., features with p-values ≤ p(j) are selected. To ensure theoretical monotonicity, q(p(j)) is defined as the minimum of FDR(t) for t ≥ p(j) [25]. If we order all the p-values, and denote the jth p-value by p[j], an approximation for the q-value is q(p[j]) = m × π0 × p[j] / j, which may not be monotone in p[j], but is easy to calculate and typically close to q(p[j]) when m is large; see a discussion of FDR by Yin et al [26].…”

Section: Plos One (mentioning)

confidence: 99%
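The q-value approximation in this statement can be sketched as follows, reading it as m × π0 × p[j] / j for the jth smallest p-value. The function name `q_values` is hypothetical, and `pi0` is assumed to be supplied by the caller (an estimate of the proportion of true nulls, or 1 as a conservative default). The sketch returns both the raw approximation and a monotone version obtained by taking the minimum over all larger p-values, mirroring the "minimum of FDR(t)" definition in the quote.

```python
def q_values(pvals, pi0=1.0):
    """Return (raw, monotone) q-value approximations in the original order of pvals."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])

    # Raw approximation: m * pi0 * p[j] / j, with j the 1-based rank of the p-value.
    raw = [0.0] * m
    for rank, i in enumerate(order, start=1):
        raw[i] = m * pi0 * pvals[i] / rank

    # Monotone version: running minimum taken from the largest p-value downwards,
    # so that a smaller p-value never receives a larger q-value.
    monotone = [0.0] * m
    running = float("inf")
    for i in reversed(order):
        running = min(running, raw[i])
        monotone[i] = running
    return raw, monotone
```

With p-values 0.01, 0.04, 0.03 and 0.5 and `pi0=1`, the raw value for 0.03 (rank 2) is 0.06 while the raw value for 0.04 (rank 3) is about 0.053, which shows the non-monotonicity the statement mentions; the monotone version assigns about 0.053 to both.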

Predictive analysis methods for human microbiome data with application to Parkinson’s disease

Dong

Chen

et al. 2020

PLoS ONE


Microbiome data consists of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly performed to identify OTUs that are associated with a phenotype. The limitations of statistical testing methods include that the validity of p-values/q-values depends sensitively on the correctness of models and that statistical significance does not necessarily imply predictivity. Predictive analysis using methods such as LASSO is an alternative approach for identifying associated OTUs and for measuring the predictability of the phenotype variable with OTUs and other covariate variables. We investigate three strategies of performing predictive analysis: (1) LASSO: fitting a LASSO multinomial logistic regression model to all OTU counts with specific transformation; (2) screening+GLM: screening OTUs with q-values returned by fitting a GLMM to each OTU, then fitting a GLM model using a subset of selected OTUs; (3) screening+LASSO: fitting a LASSO to a subset of OTUs selected with GLMM. We have conducted empirical studies using three simulation datasets generated using Dirichlet-multinomial models and a real gut microbiome dataset related to Parkinson's disease to investigate the performance of the three strategies for predictive analysis. Our simulation studies show that LASSO with appropriate variable transformation performs remarkably well on zero-inflated data. Our results of real data analysis show that Parkinson's disease can be predicted based on selected OTUs after the binary transformation, age, and sex with high accuracy (Error Rate = 0.199, AUC = 0.872, AUPRC = 0.912). These results provide strong evidence of the relationship between Parkinson's disease and the gut microbiome.
