Xinzhou Ge scite author profile

When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found through permutation analysis that two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates (FDRs). Expanding the analysis to limma-voom, NOISeq, dearseq, and the Wilcoxon rank-sum test, we found that every method except the Wilcoxon rank-sum test often fails to control the FDR. In particular, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, we recommend the Wilcoxon rank-sum test for population-level RNA-seq studies with large sample sizes.
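As a rough illustration of the recommended test, the sketch below runs a per-gene Wilcoxon rank-sum test and then adjusts the p-values for a target FDR. It is not the authors' pipeline: the normal approximation, the Benjamini-Hochberg adjustment, the function names, and the toy expression values are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy data: one list of expression values per condition, per gene.
genes = {
    "gene_shifted": (list(range(0, 10)), list(range(20, 30))),
    "gene_null": (list(range(0, 20, 2)), list(range(1, 20, 2))),
}
pvalues = [rank_sum_pvalue(cond1, cond2) for cond1, cond2 in genes.values()]
adjusted = benjamini_hochberg(pvalues)
called = [name for name, q in zip(genes, adjusted) if q <= 0.05]
print(called)
```

The normal approximation is only reasonable with many samples per condition, which matches the large-sample setting the abstract describes; with a handful of replicates an exact test would be needed instead.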

DreamAI: algorithm for the imputation of proteomics data

Kim

Chowdhury

et al. 2020

Preprint

Deep proteomics profiling using labelled LC-MS/MS experiments has proven powerful for studying complex diseases. However, due to the dynamic nature of discovery mass spectrometry, the generated data contain a substantial fraction of missing values. This poses great challenges for data analysis, as many tools, especially those for high-dimensional data, cannot handle missing values directly. To address this problem, the NCI-CPTAC Proteogenomics DREAM Challenge was carried out to develop effective imputation algorithms for labelled LC-MS/MS proteomics data through crowd learning. The resulting algorithm, DreamAI, is based on an ensemble of six different imputation methods. The imputation accuracy of DreamAI, as measured by correlation, is about 15%-50% greater than that of existing tools among less abundant proteins, which are more likely to be missing in proteomics data sets. This new tool enhances data analysis capabilities in proteomics research.

DORGE: Discovery of Oncogenes and tumoR suppressor genes using Genetic and Epigenetic features

Lyu

et al. 2020

Sci. Adv.

Data-driven discovery of cancer driver genes, including tumor suppressor genes (TSGs) and oncogenes (OGs), is imperative for cancer prevention, diagnosis, and treatment. Although epigenetic alterations are important for tumor initiation and progression, most known driver genes were identified based on genetic alterations alone. Here, we developed an algorithm, DORGE (Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features), to identify TSGs and OGs by integrating comprehensive genetic and epigenetic data. DORGE identified histone modifications as strong predictors for TSGs, and it found missense mutations, super enhancers, and methylation differences as strong predictors for OGs. We extensively validated DORGE-predicted cancer driver genes using independent functional genomics data. We also found that DORGE-predicted dual-functional genes (both TSGs and OGs) are enriched at hubs in protein-protein interaction and drug-gene networks. Overall, our study has deepened the understanding of epigenetic mechanisms in tumorigenesis and revealed previously undetected cancer driver genes.

DORGE: Discovery of Oncogenes and Tumor SuppressoR Genes Using Genetic and Epigenetic Features

Lyu

et al. 2020

Preprint

Comprehensive data-driven discovery of cancer driver genes, including tumor suppressor genes (TSGs) and oncogenes (OGs), is imperative for cancer prevention, diagnosis, and treatment. Although epigenetic alterations are important contributors to tumor initiation and progression, most known driver genes were identified based on genetic alterations alone, and it remains unclear to what extent epigenetic features would facilitate the identification and characterization of cancer driver genes. Here we developed a prediction algorithm, DORGE (Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features), which integrates the most comprehensive collection of tumor genetic and epigenetic data to identify TSGs and OGs, particularly those with rare mutations. DORGE identified histone modifications as strong predictors for TSGs, and it found missense mutations, super enhancer percentages, and methylation differences between cancer and normal samples as strong predictors for OGs. We extensively validated novel cancer driver genes predicted by DORGE using independent functional genomics data. We also found that the dual-functional genes, which DORGE predicts to be both TSGs and OGs, are enriched at hubs in protein-protein interaction and drug-gene networks. Overall, our study has deepened the understanding of epigenetic mechanisms in tumorigenesis and revealed a previously undetected repertoire of cancer driver genes.

Clipper: p-value-free FDR control on high-throughput data from two conditions

Chen

Song

et al. 2021

Genome Biol

High-throughput biological data analysis commonly involves identifying, from numerous features measured simultaneously, those features (such as genes, genomic regions, and proteins) whose values differ between two conditions. The most widely used criterion for ensuring the reliability of the analysis is the false discovery rate (FDR), which is primarily controlled based on p-values. However, obtaining valid p-values relies on either reasonable assumptions about the data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control that does not rely on p-values or specific data distributions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.

Xinzhou Ge

Exaggerated false positives by popular differential expression methods when analyzing human population samples