The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcriptional factors (TFs) and target genes from high-throughput data, and their performance evaluation requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode-of-regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritizing associated pathways and diseases with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.
The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confident variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners—BWA-MEM, Bowtie2, and Novoalign—and four variant callers—Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes and Ion Proton Variant Caller (TVC), for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.
A major challenge for distinguishing cancer-causing driver mutations from inconsequential passenger mutations is the long-tail of infrequently mutated genes in cancer genomes. Here, we present and evaluate a method for prioritizing cancer genes accounting not only for mutations in individual genes but also in their neighbors in functional networks, MUFFINN (MUtations For Functional Impact on Network Neighbors). This pathway-centric method shows high sensitivity compared with gene-centric analyses of mutation data. Notably, only a marginal decrease in performance is observed when using 10 % of TCGA patient samples, suggesting the method may potentiate cancer genome projects with small patient populations.Electronic supplementary materialThe online version of this article (doi:10.1186/s13059-016-0989-x) contains supplementary material, which is available to authorized users.
Human gene networks have proven useful in many aspects of disease research, with numerous network-based strategies developed for generating hypotheses about gene-disease-drug associations. The ability to predict and organize genes most relevant to a specific disease has proven especially important. We previously developed a human functional gene network, HumanNet, by integrating diverse types of omics data using Bayesian statistics framework and demonstrated its ability to retrieve disease genes. Here, we present HumanNet v2 (http://www.inetbio.org/humannet), a database of human gene networks, which was updated by incorporating new data types, extending data sources and improving network inference algorithms. HumanNet now comprises a hierarchy of human gene networks, allowing for more flexible incorporation of network information into studies. HumanNet performs well in ranking disease-linked gene sets with minimal literature-dependent biases. We observe that incorporating model organisms’ protein–protein interactions does not markedly improve disease gene predictions, suggesting that many of the disease gene associations are now captured directly in human-derived datasets. With an improved interactive user interface for disease network analysis, we expect HumanNet will be a useful resource for network medicine.
Arabidopsis thaliana is a reference plant that has been studied intensively for several decades. Recent advances in high-throughput experimental technology have enabled the generation of an unprecedented amount of data from A. thaliana, which has facilitated data-driven approaches to unravel the genetic organization of plant phenotypes. We previously published a description of a genome-scale functional gene network for A. thaliana, AraNet, which was constructed by integrating multiple co-functional gene networks inferred from diverse data types, and we demonstrated the predictive power of this network for complex phenotypes. More recently, we have observed significant growth in the availability of omics data for A. thaliana as well as improvements in data analysis methods that we anticipate will further enhance the integrated database of co-functional networks. Here, we present an updated co-functional gene network for A. thaliana, AraNet v2 (available at http://www.inetbio.org/aranet), which covers approximately 84% of the coding genome. We demonstrate significant improvements in both genome coverage and accuracy. To enhance the usability of the network, we implemented an AraNet v2 web server, which generates functional predictions for A. thaliana and 27 nonmodel plant species using an orthology-based projection of nonmodel plant genes on the A. thaliana gene network.
Genetic interactions mediate the emergence of phenotype from genotype. The systematic survey of genetic interactions in yeast showed that genes operating in the same biological process have highly correlated genetic interaction profiles, and this observation has been exploited to infer gene function in model organisms. Such assays of digenic perturbations in human cells are also highly informative, but are not scalable, even with CRISPR-mediated methods. As an alternative, we developed an indirect method of deriving functional interactions. We show that genes having correlated knockout fitness profiles across diverse, non-isogenic cell lines are analogous to genes having correlated genetic interaction profiles across isogenic query strains and similarly imply shared biological function. We constructed a network of genes with correlated fitness profiles across 276 high-quality CRISPR knockout screens in cancer cell lines into a “coessentiality network,” with up to 500-fold enrichment for co-functional gene pairs, enabling strong inference of gene function and highlighting the modular organization of the cell.
Background Pooled library CRISPR/Cas9 knockout screening across hundreds of cell lines has identified genes whose disruption leads to fitness defects, a critical step in identifying candidate cancer targets. However, the number of essential genes detected from these monogenic knockout screens is low compared to the number of constitutively expressed genes in a cell. Results Through a systematic analysis of screen data in cancer cell lines generated by the Cancer Dependency Map, we observe that half of all constitutively expressed genes are never detected in any CRISPR screen and that these never-essentials are highly enriched for paralogs. We investigated functional buffering among approximately 400 candidate paralog pairs using CRISPR/enCas12a dual-gene knockout screening in three cell lines. We observe 24 synthetic lethal paralog pairs that have escaped detection by monogenic knockout screens at stringent thresholds. Nineteen of 24 (79%) synthetic lethal interactions are present in at least two out of three cell lines and 14 of 24 (58%) are present in all three cell lines tested, including alternate subunits of stable protein complexes as well as functionally redundant enzymes. Conclusions Together, these observations strongly suggest that functionally redundant paralogs represent a targetable set of genetic dependencies that are systematically under-represented among cell-essential genes in monogenic CRISPR-based loss of function screens.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.