Transcription factors (TFs) are major trans-acting factors in transcriptional regulation. Therefore, elucidating TF–target interactions is a key step toward understanding the regulatory circuitry underlying complex traits such as human diseases. We previously published a reference TF–target interaction database for humans—TRRUST (Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining)—which was constructed using sentence-based text mining, followed by manual curation. Here, we present TRRUST v2 (www.grnpedia.org/trrust) with a significant improvement from the previous version, including a significantly increased size of the database consisting of 8444 regulatory interactions for 800 TFs in humans. More importantly, TRRUST v2 also contains a database for TF–target interactions in mice, including 6552 TF–target interactions for 828 mouse TFs. TRRUST v2 is also substantially more comprehensive and less biased than other TF–target interaction databases. We also improved the web interface, which now enables prioritization of key TFs for a physiological condition depicted by a set of user-input transcriptional responsive genes. With the significant expansion in the database size and inclusion of the new web tool for TF prioritization, we believe that TRRUST v2 will be a versatile database for the study of the transcriptional regulation involved in human diseases.
Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
A conceptual framework for integrating diverse functional genomics data was developed by reinterpreting experiments to provide numerical likelihoods that genes are functionally linked. This allows direct comparison and integration of different classes of data. The resulting probabilistic gene network estimates the functional coupling between genes. Within this framework, we reconstructed an extensive, high-quality functional gene network for Saccharomyces cerevisiae, consisting of 4681 (approximately 81%) of the known yeast genes linked by approximately 34,000 probabilistic linkages comparable in accuracy to small-scale interaction assays. The integrated linkages distinguish true from false-positive interactions in earlier data sets; new interactions emerge from genes' network contexts, as shown for genes in chromatin modification and ribosome biogenesis.
Network “guilt by association” (GBA) is a proven approach for identifying novel disease genes based on the observation that similar mutational phenotypes arise from functionally related genes. In principle, this approach could account even for nonadditive genetic interactions, which underlie the synergistic combinations of mutations often linked to complex diseases. Here, we analyze a large-scale, human gene functional interaction network (dubbed HumanNet). We show that candidate disease genes can be effectively identified by GBA in cross-validated tests using label propagation algorithms related to Google's PageRank. However, GBA has been shown to work poorly in genome-wide association studies (GWAS), where many genes are somewhat implicated, but few are known with very high certainty. Here, we resolve this by explicitly modeling the uncertainty of the associations and incorporating the uncertainty for the seed set into the GBA framework. We observe a significant boost in the power to detect validated candidate genes for Crohn's disease and type 2 diabetes by comparing our predictions to results from follow-up meta-analyses, with incorporation of the network serving to highlight the JAK–STAT pathway and associated adaptors GRB2/SHC1 in Crohn's disease and BACH2 in type 2 diabetes. Consideration of the network during GWAS thus conveys some of the benefits of enrolling more participants in the GWAS study. More generally, we demonstrate that a functional network of human genes provides a valuable statistical framework for prioritizing candidate disease genes, both for candidate gene-based and GWAS-based studies.
The fundamental aim of genetics is to understand how an organism's phenotype is determined by its genotype, and implicit in this is predicting how changes in DNA sequence alter phenotypes. A single network covering all the genes of an organism might guide such predictions down to the level of individual cells and tissues. To validate this approach, we computationally generated a network covering most C. elegans genes and tested its predictive capacity. Connectivity within this network predicts essentiality, identifying this relationship as an evolutionarily conserved biological principle. Critically, the network makes tissue-specific predictions-we accurately identify genes for most systematically assayed loss-of-function phenotypes, which span diverse cellular and developmental processes. Using the network, we identify 16 genes whose inactivation suppresses defects in the retinoblastoma tumor suppressor pathway, and we successfully predict that the dystrophin complex modulates EGF signaling. We conclude that an analogous network for human genes might be similarly predictive and thus facilitate identification of disease genes and rational therapeutic targets.
The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcriptional factors (TFs) and target genes from high-throughput data, and their performance evaluation requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode-of-regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritizing associated pathways and diseases with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.
Plants are essential sources of food, fiber and renewable energy. Effective methods for manipulating plant traits have important agricultural and economic consequences. We introduce a rational approach for associating genes with plant traits by combined use of a genome-scale functional network and targeted reverse genetic screening. We present a probabilistic network (AraNet) of functional associations among 19,647 (73%) genes of the reference flowering plant Arabidopsis thaliana. AraNet associations have measured precision greater than literature-based protein interactions (21%) for 55% of genes, and are highly predictive for diverse biological pathways. Using AraNet, we found a 10-fold enrichment in identifying early seedling development genes. By interrogating network neighborhoods, we identify At1g80710 (now Drought sensitive 1; Drs1) and At3g05090 (now Lateral root stimulator 1; Lrs1) as novel regulators of drought sensitivity and lateral root development, respectively. AraNet (http://www.functionalnet.org/aranet/) provides a global resource for plant gene function identification and genetic dissection of plant traits.What is the best approach for identifying genes for important plant traits? Forward genetics is limited as mutations in many genes may render only moderate or weak phenotypes. Similarly, while reverse genetics allows for directed assay of gene perturbations 1 , saturated phenotyping for many plant traits is impractical. A pragmatic near-term solution is the computational identification of likely candidate genes for desired traits, allowing for focused, efficient use of reverse genetics. This solution is not unlike rational drug design in which computer-assisted and expert knowledge are combined with targeted screening for the desired drug, or in our case, trait.
BackgroundIdentifying all protein complexes in an organism is a major goal of systems biology. In the past 18 months, the results of two genome-scale tandem affinity purification-mass spectrometry (TAP-MS) assays in yeast have been published, along with corresponding complex maps. For most complexes, the published data sets were surprisingly uncorrelated. It is therefore useful to consider the raw data from each study and generate an accurate complex map from a high-confidence data set that integrates the results of these and earlier assays.ResultsUsing an unsupervised probabilistic scoring scheme, we assigned a confidence score to each interaction in the matrix-model interpretation of the large-scale yeast mass-spectrometry data sets. The scoring metric proved more accurate than the filtering schemes used in the original data sets. We then took a high-confidence subset of these interactions and derived a set of complexes using MCL. The complexes show high correlation with existing annotations. Hierarchical organization of some protein complexes is evident from inter-complex interactions.ConclusionWe demonstrate that our scoring method can generate an integrated high-confidence subset of observed matrix-model interactions, which we subsequently used to derive an accurate map of yeast complexes. Our results indicate that essentiality is a product of the protein complex rather than the individual protein, and that we have achieved near saturation of the yeast high-abundance, rich-media-expressed "complex-ome."
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.