The generation of new ideas and scientific hypotheses is often the result of extensive literature and database searches, but, with the growing wealth of public and private knowledge, the process of searching diverse and interconnected data to generate new insights into genes, gene networks, traits and diseases is becoming both more complex and more time-consuming. To guide this technically challenging data integration task and to make gene discovery and hypothesis generation easier for researchers, we have developed a comprehensive software package called KnetMiner, which is open-source and containerized for easy use. KnetMiner is an integrated, intelligent, interactive gene and gene network discovery platform that helps scientists explore and understand the biological stories of complex traits and diseases across species. It features fast algorithms for generating rich interactive gene networks and for prioritizing candidate genes based on knowledge mining approaches. KnetMiner is used in many plant science institutions and has been adopted by several plant breeding organizations to accelerate gene discovery. The software is generic and customizable and can therefore be readily applied to new species and data types: for example, it has been applied to pest insects and fungal pathogens and, most recently, repurposed to support COVID-19 research. Here, we give an overview of the main approaches behind KnetMiner and report plant-centric case studies for identifying genes, gene networks and trait relationships in Triticum aestivum (bread wheat), as well as an evidence-based approach to rank candidate genes under a large Arabidopsis thaliana QTL.
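KnetMiner's own ranking algorithm is not described in this abstract. As a minimal, hypothetical sketch of the general idea of evidence-based candidate gene prioritization, the code below assumes a toy knowledge graph stored as an adjacency dict plus free text on evidence nodes, and scores each gene by adding 1/d for every keyword-matching evidence node reachable within `max_depth` hops, where d is the hop distance. The graph, node names, weighting scheme and function names are all illustrative assumptions.

```python
from collections import deque

# Hypothetical toy knowledge graph: each node maps to the nodes it links to.
GRAPH = {
    "GeneA": ["Pub1", "TraitX"],
    "GeneB": ["Pub2"],
    "GeneC": ["GeneA"],
    "Pub1": [],
    "Pub2": [],
    "TraitX": [],
}

# Hypothetical free text attached to evidence nodes (publications, traits).
TEXT = {
    "Pub1": "Seed dormancy and germination in cereals",
    "Pub2": "Root hair development",
    "TraitX": "Pre-harvest sprouting",
}


def score_gene(gene, keywords, max_depth=2):
    """Sum 1/depth over keyword-matching evidence nodes within max_depth hops of gene."""
    keywords = [k.lower() for k in keywords]
    seen = {gene}
    queue = deque([(gene, 0)])
    score = 0.0
    while queue:
        node, depth = queue.popleft()
        text = TEXT.get(node, "").lower()
        # The gene itself (depth 0) is not counted as evidence for itself.
        if depth > 0 and any(k in text for k in keywords):
            score += 1.0 / depth
        if depth < max_depth:
            for neighbour in GRAPH.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
    return score


def rank_genes(genes, keywords, max_depth=2):
    """Return (gene, score) pairs, highest score first, ties broken by name."""
    scores = {g: score_gene(g, keywords, max_depth) for g in genes}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(rank_genes(["GeneA", "GeneB", "GeneC"], ["dormancy", "sprouting"]))
# [('GeneA', 2.0), ('GeneC', 1.0), ('GeneB', 0.0)]
```

Breadth-first search guarantees each evidence node is counted once, at its shortest distance from the gene, so indirect evidence (GeneC reaches the same publication and trait via GeneA) contributes, but with less weight than direct evidence.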
Gene regulatory networks are powerful tools which facilitate hypothesis generation and candidate gene discovery. However, the extent to which the network predictions are biologically relevant is often unclear. Recently, a GENIE3 network which predicted targets of wheat transcription factors was produced. Here we used an independent RNA-Seq dataset to test the predictions of the wheat GENIE3 network for the senescence-regulating transcription factor NAM-A1 (TraesCS6A02G108300). We re-analysed the RNA-Seq data against the RefSeqv1.0 genome and identified a set of differentially expressed genes (DEGs) between the wild-type and nam-a1 mutant which recapitulated the known role of NAM-A1 in senescence and nutrient remobilisation. We found that the GENIE3-predicted target genes of NAM-A1 overlap significantly with the DEGs, more than would be expected by chance. Based on high levels of overlap between GENIE3-predicted target genes and the DEGs, we identified candidate senescence regulators. We then explored genome-wide trends in the network related to polyploidy and found that only homoeologous transcription factors are likely to share predicted targets. However, homoeologs which vary in expression levels across tissues are less likely to share predicted targets than those that do not, suggesting that they may be more likely to act in distinct pathways. This work demonstrates that the wheat GENIE3 network can provide biologically relevant predictions of transcription factor targets, which can be used for candidate gene prediction and for global analyses of transcription factor function. The GENIE3 network has now been integrated into the KnetMiner web application, facilitating its use in future studies.
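A common way to ask whether two gene sets overlap "more than would be expected by chance" is a one-sided hypergeometric test. The sketch below assumes that choice (the abstract does not name the test that was used) and assumes that predicted targets and DEGs are compared against a shared gene universe, such as all genes in the network. The function names and the toy data are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    # Summing integer counts before dividing keeps the result exact until the last step.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def overlap_test(predicted, degs, universe):
    """Observed overlap, expected overlap and one-sided p-value for two gene sets."""
    universe = set(universe)
    predicted = set(predicted) & universe
    degs = set(degs) & universe
    k = len(predicted & degs)
    expected = len(predicted) * len(degs) / len(universe)
    p_value = hypergeom_sf(k, len(universe), len(degs), len(predicted))
    return k, expected, p_value


# Hypothetical toy example: 10 genes, 4 DEGs, 3 predicted targets, all 3 are DEGs.
universe = [f"gene{i}" for i in range(10)]
degs = universe[:4]
predicted = universe[:3]
print(overlap_test(predicted, degs, universe))
```

Restricting both sets to the same universe matters: genes that could never appear in one list (for example, genes absent from the network) would otherwise inflate the apparent enrichment.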
Generating new ideas and scientific hypotheses is often the result of extensive literature and database reviews, overlaid with scientists' own novel data and a creative process of making connections that were not made before. We have developed a comprehensive approach to guide this technically challenging data integration task and to make knowledge discovery and hypothesis generation easier for plant and crop researchers. KnetMiner can digest large volumes of scientific literature and biological research to find and visualise links between the genetic and biological properties of complex traits and diseases. Here we report the main design principles behind KnetMiner and provide use cases for mining public datasets to identify unknown links between traits such as grain colour and pre-harvest sprouting in Triticum aestivum, as well as an evidence-based approach to identify candidate genes under an Arabidopsis thaliana petal size QTL. We have developed KnetMiner knowledge graphs and applications for a range of species, including plants, crops and pathogens. KnetMiner is the first open-source gene discovery platform that can leverage genome-scale knowledge graphs, generate evidence-based biological networks and be deployed for any species with a sequenced genome. KnetMiner is available at http://knetminer.org.

KEYWORDS
knowledge graph, interactive knowledge discovery, exploratory data mining, omics data integration, candidate gene prioritization, information visualisation, systems biology
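Finding a link between two traits in a knowledge graph can be framed as a path search. The following is a minimal sketch, assuming an untyped, undirected graph and plain breadth-first search; the edges and node names (GeneX, Publication1, and so on) are hypothetical placeholders, not results from the wheat case study, and this is not presented as KnetMiner's own method.

```python
from collections import deque

# Hypothetical edges; real knowledge graphs use typed concepts and relations.
EDGES = [
    ("grain colour", "GeneX"),
    ("GeneX", "Publication1"),
    ("Publication1", "pre-harvest sprouting"),
    ("GeneY", "grain colour"),
]


def build_graph(edges):
    """Undirected adjacency lists; lists (not sets) keep traversal order deterministic."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk back through the parent links, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


graph = build_graph(EDGES)
print(shortest_path(graph, "grain colour", "pre-harvest sprouting"))
# ['grain colour', 'GeneX', 'Publication1', 'pre-harvest sprouting']
```

Each intermediate node on the returned path is a piece of evidence a researcher can inspect; in a typed graph one would typically also restrict which relation types a path may traverse.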
The speed and accuracy of new scientific discoveries, whether made by humans or by artificial intelligence, depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We use the ontology to power data access services such as resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge, in line with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles.
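To illustrate how graph records can be exposed as RDF and JSON-LD, the sketch below serialises simple (subject, predicate, object) identifiers as N-Triples lines and builds a one-node JSON-LD document using only the standard library. The namespace `http://example.org/gskn/`, the predicate `participates_in` and the node IDs are hypothetical placeholders, not the ontology's real URIs or terms; only `rdfs:label` is a standard W3C term.

```python
import json

# Hypothetical namespace for illustration only; not the real ontology's URI.
BASE = "http://example.org/gskn/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def _literal(text):
    """Quote a string as an N-Triples literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(edges, labels, base=BASE):
    """Serialise (subject, predicate, object) IDs and node labels as N-Triples lines."""
    lines = [f"<{base}{s}> <{base}{p}> <{base}{o}> ." for s, p, o in edges]
    lines += [f"<{base}{n}> <{RDFS_LABEL}> {_literal(t)} ." for n, t in labels.items()]
    return "\n".join(lines)


def to_jsonld(node_id, label, links, base=BASE):
    """Build a small JSON-LD document for one node and its outgoing links."""
    doc = {
        "@context": {"@vocab": base, "label": RDFS_LABEL},
        "@id": base + node_id,
        "label": label,
    }
    for predicate, targets in links.items():
        doc[predicate] = [{"@id": base + t} for t in targets]
    return json.dumps(doc, indent=2)


edges = [("gene1", "participates_in", "pathway1")]
print(to_ntriples(edges, {"gene1": "example gene"}))
print(to_jsonld("gene1", "example gene", {"participates_in": ["pathway1"]}))
```

Because every node ID is turned into a full URI under one base, the same identifiers can back resolvable URIs, SPARQL queries and JSON-LD responses; the `@vocab` entry lets JSON-LD consumers expand short property names into those URIs.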
Dictionaries define agility as being quick-moving, nimble and active. In the corporate world, agility signifies an adaptable, nimble-footed organization that re-engineers and refines its operations in response to market pressures in order to fulfill its strategic goals and objectives. While conventional tools and methodologies are still relevant and form a significant part of the strategic arsenal, newer and evolving avenues such as social media offer increasingly proactive and insightful approaches. As businesses strive to stay competitive and customer-centric, social media provides a new dimension toward this end through a myriad of approaches, which are examined in this paper.
Gene regulatory networks are powerful tools which facilitate hypothesis generation and candidate gene discovery. However, the extent to which the network predictions are biologically relevant is often unclear. Recently, as part of an analysis of the RefSeqv1.0 wheat transcriptome, a GENIE3 network which predicted targets of wheat transcription factors was produced. Here we have used an independent and publicly available RNA-Seq dataset to validate the predictions of the wheat GENIE3 network for the senescence-regulating transcription factor NAM-A1 (TraesCS6A02G108300). We re-analysed the RNA-Seq data against the RefSeqv1.0 genome and identified a de novo set of differentially expressed genes (DEGs) between the wild-type and nam-a1 mutant which recapitulated the known role of NAM-A1 in senescence and nutrient remobilisation. We found that the GENIE3-predicted target genes of NAM-A1 overlap significantly with the de novo DEGs, more than would be expected for a random transcription factor. Based on high levels of overlap between GENIE3-predicted target genes and the de novo DEGs, we also identified a set of candidate senescence regulators. We then explored genome-wide trends in the network related to polyploidy and homoeolog expression levels and found that only homoeologous transcription factors are likely to share predicted targets. However, homoeologs in dynamic triads, i.e. with higher variation in homoeolog expression levels across tissues, are less likely to share predicted targets than homoeologs in stable triads. This suggests that homoeologs in dynamic triads are more likely to act in distinct pathways. This work demonstrates that the wheat GENIE3 network can provide biologically relevant predictions of transcription factor targets, which can be used for candidate gene prediction and for global analyses of transcription factor function. The GENIE3 network has now been integrated into the KnetMiner web application, facilitating its use in future studies.
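One simple way to quantify how much homoeologous transcription factors share predicted targets is the mean pairwise Jaccard similarity of their target sets within each A/B/D triad, averaged over a group such as stable or dynamic triads. The abstract does not state which similarity measure or statistical test was used, so treat this as a hypothetical sketch; the TF IDs, target sets and function names are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two target sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def triad_sharing(triad, targets):
    """Mean pairwise Jaccard similarity of predicted targets within one homoeolog triad."""
    pairs = list(combinations(triad, 2))
    return sum(jaccard(targets[x], targets[y]) for x, y in pairs) / len(pairs)


def mean_sharing(triads, targets):
    """Average triad_sharing over a group of triads (e.g. all stable or all dynamic triads)."""
    return sum(triad_sharing(t, targets) for t in triads) / len(triads)


# Hypothetical A/B/D homoeolog TFs and their predicted target genes.
targets = {
    "TF1-A": {"t1", "t2", "t3"},
    "TF1-B": {"t1", "t2", "t4"},
    "TF1-D": {"t1", "t5"},
}
print(triad_sharing(("TF1-A", "TF1-B", "TF1-D"), targets))
```

Comparing `mean_sharing` for a group of stable triads against a group of dynamic triads gives a single summary per group; a real analysis would add a significance test, for example by permuting triad labels.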
Enormous volumes of COVID-19 research data have been published, and the volume continues to grow daily. This makes it challenging for researchers to interpret, prioritize and summarize their own findings in the context of published literature, clinical trials and a multitude of databases. Overcoming this data interpretation bottleneck is vital to help researchers be more efficient in their quest to identify COVID-19 risk factors, potential treatments, drug side-effects and much more. As a proof of concept, we have organized and integrated a range of COVID-19 and human biomedical data and literature into a knowledge graph (KG). Here we present the datasets we have integrated so far and the content of the KG, which consists of 674,969 biological concepts and over 1.6 million relationships between them. The COVID-19 KG is available via KnetMiner, an interactive online platform for gene discovery and knowledge mining, or in RDF and Neo4j graph formats, which can be queried programmatically through SPARQL and Cypher endpoints. KnetMiner is a roadmapped ELIXIR UK service. We hope this integrated resource will enable faster data interpretation and discovery of linkages between genes, drugs, diseases and many more types of information relating to COVID-19.
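Linking drugs, genes and diseases is typically a multi-hop graph query. The sketch below shows, in plain Python, the logic of a two-hop pattern of the kind one might write in Cypher; the node labels `Drug`, `Gene`, `Disease` and relationship types `TARGETS`, `ASSOCIATED_WITH` are hypothetical, since the abstract does not describe the KG schema, and the triples are invented toy data rather than content of the COVID-19 KG.

```python
# Roughly equivalent Cypher pattern (labels and relationship types are hypothetical):
#   MATCH (d:Drug)-[:TARGETS]->(g:Gene)-[:ASSOCIATED_WITH]->(x:Disease {name: $disease})
#   RETURN d.name, g.name


def drugs_for_disease(triples, node_type, disease):
    """Return sorted (drug, gene) pairs where drug -TARGETS-> gene -ASSOCIATED_WITH-> disease."""
    if node_type.get(disease) != "Disease":
        return []
    # First hop: genes associated with the disease.
    genes = {
        s for s, rel, o in triples
        if rel == "ASSOCIATED_WITH" and o == disease and node_type.get(s) == "Gene"
    }
    # Second hop: drugs that target any of those genes.
    return sorted(
        (s, o) for s, rel, o in triples
        if rel == "TARGETS" and o in genes and node_type.get(s) == "Drug"
    )


# Hypothetical toy data.
node_type = {
    "DrugA": "Drug", "DrugB": "Drug", "DrugC": "Drug",
    "Gene1": "Gene", "Gene2": "Gene",
    "COVID-19": "Disease", "OtherDisease": "Disease",
}
triples = [
    ("DrugA", "TARGETS", "Gene1"),
    ("DrugB", "TARGETS", "Gene2"),
    ("DrugC", "TARGETS", "Gene1"),
    ("Gene1", "ASSOCIATED_WITH", "COVID-19"),
    ("Gene2", "ASSOCIATED_WITH", "OtherDisease"),
]
print(drugs_for_disease(triples, node_type, "COVID-19"))
# [('DrugA', 'Gene1'), ('DrugC', 'Gene1')]
```

A graph database answers the same question with indexed traversals instead of scanning every triple, which is what makes such queries practical at the scale of hundreds of thousands of concepts.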