Abstract: Background: Increasingly, biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway, metabolic pathways, has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence a…
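The abstract describes scoring permutations of entity assignments within a sentence, but its scoring criteria are cut off. The following is a minimal Python sketch under stated assumptions, not the authors' method. It assumes enzymes and metabolites have already been tagged and appear as single lowercase tokens. The cue-word set `CONVERSION_CUES`, the hypothetical `score_assignment` rules, and the hypothetical `best_reaction` helper are all illustrative inventions. For each enzyme, the sketch enumerates every ordered (substrate, product) pair of metabolites and keeps the highest-scoring assignment.

```python
from itertools import permutations

# Hypothetical cue words; the original method's criteria are truncated in the abstract.
CONVERSION_CUES = {"converts", "catalyzes", "catalyses", "produces", "to", "into"}


def score_assignment(tokens, enzyme, substrate, product):
    """Toy score for one role assignment (enzyme, substrate, product).

    +1 if the substrate precedes the product, +1 more if a conversion cue
    lies between them, and +1 if the enzyme is mentioned in the sentence.
    """
    score = 0
    s_idx, p_idx = tokens.index(substrate), tokens.index(product)
    if s_idx < p_idx:
        score += 1
        if any(t in CONVERSION_CUES for t in tokens[s_idx + 1:p_idx]):
            score += 1
    if enzyme in tokens:
        score += 1
    return score


def best_reaction(sentence, enzymes, metabolites):
    """Return (score, enzyme, substrate, product) for the best permutation, or None."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    # Only metabolites actually mentioned in the sentence can fill a role.
    present = [m for m in metabolites if m in tokens]
    candidates = [
        (score_assignment(tokens, e, s, p), e, s, p)
        for e in enzymes
        for s, p in permutations(present, 2)  # every ordered substrate/product pair
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])  # first highest-scoring assignment wins
```

For example, `best_reaction("Hexokinase converts glucose to glucose-6-phosphate.", ["hexokinase"], ["glucose", "glucose-6-phosphate"])` prefers glucose as the substrate because the reversed ordering loses both positional points.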
“…Lastly, Relation Extraction (RE) is a task for extracting pre-defined facts relating to an entity or entities in the text [29]. In the biomedical domain, multiple RE methods have been developed to extract information relating to genes [16], such as mutation-disease associations, protein-protein interactions [30,31], pathway curation [32], gene methylation and cancer relations [33], biomolecular events [34], metabolic reactions [35] and gene-gene interactions [36]. For gene regulatory networks, which are the focus of this paper, the RE system must detect and extract a causal relation between a protein and a gene (e.g., A regulated B).…”
ABSTRACT Background: Transcription factors (TFs) are proteins that are fundamental to transcription and to the regulation of gene expression. Each TF may regulate multiple genes, and each gene may be regulated by multiple TFs. TFs can act as either activators or repressors of gene expression. This complex network of interactions between TFs and genes underlies many developmental and biological processes and is implicated in several human diseases, such as cancer. Hence, deciphering the network of TF-gene interactions, together with information on the mode of regulation (activation vs. repression), is an important step toward understanding the regulatory pathways that underlie complex traits. There are many experimental, computational, and manually curated databases of TF-gene interactions. In particular, high-throughput ChIP-Seq datasets provide a large-scale map of transcriptional regulatory interactions. However, these interactions are not annotated with information on context and mode of regulation. Such information is crucial for gaining a global picture of gene regulatory mechanisms and can aid in developing machine learning models for applications such as biomarker discovery, prediction of response to therapy, and precision medicine. Methods: In this work, we introduce a text-mining system that annotates ChIP-Seq-derived interactions with such metadata by mining PubMed articles. We evaluate the performance of our system using small-scale, manually curated gold-standard databases. Results: Our results show that the method accurately extracts the mode of regulation, with an F-score of 0.77 on TRRUST curated interactions and an F-score of 0.96 on the intersection of TRRUST and the ChIP-network. We provide an HTTP REST API for our code to facilitate usage. Availability: Source code and datasets are available for download on GitHub: https:
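As a rough illustration of extracting a causal "A regulates B" relation together with its mode of regulation (activation vs. repression), the sketch below classifies a single TF-gene pair within one sentence. It is not the cited system's method. The trigger lexicons `ACTIVATION` and `REPRESSION` and the function `mode_of_regulation` are hypothetical. Real systems also handle negation, coordination, and passive voice, all of which this toy pattern ignores.

```python
import re

# Hypothetical trigger lexicons; not taken from the cited system.
ACTIVATION = {"activates", "induces", "upregulates", "enhances", "promotes"}
REPRESSION = {"represses", "inhibits", "downregulates", "suppresses", "silences"}


def mode_of_regulation(sentence, tf, gene):
    """Return 'activation', 'repression', or None for a TF-gene pair.

    Only matches the active-voice pattern "TF <trigger> gene": the TF must
    precede the gene, and triggers of exactly one kind must occur between them.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9-]+", sentence)]
    try:
        i, j = tokens.index(tf.lower()), tokens.index(gene.lower())
    except ValueError:
        return None  # one of the two entities is not mentioned
    if i >= j:
        return None  # passive or reversed order is not handled
    between = set(tokens[i + 1:j])
    if between & ACTIVATION and not between & REPRESSION:
        return "activation"
    if between & REPRESSION and not between & ACTIVATION:
        return "repression"
    return None  # no trigger, or conflicting triggers
```

For example, `mode_of_regulation("SNAI1 represses CDH1 transcription.", "SNAI1", "CDH1")` returns `"repression"`. Returning `None` on conflicting triggers, rather than guessing, keeps precision high at the cost of recall.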
“…Knowledge discovery uses techniques from a wide range of disciplines such as artificial intelligence, machine learning, pattern recognition, data mining, and statistics [45]. Both information extraction and knowledge discovery find their application in database curation [46], [47] and pathway construction [48], [49].…”
Abstract: The rapid growth of biomedical informatics has drawn increasing attention. This growth is driven by advances in genomics and in new molecular and biomedical approaches, and by applications such as protein identification, patient medical records, genome sequencing, and medical imaging, which together generate a huge volume of biomedical research data every day. This growing body of biomedical data consists of both structured and unstructured data. In a traditional (structured) database system, managing and extracting useful information from unstructured biomedical data is a tedious job. Hence, mechanisms, tools, processes, and methods are needed that can be applied to unstructured biomedical data (text) to obtain useful business data. The rapid growth of these collections makes it increasingly difficult for people to access the required information conveniently and effectively. Text mining can help us mine information and knowledge from a mountain of text and is now widely applied in biomedical research. Text mining is not a new technology, but it has recently received considerable attention due to the emergence of Big Data. The applications of text mining are diverse and span multiple disciplines, ranging from biomedicine to law, business intelligence, and security. In this survey paper, the researcher identifies and discusses issues in biomedical text mining and recommends a possible technique for coping with future data growth.
“…The NutriChem database [62] has been developed using a similar approach to find plant- and diet-related compounds from PubMed. Metabolomics text mining was used to extract information on all literature-known compounds in yeast [63], and to complement pathway reconstructions through reports on product/substrate pairs [64]. However, there has been little progress in using automated text mining approaches to complement metabolomics data sets, with the sole exception of PolySearch [65].…”
Access to high-quality metabolomics data has become a routine component of biological studies. However, interpreting those datasets in biological contexts remains a challenge, especially because many identified metabolites are not found in biochemical pathway databases. Starting from statistical analyses, a range of new tools is available, including metabolite set enrichment analysis, pathway and network visualization, pathway prediction, biochemical databases and text mining. Integrating these approaches into comprehensive and unbiased interpretations must carefully consider both the caveats of the metabolomics dataset itself and the structure and properties of the biological study design. Special considerations are needed when adopting approaches from genomics for use in metabolomics. The R and Python programming languages are making it easier to exchange diverse tools and deploy integrated workflows. This review summarizes the key ideas and latest developments regarding these approaches.