Abstract-Several studies show that biological knowledge is growing continuously and is distributed among different databases. These databases have different structures, different ways of storing data and different approaches to exporting information, and they are usually developed to provide information for a specific organism, all of which makes data integration a hard task. Due to the large amount of biological data, data integration has become one of the major challenges in bioinformatics, as has discovering information about Transcriptional Regulatory Networks (TRNs). A single source often lacks enough information to complete this task successfully, so it is necessary to gather information from several databases in order to create a useful body of knowledge. This work presents a new approach to integrating data related to TRNs for Escherichia coli by creating a new integrated data repository that gathers information from the KEGG, EcoCyc, RegulonDB and NCBI databases.
Transcriptional Regulatory Networks (TRNs) are a powerful tool for representing the interactions that occur within a cell. Recent studies have provided information to help researchers build and understand these networks. One of the major sources of information for building TRNs is the biomedical literature. However, because the number of scientific papers is increasing rapidly, it is difficult to analyse the large volume of papers published on this subject, which has heightened the importance of Biomedical Text Mining approaches. In addition, owing to the lack of adequate standards, inconsistencies in gene and protein names and identifiers become common as the number of databases increases. In this work, we developed an integrated approach for the reconstruction of TRNs that retrieves the relevant information from important biological databases and inserts it into a single repository, named KREN. We also applied text mining techniques over this integrated repository to build TRNs. To do so, it was necessary to create a dictionary of names and synonyms associated with these entities and to develop an approach that retrieves all the abstracts of the related scientific papers stored in PubMed, in order to create a corpus of data about genes. Furthermore, these tasks were integrated into @Note, a software system that provides methods from the Biomedical Text Mining field, including algorithms for Named Entity Recognition (NER), extraction of all relevant terms from publication abstracts, and extraction of relationships between biological entities (genes, proteins and transcription factors). Finally, we extended this tool to allow the reconstruction of Transcriptional Regulatory Networks from the scientific literature.
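The role of a dictionary of names and synonyms in Named Entity Recognition can be illustrated with a minimal Python sketch. It is not the @Note or KREN implementation: the `SYNONYMS` table, the function names and the choice of case-insensitive whole-word matching are all assumptions made for illustration, and a real dictionary would be populated from the integrated repository.

```python
import re

# Hypothetical synonym dictionary: canonical gene name -> known names.
# The entries are illustrative only, not taken from KREN.
SYNONYMS = {
    "lacI": ["lacI", "lac repressor"],
    "crp": ["crp", "CAP", "catabolite activator protein"],
}


def build_pattern(synonyms):
    """Return a compiled regex plus a lowercase-name -> canonical-name lookup."""
    lookup = {}
    for canonical, names in synonyms.items():
        for name in names:
            lookup[name.lower()] = canonical
    # Try longer names first so multi-word synonyms win over shorter ones.
    alternatives = sorted(lookup, key=len, reverse=True)
    body = "|".join(re.escape(name) for name in alternatives)
    # Assumption: names are matched as whole words, ignoring case.
    pattern = re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)
    return pattern, lookup


def tag_entities(text, pattern, lookup):
    """Return (offset, matched text, canonical name) for every dictionary hit."""
    return [
        (m.start(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    pattern, lookup = build_pattern(SYNONYMS)
    sentence = "CAP activates the lac operon, while the Lac repressor blocks it."
    for offset, surface, canonical in tag_entities(sentence, pattern, lookup):
        print(offset, surface, "->", canonical)
```

Mapping every synonym to one canonical name is what lets mentions from different abstracts be attached to the same gene. Ignoring case is a simplification, since gene and protein names are sometimes distinguished by capitalisation alone.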