Genome-scale metabolic models have been recognised as useful tools for better understanding the metabolism of living organisms. merlin (https://www.merlin-sysbio.org/) is an open-source, user-friendly resource that speeds up the model reconstruction process by combining manual and automatic procedures, while leveraging the user's expertise through a curation-oriented graphical interface. An updated and redesigned version of merlin is presented herein. Since 2015, several features have been implemented in merlin, along with profound changes in the software architecture, operational flow, and graphical interface. The current version (4.0) implements novel algorithms and integrates third-party tools for genome functional annotation, draft assembly, model refinement, and curation. These updates increased the user base, resulting in multiple published works, including genome metabolic (re-)annotations and model reconstructions of multiple (lower and higher) eukaryotes and prokaryotes. merlin 4.0 is the only tool able to perform both template-based and de novo draft reconstructions, while achieving competitive performance compared with state-of-the-art tools for both well-studied and less-studied organisms.
Metabolism has been a major field of study in recent years, mainly because of its importance for understanding cell physiology and the disease phenotypes that arise from its dysregulation. Genome-scale metabolic models (GSMMs) have been established as important tools for achieving a better understanding of human metabolism. Towards this aim, advances in systems biology and bioinformatics have allowed the reconstruction of several human GSMMs, although some limitations and challenges remain, such as the lack of external identifiers for both metabolites and reactions. A pipeline was developed to integrate multiple GSMMs. It starts by retrieving information from the main human GSMMs and evaluating the presence of external database identifiers and annotations for both metabolites and reactions. Metabolite information was loaded into a graph database together with data from omics repositories, allowing metabolites to be clustered by the similarity of their database cross-references. The metabolite annotation of several older GSMMs was enriched, allowing common entities to be identified and integrated. Using this information, together with other metrics, we successfully integrated reactions from these models. These methods can be leveraged towards the creation of a unified consensus model of human metabolism.
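To make the idea of clustering metabolites through database cross-referencing more concrete, the sketch below groups metabolites from different models whenever they share at least one external identifier, directly or through a chain of other metabolites, using a union-find structure. This is a simplification assumed for illustration only: the pipeline described above relies on a graph database and a similarity measure over cross-references whose exact definition is not given here. The function name `cluster_by_cross_references` and the example model/metabolite IDs are hypothetical; the ChEBI, KEGG, and HMDB identifiers are used only as sample inputs.

```python
from collections import defaultdict


def cluster_by_cross_references(metabolites: dict[str, set[str]]) -> list[set[str]]:
    """Group metabolite IDs connected through shared database cross-references.

    Hypothetical helper: two metabolites land in the same cluster if they share
    at least one cross-reference, directly or via a chain of other metabolites.
    """
    parent = {m: m for m in metabolites}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Map each cross-reference to the first metabolite seen with it,
    # and merge every later metabolite carrying the same reference.
    first_owner: dict[str, str] = {}
    for met_id, xrefs in metabolites.items():
        for xref in xrefs:
            if xref in first_owner:
                union(first_owner[xref], met_id)
            else:
                first_owner[xref] = met_id

    clusters: dict[str, set[str]] = defaultdict(set)
    for met_id in metabolites:
        clusters[find(met_id)].add(met_id)
    return list(clusters.values())


# Hypothetical metabolites from three models, each with its cross-references.
example = {
    "modelA:glc": {"chebi:4167", "kegg:C00031"},
    "modelB:glucose": {"kegg:C00031", "hmdb:HMDB0000122"},
    "modelC:M_glc_D": {"hmdb:HMDB0000122"},
    "modelA:atp": {"chebi:15422", "kegg:C00002"},
}
for cluster in cluster_by_cross_references(example):
    print(sorted(cluster))
# ['modelA:glc', 'modelB:glucose', 'modelC:M_glc_D']
# ['modelA:atp']
```

Transitive merging is a deliberate simplification: it links `modelC:M_glc_D` to `modelA:glc` even though they share no identifier directly. A similarity-based approach, as the text suggests, could instead require a minimum overlap before merging, which is more robust to a single wrong cross-reference.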
Biomedical literature comprises an ever-increasing number of publications written in natural language. Patents are a relevant fraction of these and are important sources of information because of the data curated during the granting process. However, their unstructured nature makes searching them for information a challenging task. To overcome this, biomedical text mining (BioTM) develops methods to search and structure such data. Several BioTM techniques can be applied to patents. Among them, Information Retrieval is the process of obtaining relevant data from collections of documents. In this work, a patent pipeline was developed and integrated into @Note2, an open-source computational framework for BioTM. This integration makes it possible to run further BioTM tools over patent documents, including Information Extraction processes such as Named Entity Recognition and Relation Extraction. This work makes all the relevant content from patents available to the scientific community, drastically reducing the time required for this task, and provides graphical interfaces that ease the use of these tools.
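The abstract does not state which retrieval model the @Note2 patent pipeline uses. Purely as a generic illustration of Information Retrieval, the sketch below ranks a few hypothetical patent snippets against a query with a textbook TF-IDF score (a swapped-in technique, not the authors' method). The function names, the smoothed IDF formula, and the patent texts are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a real pipeline would also
    # handle stop words, stemming, chemical names, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by a simple TF-IDF score for the query terms (hypothetical helper)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df: Counter[str] = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent; also avoids dividing by zero for empty docs
            # Smoothed IDF so that terms present in every document still count.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            score += (tf[term] / len(tokens)) * idf
        scores.append((doc_id, score))

    # Highest score first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical patent snippets.
patents = {
    "P1": "Recombinant yeast strain producing ethanol from xylose.",
    "P2": "Method for detecting glucose in blood samples.",
    "P3": "Engineered yeast with improved xylose uptake.",
}
for doc_id, score in rank_documents("xylose yeast", patents):
    print(doc_id, round(score, 3))
```

Here P3 ranks above P1 because both contain the two query terms but P3 is shorter, so each match weighs more, while P2 scores zero. Retrieved documents like these are what downstream steps such as Named Entity Recognition would then process.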