In recent years, advances in high-throughput technologies have produced unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today there is more biological information than ever but, unfortunately, the current state of many of these repositories is far from optimal. Common problems include information being spread across many small databases, differing standards among repositories, and databases that are no longer supported or that contain overly specific, unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments, or to access and query this information programmatically. CellBase addresses the growing need for integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
The goal of this work is to design and develop an Information System that integrates human genome variation data currently scattered across different repositories. The continuous and increasing interest in knowledge about variations makes the study of this research topic from an Information System point of view extremely attractive. The system has been developed following a conceptual-model-based methodology. The conceptual model formally represents genome variation knowledge, and the definition and categorization of variations are unified using this conceptualization. Once the conceptual model is established, it is implemented in a database (Human Genome DataBase, HGDB). The database acts as a unified variation repository of integrated information that will allow biologists to perform efficient retrieval tasks. Lastly, a loading module has been implemented, using an extraction-transformation-load (ETL) strategy, to integrate data from three relevant variation repositories: HapMap, Ensembl and Cosmic. An exploitation module for end users is also provided.
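As an illustration of the ETL strategy mentioned above, the following minimal sketch loads records from three sources into one unified table. It is not the HGDB loading module: the record fields, the `variation` table schema, and the helper names (`strip_chr`, `transform`, `run_etl`) are all hypothetical, and the sample records are made up rather than taken from the real HapMap, Ensembl or Cosmic export formats.

```python
import sqlite3

# Hypothetical raw records. Field names and values are illustrative only and
# do NOT reflect the real HapMap, Ensembl or Cosmic export formats.
RAW = {
    "HapMap": [{"rs": "rs0001", "chrom": "chr1", "pos": "1000", "alleles": "A/G"}],
    "Ensembl": [{"id": "rs0001", "seq_region": "1", "start": 1000, "ref": "A", "alt": "G"}],
    "Cosmic": [{"mut_id": "COSM0001", "chr": "7", "position": 5000, "change": "C>T"}],
}


def strip_chr(chrom):
    """Normalize chromosome names such as 'chr1' and '1' to '1'."""
    chrom = str(chrom)
    return chrom[3:] if chrom.startswith("chr") else chrom


def transform(source, rec):
    """Transform step: map one source-specific record onto the unified schema."""
    if source == "HapMap":
        ref, alt = rec["alleles"].split("/")
        return (rec["rs"], strip_chr(rec["chrom"]), int(rec["pos"]), ref, alt, source)
    if source == "Ensembl":
        return (rec["id"], strip_chr(rec["seq_region"]), int(rec["start"]),
                rec["ref"], rec["alt"], source)
    if source == "Cosmic":
        ref, alt = rec["change"].split(">")
        return (rec["mut_id"], strip_chr(rec["chr"]), int(rec["position"]), ref, alt, source)
    raise ValueError(f"unknown source: {source}")


def run_etl(raw):
    """Extract records per source, transform them, and load them into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variation ("
        "identifier TEXT, chromosome TEXT, position INTEGER, "
        "ref TEXT, alt TEXT, source TEXT, "
        "PRIMARY KEY (identifier, source))"
    )
    for source, records in raw.items():          # extract
        rows = [transform(source, r) for r in records]  # transform
        conn.executemany("INSERT INTO variation VALUES (?, ?, ?, ?, ?, ?)", rows)  # load
    conn.commit()
    return conn


conn = run_etl(RAW)
query = (
    "SELECT identifier, source FROM variation "
    "WHERE chromosome = '1' AND position = 1000 ORDER BY source"
)
for row in conn.execute(query):
    print(row)
```

Once the records share one schema, a single query by chromosome and position returns the matching variation from every source that reports it, which is the kind of retrieval task a unified repository is meant to simplify.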