Presently, there are numerous bioinformatics databases available on different websites. Although RDF was proposed as a standard format for the web, these databases are still available in various formats. With the increasing popularity of Semantic Web technologies and the ever-growing number of databases in bioinformatics, there is a pressing need to develop mashup systems that support bioinformatics knowledge integration. Bio2RDF is such a system, built from rdfizer programs written in JSP, the Sesame open-source triplestore and an OWL ontology. With Bio2RDF, documents from public bioinformatics databases such as KEGG, PDB, MGI, HGNC and several of NCBI's databases can now be made available in RDF format through a unique URL of the form http://bio2rdf.org/namespace:id. The Bio2RDF project has successfully applied Semantic Web technology to publicly available databases by creating a knowledge space of RDF documents that are linked together with normalized URIs and share a common ontology. Bio2RDF is based on a three-step approach to building mashups of bioinformatics data. The present article details this new approach and illustrates the building of a mashup used to explore the involvement of four transcription factor genes in Parkinson's disease. The Bio2RDF repository can be queried at http://bio2rdf.org.
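The normalized URI convention (http://bio2rdf.org/namespace:id) can be illustrated with a pair of small helpers. This is a minimal sketch, not Bio2RDF code: the function names `make_bio2rdf_uri` and `parse_bio2rdf_uri` are hypothetical, and it assumes that namespaces are normalized to lowercase, that identifiers are percent-encoded where needed, and that the URI is split on the first colon so identifiers that themselves contain colons survive a round trip. The identifier `1ABC` is an arbitrary example.

```python
from urllib.parse import quote, unquote

# Base of the normalized URIs described in the text.
BIO2RDF_BASE = "http://bio2rdf.org/"


def make_bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI of the form http://bio2rdf.org/namespace:id.

    Assumptions (not from the source): the namespace is lowercased and must
    not contain ':'; the identifier is percent-encoded, keeping ':' as-is.
    """
    ns = namespace.strip().lower()
    if not ns or ":" in ns:
        raise ValueError(f"invalid namespace: {namespace!r}")
    return f"{BIO2RDF_BASE}{ns}:{quote(identifier.strip(), safe=':')}"


def parse_bio2rdf_uri(uri: str) -> tuple[str, str]:
    """Split a Bio2RDF-style URI back into (namespace, identifier)."""
    if not uri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF URI: {uri!r}")
    local = uri[len(BIO2RDF_BASE):]
    # Split on the first colon only, so identifiers containing ':' are kept whole.
    ns, sep, ident = local.partition(":")
    if not sep or not ns or not ident:
        raise ValueError(f"missing namespace:id in {uri!r}")
    return ns, unquote(ident)


example = make_bio2rdf_uri("PDB", "1ABC")
print(example)
print(parse_bio2rdf_uri(example))
```

Because every record is reachable through the same predictable pattern, a link to another database's entity can be written without consulting that database, which is what lets independently converted documents join into one knowledge space.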
The Bio2RDF project uses open-source Semantic Web technologies to provide interlinked life science data in order to maximize productivity and facilitate biological knowledge discovery. Using both syntactic and semantic data integration techniques, Bio2RDF puts into practice a simple methodology to generate and seamlessly integrate machine-interpretable data that can be interrogated with SPARQL queries to answer sophisticated questions. At its core, Bio2RDF converts database records into a set of statements, or triples, that are captured together as a named graph annotated with provenance. The records and the entities they are about are given a Uniform Resource Identifier (URI) of the form http://bio2rdf.org/prefix:identifier, where the prefix is a reserved name for the dataset, record or terminological resource. Applying this simple method allows resources from over 40 datasets to integrate seamlessly at the syntactic level, irrespective of whether the original data contain non-Bio2RDF URIs. However, when original data providers such as UniProt publish their own RDF, they will rightfully use URIs that resolve to their servers, but what should they do for externally defined entities? If they follow in Bio2RDF's footsteps, every data provider will mint a different URI for the same entity. If, instead, original data providers define and implement a URI scheme for their own resources, others can establish stable links to those resources. We would then witness the birth of a more stable linked data network, in which data providers need not republish third-party data redundantly.
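The conversion of a record into triples captured in one named graph with provenance can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Bio2RDF converters: the `<dataset>_record` namespace used for the graph URI, the `<dataset>_vocabulary` namespace used for predicates, the choice of the W3C PROV-O property `prov:wasDerivedFrom` for provenance, the N-Quads output format, the example identifiers and the `example.org` source URL are all hypothetical.

```python
from urllib.parse import quote

BIO2RDF_BASE = "http://bio2rdf.org/"
# W3C PROV-O property, chosen here (an assumption) to record provenance.
PROV_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"


def uri(prefix: str, identifier: str) -> str:
    """Normalized URI of the form http://bio2rdf.org/prefix:identifier."""
    return f"{BIO2RDF_BASE}{prefix}:{quote(identifier, safe=':')}"


def literal(text: str) -> str:
    """Serialize a plain string literal with N-Quads escaping."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_nquads(dataset: str, record_id: str,
                     fields: dict[str, str],
                     links: dict[str, tuple[str, str]],
                     source_url: str) -> list[str]:
    """Convert one flat record into N-Quads sharing a single named graph.

    Hypothetical conventions, not Bio2RDF's actual ones:
      * the entity URI is <dataset>:<id>; the record/graph URI is
        <dataset>_record:<id>;
      * predicates live in a <dataset>_vocabulary namespace.
    """
    entity = uri(dataset, record_id)
    graph = uri(f"{dataset}_record", record_id)
    vocab = f"{BIO2RDF_BASE}{dataset}_vocabulary:"

    def quad(s: str, p: str, o: str) -> str:
        # Subject and predicate are IRIs; the object arrives pre-serialized.
        return f"<{s}> <{p}> {o} <{graph}> ."

    lines = []
    for key, value in sorted(fields.items()):
        lines.append(quad(entity, vocab + quote(key, safe=""), literal(value)))
    for key, (prefix, ident) in sorted(links.items()):
        # Cross-references reuse the same URI pattern, so statements from
        # different datasets about one entity join at the syntactic level.
        lines.append(quad(entity, vocab + quote(key, safe=""), f"<{uri(prefix, ident)}>"))
    # Provenance: the named graph points back to the original source record.
    lines.append(quad(graph, PROV_DERIVED_FROM, f"<{source_url}>"))
    return lines


quads = record_to_nquads(
    "geneid", "12345",
    {"symbol": "Abc1", "description": "example gene"},
    {"xref": ("pubmed", "1000")},
    "https://example.org/source/12345",  # placeholder source URL
)
print("\n".join(quads))
```

Keeping every statement from one record in a single named graph means its provenance is stated once, on the graph, rather than repeated on each triple; the output can then be loaded into a triplestore and queried with SPARQL.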