R2RML is used to specify transformations of data avail able in relational databases into materialised or virtual RDF datasets. SPARQL queries evaluated against virtual datasets are translated into SQL queries according to the R2RML mappings, so that they can be evaluated over the underly ing relational database engines. In this paper we describe an extension of a well-known algorithm for SPARQL to SQL translation, originally formalised for RDBMS-backed triple stores, that takes into account R2RML mappings. We present the result of our implementation using queries from a synthetic benchmark and from three real use cases, and show that SPARQL queries can be in general evaluated as fast as the SQL queries that would have been generated by SQL experts if no R2RML mappings had been used.
Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented (e.g., referential integrity among sources, datatypes, or data integrity); thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work is focused on applying implicit constraints on the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with the GTFS-Madrid benchmark; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, without and with the incorporation of Morph-CSV. The observed results suggest that Morph-CSV is able to speed up the total query execution time by up to two orders of magnitude, while it is able to produce all the query answers.
Ontology Based Data Access (OBDA) refers to a range of techniques, algorithms and systems that can be used to deal with the heterogeneity of data that is common inside many organisations as well as in inter-organisational settings and more openly on the Web. In OBDA, ontologies are used to provide a global view over multiple local datasets; and mappings are commonly used to describe the relationships between such global and local schemas. Since its inception, this area has evolved in several directions. Initially, the focus was on the translation of original sources into a global schema, and its materialisation, including non-OBDA approaches such as the use of Extract Transform Load (ETL) workflows in data warehouses and, more recently, in data lakes. Then OBDA-based query translation techniques, relying on mappings, were proposed, with the aim of removing the need for materialisation, something especially useful for very dynamic data sources. We think that we are now witnessing the emergence of a new generation of OBDA approaches. It is driven by the fact that a new set of declarative mapping languages, most of which stem from the W3C Recommendation R2RML for Relational Databases (RDB), are being created. In this vision paper, we enumerate the reasons why new mapping languages are being introduced. We discuss why it may be relevant to work on translations among them, so as to benefit from the engines associated to each of them whenever one language and/or engine is more suitable than another. We discuss the emerging concept of "mapping translation", the basis for this new generation of OBDA, together with some of its desirable properties: information preservation and query result preservation. We show several scenarios where mapping translation can be or is being already applied, even though this term has not necessarily been used in existing literature.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.