We study fundamental aspects related to the efficient processing of the SPARQL query language for RDF, proposed by the W3C to encode machine-readable information in the Semantic Web. Our key contributions are (i) a complete complexity analysis for all operator fragments of the SPARQL query language, which -as a central result -shows that the SPARQL operator OPTIONAL alone is responsible for the PSPACE-completeness of the evaluation problem, (ii) a study of equivalences over SPARQL algebra, including both rewriting rules like filter and projection pushing that are wellknown from relational algebra optimization as well as SPARQLspecific rewriting schemes, and (iii) an approach to the semantic optimization of SPARQL queries, built on top of the classical chase algorithm. While studied in the context of a theoretically motivated set semantics, almost all results carry over to the official, bag-based semantics and therefore are of immediate practical relevance.
Abstract-Recently, the SPARQL query language for RDF has reached the W3C recommendation status. In response to this emerging standard, the database community is currently exploring efficient storage techniques for RDF data and evaluation strategies for SPARQL queries. A meaningful analysis and comparison of these approaches necessitates a comprehensive and universal benchmark platform. To this end, we have developed SP 2 Bench, a publicly available, language-specific SPARQL performance benchmark. SP 2 Bench is settled in the DBLP scenario and comprises both a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. As a proof of concept, we apply SP 2 Bench to existing engines and discuss their strengths and weaknesses that follow immediately from the benchmark results.
We study the termination problem of the chase algorithm, a central tool in various database problems such as the constraint implication problem, Conjunctive Query optimization, rewriting queries using views, data exchange, and data integration. The basic idea of the chase is, given a database instance and a set of constraints as input, to fix constraint violations in the database instance. It is wellknown that, for an arbitrary set of constraints, the chase does not necessarily terminate (in general, it is even undecidable if it does or not). Addressing this issue, we review the limitations of existing sufficient termination conditions for the chase and develop new techniques that allow us to establish weaker sufficient conditions. In particular, we introduce two novel termination conditions called safety and inductive restriction, and use them to define the so-called T -hierarchy of termination conditions. We then study the interrelations of our termination conditions with previous conditions and the complexity of checking our conditions. This analysis leads to an algorithm that checks membership in a level of the T -hierarchy and accounts for the complexity of termination conditions. As another contribution, we study the problem of data-dependent chase termination and present sufficient termination conditions w.r.t. fixed instances. They might guarantee termination although the chase does not terminate in the general case. As an application of our techniques beyond those already mentioned, we transfer our results into the field of query answering over knowledge bases where the chase on the underlying database may not terminate, making existing algorithms applicable to broader classes of constraints.
The goal of the Semantic Web is to support semantic interoperability between applications exchanging data on the web. The idea heavily relies on data being made available in machine readable format, using semantic markup languages. In this regard, the W3C has standardized RDF as the basic markup language for the Semantic Web. In contrast to relational databases, where data relationships are implicitly given by schema information as well as primary and foreign key constraints, relationships in semantic markup languages are made explicit. When mapping relational data into RDF, it is desirable to maintain the information implied by the origin constraints. As an improvement over existing approaches, our scheme allows for translating conventional databases into RDF without losing general constraints and vital key information. As much as in the relational model, those information are indispensable for data consistency and, as shown by example, can serve as a basis for semantic query optimization. We underline the practicability of our approach by showing that SPARQL, the most popular query language for RDF, can be used as a constraint language, akin to SQL in the relational context. As a theoretical contribution, we also discuss satisfiability for interesting classes of constraints and combinations thereof.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.