Database users may be frustrated by no answers returned when they pose a query on the database. In this paper, we study the problem of relaxing queries on RDF databases in order to acquire approximate answers. We address two problems in efficient query relaxation. First, to ensure the quality of answers, we compute the similarities between relaxed queries with regard to the user query and use them to score the potential relevant answers. Second, for obtaining top-k answers, we develop two algorithms. One is based on the best-first strategy and relaxed queries are executed in the ranking order. The batch based algorithm executes the relaxed queries as a batch and avoids unnecessary execution cost. At last, we implement and experimentally evaluate our approaches.
A fundamental problem related to RDF query processing is selectivity estimation, which is crucial to query optimization for determining a join order of RDF triple patterns. In this paper we focus research on selectivity estimation for SPARQL graph patterns. The previous work takes the join uniformity assumption when estimating the joined triple patterns. This assumption would lead to highly inaccurate estimations in the cases where properties in SPARQL graph patterns are correlated. We take into account the dependencies among properties in SPARQL graph patterns and propose a more accurate estimation model. Since star and chain query patterns are common in SPARQL graph patterns, we first focus on these two basic patterns and propose to use Bayesian network and chain histogram respectively for estimating the selectivity of them. Then, for estimating the selectivity of an arbitrary SPARQL graph pattern, we design algorithms for maximally using the precomputed statistics of the star paths and chain paths. The experiments show that our method outperforms existing approaches in accuracy.
Query relaxation is an important problem for querying RDF data flexibly. The previous work mainly uses ontology information for relaxing user queries. The ranking models proposed, however, are either non-quantifiable or imprecise. Furthermore, the recommended relaxed queries may return no results. In this paper, we aim to solve these problems by proposing a new ranking model. The model ranks the relaxed queries according to their similarities to the original user query. The similarity of a relaxed query to the original query is measured based on the difference of their estimated results. To compute similarity values for star queries efficiently and precisely, Bayesian networks are employed to estimate the result numbers of relaxed queries. An algorithm is also proposed for answering top-k queries. At last experiments validate the effectiveness of our method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.