An analytical study of large SPARQL query logs

Bonifati, Angela; Martens, Wim; Timm, Thomas

doi:10.14778/3149193.3149196

Cited by 76 publications (142 citation statements; 131 mentioning)

References 29 publications

“…We restrict RPQs to handle two kinds of paths: atomic paths, namely bi-directional, optional, single-labeled ($l_e$, $l_e?$, and $l_e^-$) and transitive single-labeled ($l_e^*$) paths; and composite paths, namely conjunctive and disjunctive compositions of atomic paths ($l_e \cdot l_e$ and $\pi + \pi$). While not as general as SPARQL, our fragment already captures more than 60% of the property paths found in practice in SPARQL query logs [8]. Moreover, it captures property path queries, as found in the large Wikidata corpus studied in [9].…”

Section: Preliminaries (mentioning, confidence: 95%)
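To make the quoted path fragment concrete, here is a minimal Python sketch of its syntax and set-based semantics over an edge-labeled graph. The class names (`Label`, `Opt`, `Inv`, `Star`, `Concat`, `Alt`) and the functions `nodes`, `compose`, and `evaluate` are hypothetical illustrations, not taken from the citing paper. The sketch assumes SPARQL-like zero-length semantics (every node reaches itself under `?` and `*`) and, unlike the quoted fragment, does not restrict composition to atomic paths.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union

Edge = Tuple[str, str, str]  # (source node, edge label, target node)
Pair = Tuple[str, str]       # (start node, end node)


@dataclass(frozen=True)
class Label:   # l_e: exactly one edge labeled `name`
    name: str


@dataclass(frozen=True)
class Opt:     # l_e?: zero or one edge labeled `name`
    name: str


@dataclass(frozen=True)
class Inv:     # l_e^-: one edge labeled `name`, traversed backwards
    name: str


@dataclass(frozen=True)
class Star:    # l_e^*: zero or more edges labeled `name`
    name: str


@dataclass(frozen=True)
class Concat:  # pi . pi: the left path followed by the right path
    left: "Path"
    right: "Path"


@dataclass(frozen=True)
class Alt:     # pi + pi: either the left or the right path
    left: "Path"
    right: "Path"


Path = Union[Label, Opt, Inv, Star, Concat, Alt]


def nodes(graph: Set[Edge]) -> Set[str]:
    """All nodes that occur as a source or target of some edge."""
    return {n for (s, _, t) in graph for n in (s, t)}


def compose(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """Relational composition: (x, z) whenever (x, y) in left and (y, z) in right."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}


def evaluate(path: Path, graph: Set[Edge]) -> Set[Pair]:
    """Return the set of (x, y) node pairs connected by `path` in `graph`."""
    if isinstance(path, Label):
        return {(s, t) for (s, label, t) in graph if label == path.name}
    if isinstance(path, Inv):
        return {(t, s) for (s, label, t) in graph if label == path.name}
    identity = {(n, n) for n in nodes(graph)}  # zero-length paths
    if isinstance(path, Opt):
        return identity | evaluate(Label(path.name), graph)
    if isinstance(path, Star):
        step = evaluate(Label(path.name), graph)
        closure, frontier = set(identity), set(identity)
        while frontier:  # extend by one edge until no new pair appears
            frontier = compose(frontier, step) - closure
            closure |= frontier
        return closure
    if isinstance(path, Concat):
        return compose(evaluate(path.left, graph), evaluate(path.right, graph))
    if isinstance(path, Alt):
        return evaluate(path.left, graph) | evaluate(path.right, graph)
    raise TypeError(f"unsupported path: {path!r}")


example_graph = {("a", "knows", "b"), ("b", "knows", "c"), ("c", "worksAt", "d")}
# Which nodes reach a workplace via zero or more `knows` edges?
works_via_knows = evaluate(Concat(Star("knows"), Label("worksAt")), example_graph)
```

Here `works_via_knows` holds the pairs (a, d), (b, d), and (c, d). A counting query over recursive paths would report the size of such a set; note that this sketch computes exact answers on the full graph and involves no graph summary.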

“…However, arbitrarily complex queries [2,3,7], entailing rather intricate, possibly recursive, graph patterns, prove difficult to evaluate, even on small-sized graph datasets [4,5]. On the other hand, the usage of these queries has radically increased in real-world query logs, as shown by recent empirical studies on SPARQL queries from large-scale Wikidata and DBpedia corpora [8,17]. As a tangible example of this growth, the percentage of SPARQL property paths in user-specified Wikidata queries increased from 15% to 40% between 2017 and the beginning of 2018 [17].…”

Section: Introduction (mentioning, confidence: 99%)


Approximate Querying on Property Graphs

Dumbrava, Bonifati, Diaz, et al. 2019. Lecture Notes in Computer Science (self-citation)

Property graphs are becoming widespread when modeling data with complex structural characteristics and enriching edges and nodes with a list of properties. In this paper, we focus on the approximate evaluation of counting queries involving recursive paths on property graphs. As such queries are already difficult to evaluate over pure RDF graphs, they require an ad-hoc graph summary for their approximate evaluation on property graphs. We prove the intractability of the optimal graph summarization problem, under our algorithm's conditions. We design and implement a novel property graph summary suitable for the above queries, along with an approximate query evaluation module. Finally, we show the compactness of the obtained summaries as well as the accuracy of answering counting recursive queries on them.


“…There are also no unions (Q5) and no questions with multiple subqueries (Q10). Bonifati et al. (2017) investigated a large corpus of query logs from different SPARQL endpoints. The query log files come from seven data sources covering various domains.…”

Section: Question Analysis (mentioning, confidence: 99%)

A comparative survey of recent natural language interfaces for databases

2019


Over the last few years, natural language interfaces (NLIs) for databases have gained significant traction both in academia and industry. These systems use very different approaches, as described in recent survey papers. However, these systems have not been systematically compared against a set of benchmark questions in order to rigorously evaluate their functionality and expressive power. In this paper, we give an overview of 24 recently developed NLIs for databases. Each of the systems is evaluated using a curated list of ten sample questions to show its strengths and weaknesses. We categorize the NLIs into four groups based on the methodology they use: keyword-, pattern-, parsing-, and grammar-based NLIs. Overall, we learned that keyword-based systems are sufficient to answer simple questions. To solve more complex questions involving subqueries, a system needs to apply some sort of parsing to identify structural dependencies. Grammar-based systems are overall the most powerful ones, but are highly dependent on their manually designed rules. In addition to providing a systematic analysis of the major systems, we derive lessons learned that are vital for designing NLIs that can answer a wide range of user questions.


“…To our knowledge, our work is the first that i) analyses real query logs from known endpoints to find popular patterns of queries that can or cannot be answered through zero-knowledge link traversal, and ii) provides open-source methods to detect answerable queries and transform them into SPARQL-LD queries that are evaluated without accessing endpoints or indexes. While recent works have conducted extensive analytical studies on the syntactic and structural characteristics of real SPARQL queries [1,2,25], no previous work has analysed queries in terms of their answerability through link traversal.…”

Section: Link Traversal (mentioning, confidence: 99%)

How many and what types of SPARQL queries can be answered through zero-knowledge link traversal?

Fafalios, Tzitzikas. 2019. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

The current de facto way to query the Web of Data is through the SPARQL protocol, where a client sends queries to a server through a SPARQL endpoint. Unlike an HTTP server, providing and maintaining a robust and reliable endpoint requires a significant effort that not all publishers are willing or able to make. An alternative query evaluation method is link traversal, where a query is answered by dereferencing online web resources (URIs) in real time. While several approaches for such a lookup-based query evaluation method have been proposed, no analysis exists of the types (patterns) of queries that can be directly answered on the live Web, without accessing local or remote endpoints and without a priori knowledge of available data sources. In this paper, we first provide a method for checking whether a SPARQL query (to be evaluated on a SPARQL endpoint) can be answered through zero-knowledge link traversal (without accessing the endpoint), and we analyse a large corpus of real SPARQL query logs to find the frequency and distribution of answerable and non-answerable query patterns. Subsequently, we provide an algorithm for transforming answerable queries into SPARQL-LD queries that bypass the endpoints. We report experimental results on the efficiency of the transformed queries and discuss the benefits and limitations of this query evaluation method.
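The idea of checking answerability can be illustrated with a deliberately simplified Python sketch. It is not the method of Fafalios and Tzitzikas; it rests on hypothetical assumptions: dereferencing a URI returns the triples that mention it in subject or object position, a triple pattern becomes answerable once its subject or object is a constant http(s) URI or a variable bound by an already answerable pattern, and literals, predicate variables, FILTER, OPTIONAL, and other operators are ignored. The names `answerable`, `is_uri`, and `is_var` and the `example.org` IRIs are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

TriplePattern = Tuple[str, str, str]  # (subject, predicate, object)


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def is_uri(term: str) -> bool:
    """Simplification: only absolute http(s) IRIs count as dereferenceable."""
    return term.startswith(("http://", "https://"))


def answerable(patterns: Iterable[TriplePattern]) -> bool:
    """Simplified answerability check for a basic graph pattern.

    A pattern is treated as reachable once its subject or object is a constant
    URI or a variable bound by an already reachable pattern; the query counts
    as answerable if every pattern eventually becomes reachable.
    """
    pending: List[TriplePattern] = list(patterns)
    bound: Set[str] = set()  # variables whose values traversal would reveal
    progress = True
    while pending and progress:
        progress = False
        for tp in list(pending):  # iterate over a copy so we can remove items
            s, _, o = tp
            if any(is_uri(t) or t in bound for t in (s, o)):
                bound.update(t for t in tp if is_var(t))
                pending.remove(tp)
                progress = True
    return not pending


EX = "http://example.org/"
query_chain = [
    (EX + "Crete", EX + "capital", "?city"),
    ("?city", EX + "population", "?population"),
]
query_unanchored = [("?x", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "?type")]
```

Because the loop repeats until no pattern makes progress, the result does not depend on the order in which patterns are listed. Under these assumptions `query_chain` is answerable, while `query_unanchored`, whose subject and object are both unbound variables, offers no URI to start traversal from.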
