Reachable subwebs for traversal-based query execution

Hartig, Olaf; Özsu, M. Tamer

doi:10.1145/2567948.2576947

Cited by 6 publications

(9 citation statements)

References 13 publications

(25 reference statements)


“…The reason for the negative performance of this approach (as well as any other possible purely graph-based approach) is that the applied vertex scoring method rates document and URI vertices only based on graph-specific properties, whereas the result-relevance of reachable documents is independent of such properties. In fact, in our earlier work we show empirically that there does not exist a correlation between the result-relevance (or irrelevance) of reachable documents and the indegree of the corresponding document vertices in the Web graph model (similarly, for the PageRank, the HITS scores, the k-step Markov score, and the betweenness centrality) [9].…”

Section: Evaluation Of the Purely Graph-based Approaches (mentioning)

confidence: 89%


Walking Without a Map: Ranking-Based Traversal for Querying Linked Data

Hartig

Özsu

2016

Lecture Notes in Computer Science

Self Cite


The traversal-based approach to execute queries over Linked Data on the WWW fetches data by traversing data links and, thus, is able to make use of up-to-date data from initially unknown data sources. While the downside of this approach is the delay before the query engine completes a query execution, user perceived response time may be improved significantly by returning as many elements of the result set as soon as possible. To this end, the query engine requires a traversal strategy that enables the engine to fetch result-relevant data as early as possible. The challenge for such a strategy is that the query engine does not know a priori which of the data sources discovered during the query execution will contain result-relevant data. In this paper, we investigate 14 different approaches to rank traversal steps and achieve a variety of traversal strategies. We experimentally study their impact on response times and compare them to a baseline that resembles a breadth-first traversal. While our experiments show that some of the approaches can achieve noteworthy improvements over the baseline in a significant number of cases, we also observe that for every approach, there is a non-negligible chance to achieve response times that are worse than the baseline.
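The contrast drawn in this abstract between a breadth-first baseline and a ranking-based traversal can be sketched in a few lines. The following is only an illustrative toy, not one of the 14 approaches studied in the paper: the in-memory `TOY_WEB`, the `hypothetical_score` function, and all document names are assumptions made up for this example, and a real engine would dereference URIs over HTTP rather than read a dictionary.

```python
import heapq

# Hypothetical toy "Web": document URI -> (outgoing data links, has result-relevant data).
TOY_WEB = {
    "doc:seed": (["doc:a", "doc:b"], False),
    "doc:a": (["doc:c"], False),
    "doc:b": (["doc:d"], False),
    "doc:c": ([], False),
    "doc:d": ([], True),
}


def traverse(seed, web, score=None):
    """Return the order in which documents are fetched, starting from seed.

    With score=None every discovered link gets the same priority, so the
    insertion counter alone decides the order and the traversal is
    breadth-first (the baseline). With a score function, links with a
    higher score are fetched earlier.
    """
    fetched = []
    seen = {seed}
    counter = 0  # tie-breaker: earlier discovery wins among equal priorities
    frontier = [(0.0, counter, seed)]
    while frontier:
        _, _, uri = heapq.heappop(frontier)
        fetched.append(uri)
        links, _ = web[uri]
        for target in links:
            if target in seen:
                continue
            seen.add(target)
            counter += 1
            # heapq is a min-heap, so negate the score to pop high scores first.
            priority = -score(target) if score else 0.0
            heapq.heappush(frontier, (priority, counter, target))
    return fetched


def fetches_until_first_relevant(order, web):
    """Count documents fetched up to and including the first relevant one."""
    for position, uri in enumerate(order, start=1):
        if web[uri][1]:
            return position
    return None


def hypothetical_score(uri):
    """Made-up ranking; a real engine cannot know relevance in advance."""
    return {"doc:b": 2.0, "doc:d": 3.0}.get(uri, 0.0)


baseline_order = traverse("doc:seed", TOY_WEB)
ranked_order = traverse("doc:seed", TOY_WEB, score=hypothetical_score)
print(baseline_order, fetches_until_first_relevant(baseline_order, TOY_WEB))
print(ranked_order, fetches_until_first_relevant(ranked_order, TOY_WEB))
```

On this toy graph the ranked traversal happens to reach the relevant document in fewer fetches than the baseline. A score function that favored `doc:a` instead would delay it, which is consistent with the abstract's caveat that any ranking approach can also perform worse than the baseline.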


“…In an earlier, more detailed analysis of these queries we make the same observation for the other 13 test Webs [9]. Moreover, if we consider each query in separation and compare its reachable …”

Section: Test Queries (mentioning)

confidence: 95%

Walking Without a Map: Ranking-Based Traversal for Querying Linked Data

Hartig

Özsu

2016

Lecture Notes in Computer Science

Self Cite


“…SQUIN is an iterator-based implementation of the LTQBE paradigm, while LiDaQ [Umbrich et al 2014] also considers lightweight reasoning extensions to find additional sources on-the-fly and generate further answers. Recently, an analysis of subwebs that are reachable by LTQBE engines was performed [Hartig and Ozsu 2014] giving some hints about why results obtained with these engines are not complete. We discuss the differences between LTQBE engines and swget both in terms of scope and expressiveness.…”

Section: Related Work (mentioning)

confidence: 99%

NautiLOD

2015


The Web of Linked Data is a huge graph of distributed and interlinked datasources fueled by structured information. This new environment calls for formal languages and tools to automatize navigation across datasources (nodes in such graph) and enable semantic-aware and Web-scale search mechanisms. In this article we introduce a declarative navigational language for the Web of Linked Data graph called NAUTILOD. NAUTILOD enables one to specify datasources via the intertwining of navigation and querying capabilities. It also features a mechanism to specify actions (e.g., send notification messages) that obtain their parameters from datasources reached during the navigation. We provide a formalization of the NAUTILOD semantics, which captures both nodes and fragments of the Web of Linked Data. We present algorithms to implement such semantics and study their computational complexity. We discuss an implementation of the features of NAUTILOD in a tool called swget, which exploits current Web technologies and protocols. We report on the evaluation of swget and its comparison with related work. Finally, we show the usefulness of capturing Web fragments by providing examples in different knowledge domains.


“…It is now well-known that the cost of data retrieval dominates the cost of link-traversal query execution [25]. That is why the query engine's ability to entail that it does not need to exhaustively explore the entire reachable subweb is essential for efficient query execution.…”

Section: Examples (mentioning)

confidence: 99%

“…At each iteration, we will experimentally validate the performance of the optimized engine against the base-line version of the same engine. For this purpose, we have extended the experimental setup used in [25]. (Footnotes: 4, https://jena.apache.org; 5, http://squin.org)…”

mentioning

confidence: 99%

A Hybrid Framework for Online Execution of Linked Data Queries

Sabri

2015

Proceedings of the 24th International Conference on World Wide Web


Linked Data has been widely adopted over the last few years, with the size of the Linked Data cloud almost doubling every year. However, there is still no well-defined, efficient mechanism for querying such a Web of Data. We propose a framework that incorporates a set of optimizations to tackle various limitations in the state-of-the-art. The framework aims at combining the centralized query optimization capabilities of the data warehouse-based approaches with the result freshness and explorative data source discovery capabilities of link-traversal approaches. This is achieved by augmenting base-line link-traversal query execution with a set of optimization techniques. The proposed optimizations fall under two categories: metadata-based optimizations and semantics-based optimizations.
