Muhammad Saleem scite author profile

We present LSQ, a Linked Dataset describing SPARQL queries extracted from the logs of public SPARQL endpoints. We argue that LSQ has a variety of uses for the SPARQL research community, for example generating custom benchmarks or analysing SPARQL adoption. We introduce the LSQ data model used to describe SPARQL query executions as RDF. We then provide details on the four SPARQL endpoint logs that we have RDFised thus far. The resulting dataset contains 73 million triples describing 5.7 million query executions.

How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benchmarks

Saleem

Szárnyas

Conrads

et al. 2019

Triplestores are data management systems for storing and querying RDF data. Over recent years, various benchmarks have been proposed to assess the performance of triplestores across different performance measures. However, choosing the most suitable benchmark for evaluating triplestores in practical settings is not a trivial task, because triplestores experience varying workloads when deployed in real applications. We address the problem of determining an appropriate benchmark for a given real-life workload by providing a fine-grained comparative analysis of existing triplestore benchmarks. In particular, we analyze the data and queries provided with the existing triplestore benchmarks in addition to several real-world datasets. Furthermore, we measure the correlation between the query execution time and various SPARQL query features and rank those features based on their significance levels. Our experiments reveal several interesting insights about the design of such benchmarks. With this fine-grained evaluation, we aim to support the design and implementation of more diverse benchmarks. Application developers can use our results to analyze their data and queries and choose a data management system.
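The correlation-and-ranking step mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which correlation coefficient the authors use, so Spearman rank correlation is an assumption here, and the feature names and runtimes are hypothetical placeholders rather than data from the paper.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def rank_features(feature_table, runtimes):
    """Sort features by the absolute strength of their correlation with runtime."""
    scores = {name: spearman(vals, runtimes) for name, vals in feature_table.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical per-query feature values and runtimes (illustration only).
features = {
    "triple_patterns": [1, 2, 3, 4],
    "has_filter": [0, 1, 0, 1],
}
runtimes_ms = [5, 9, 20, 40]
for name, rho in rank_features(features, runtimes_ms):
    print(f"{name}: {rho:.2f}")
```

A rank-based coefficient is used in this sketch because query runtimes are often heavily skewed; whether that matches the paper's method would need to be checked against the full text.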

FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework

Saleem

Mehmood

Ngomo

2015

A fine-grained evaluation of SPARQL endpoint federation systems

Saleem

Khan

Hasnain

et al. 2016

The Web of Data has grown enormously over the last years. Currently, it comprises a large compendium of interlinked and distributed datasets from multiple domains. The abundance of datasets has motivated considerable work on developing SPARQL query federation systems, the dedicated means to access data distributed over the Web of Data. However, the granularity of previous evaluations of such systems has not allowed deriving insights into their behavior in the different steps involved in federated query processing. In this work, we perform extensive experiments to compare state-of-the-art SPARQL endpoint federation systems using the comprehensive performance evaluation framework FedBench. We extend the scope of the performance evaluation by considering additional criteria beyond the commonly used key criterion (i.e. the query runtime). In particular, we consider the number of sources selected, the total number of SPARQL ASK requests used, and the source selection time, criteria that have not received much attention in previous studies. Yet, we show that they have a significant impact on the overall query runtime of existing systems. Also, we extend FedBench to mirror a highly distributed data environment and assess the behavior of existing systems using the same four criteria. As a result, we provide a detailed analysis of the experimental outcomes that reveals novel insights for improving current and future SPARQL federation systems.

DAW: Duplicate-AWare Federated Query Processing over the Web of Data

Saleem

Ngomo

Parreira

et al. 2013

Over the last years the Web of Data has developed into a large compendium of interlinked datasets from multiple domains. Due to the decentralised architecture of this compendium, several of these datasets contain duplicated data. Yet, so far, little attention has been paid to the effect of duplicated data on federated querying. This work presents DAW, a novel duplicate-aware approach to federated querying over the Web of Data. DAW is based on a combination of min-wise independent permutations and compact data summaries. It can be directly combined with existing federated query engines in order to achieve the same query recall values while querying fewer data sources. We extend three well-known federated query processing engines (DARQ, SPLENDID, and FedX) with DAW and compare our extensions with the original approaches. The comparison shows that DAW can greatly reduce the number of queries sent to the endpoints, while keeping high query recall values. Therefore, it can significantly improve the performance of federated query processing engines. Moreover, DAW provides a source selection mechanism that maximises the query recall when the query processing is limited to a subset of the sources.
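To give a feel for the min-wise independent permutations named in this abstract, here is a minimal generic MinHash sketch. It is not DAW's implementation: the abstract does not describe how DAW builds or compares its summaries, so the function names, the use of seeded BLAKE2b hashes to approximate random permutations, and the example triples are all assumptions made for illustration.

```python
import hashlib


def _hash64(item: str, seed: int) -> int:
    """Seeded 64-bit hash; each seed stands in for one random permutation."""
    salt = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=128):
    """Compact summary of a non-empty collection: the minimum hash per seed."""
    if not items:
        raise ValueError("items must be non-empty")
    return [min(_hash64(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate set overlap as the fraction of signature positions that agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical triples held by two endpoints (illustration only).
endpoint_a = {f"ex:s{i} ex:p ex:o{i}" for i in range(0, 200)}
endpoint_b = {f"ex:s{i} ex:p ex:o{i}" for i in range(100, 300)}

overlap = estimate_jaccard(minhash_signature(endpoint_a), minhash_signature(endpoint_b))
print(f"estimated overlap between the two endpoints: {overlap:.2f}")
```

The point of the sketch is that fixed-size signatures can be compared without exchanging the underlying data; a source whose estimated overlap with already selected sources is high contributes few new results and is a candidate for skipping.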

Distributed Semantic Analytics Using the SANSA Stack

Lehmann

Sejdiu

Bühmann

et al. 2017

CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation

Saleem

Potocki

Soru

et al. 2018

Procedia Computer Science

HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

LSQ: The Linked SPARQL Queries Dataset
