The Linked Data Benchmark Council (LDBC) is now two years underway and has gathered strong industrial participation for its mission to establish benchmarks and benchmarking practices for evaluating graph data management systems. The LDBC has introduced a new choke-point-driven methodology for developing benchmark workloads, which combines user input with input from expert systems architects; we outline this methodology here. This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovations in the graph query functionality tested, in correlated graph generation techniques, and in a scalable benchmark driver for a workload with complex graph dependencies. SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. We describe the SNB Interactive workload in detail, illustrate it with early results, and outline the goals for the other two workloads.
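The driver's dependency tracking can be made concrete with a small sketch. The Python fragment below is only an illustration under assumed names (`run_workload`, `execute`, and the operation fields are hypothetical, not the LDBC driver's API): an update is issued only after every operation it depends on has completed.

```python
def run_workload(operations, execute):
    """Toy, sequential sketch of dependency-aware scheduling.

    operations: list of dicts {"id": ..., "time": ..., "depends_on": set of ids}
    execute:    callable that runs one operation against the system under test
    The real driver has to do this across many concurrent threads and machines.
    """
    done = set()
    remaining = sorted(operations, key=lambda op: op["time"])  # timestamp order
    while remaining:
        progressed, waiting = False, []
        for op in remaining:
            if op["depends_on"] <= done:   # all prerequisites already executed
                execute(op)
                done.add(op["id"])
                progressed = True
            else:
                waiting.append(op)         # hold back until dependencies finish
        remaining = waiting
        if not progressed:
            raise RuntimeError("unsatisfiable (cyclic or missing) dependencies")
```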
The TPC-D benchmark was developed almost 20 years ago, and even though its current incarnation, TPC-H, could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them, which we call "choke points". We identify 28 different such choke points, grouped into six categories: Aggregation Performance, Join Performance, Data Access Locality, Expression Calculation, Correlated Subqueries, and Parallel Execution. On the meta level, we make the point that the rich set of choke points found in TPC-H sets an example of how to design future DBMS benchmarks.
The LOD2 Stack is an integrated distribution of aligned tools that support the whole life cycle of Linked Data, from extraction and authoring/creation via enrichment, interlinking, and fusing, to maintenance. The LOD2 Stack comprises new tools and substantially extended existing tools from the LOD2 project partners and third parties. The stack is designed to be versatile; for all functionality we define clear interfaces, which enable alternative third-party implementations to be plugged in. The architecture of the LOD2 Stack is based on three pillars: (1) software integration and deployment using the Debian packaging system; (2) use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between the different tools of the LOD2 Stack; and (3) integration of the LOD2 Stack user interfaces based on REST-enabled web applications. These three pillars provide the methodological and technological framework for integrating the very heterogeneous LOD2 Stack components into a consistent whole. In this article we describe these pillars in more detail and give an overview of the individual LOD2 Stack components. The article also includes a description of a real-world usage scenario in the publishing domain.
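As a sketch of how a stack component might talk to the central SPARQL endpoint of pillar (2), the Python fragment below uses only the standard library and the standard SPARQL Protocol over HTTP; the endpoint URL and the example query are placeholders, not part of the LOD2 Stack itself.

```python
import json
import urllib.parse
import urllib.request

def sparql_select(endpoint, query):
    """Run a SELECT query against a SPARQL endpoint using the standard
    SPARQL Protocol: HTTP GET with a `query` parameter, JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

if __name__ == "__main__":
    # Placeholder endpoint; list ten classes used in the knowledge base.
    rows = sparql_select("http://localhost:8890/sparql",
                         "SELECT DISTINCT ?cls WHERE { ?s a ?cls } LIMIT 10")
    for row in rows:
        print(row["cls"]["value"])
```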
We motivate and describe techniques that detect an "emergent" relational schema from RDF data. We show that on a wide variety of datasets, the discovered structure explains well over 90% of the RDF triples. We also describe technical solutions to the semantic challenge of giving these emergent tables, columns, and relationships between tables short names that humans find logical. Our techniques can be exploited in many ways, e.g., to improve the efficiency of SPARQL systems, or to run existing SQL-based applications on top of any RDF dataset using an RDBMS.
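As a rough, simplified sketch of the kind of structure detection involved (not the paper's actual algorithm; all names are illustrative), one can group subjects by the set of properties they use and measure how many triples the most frequent property combinations explain:

```python
from collections import Counter, defaultdict

def property_sets(triples):
    """Map each subject to the set of properties it uses.
    triples: iterable of (subject, predicate, object) tuples."""
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    return props

def candidate_tables(triples, top_k=10):
    """Count how many subjects share each property set; the most frequent
    sets are candidates for emergent relational tables (one column per
    property). Naming and relationship detection are ignored here."""
    props = property_sets(triples)
    return Counter(frozenset(ps) for ps in props.values()).most_common(top_k)

def coverage(triples, chosen_sets):
    """Fraction of triples whose subject's property set is one of chosen_sets,
    i.e. how much of the data the emergent schema would explain."""
    triples = list(triples)
    props = property_sets(triples)
    chosen = {frozenset(c) for c in chosen_sets}
    explained = sum(1 for s, _, _ in triples if frozenset(props[s]) in chosen)
    return explained / len(triples)
```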
In this short paper, we provide an early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload, which tests graph data management systems on a graph business analytics workload. Its queries involve complex aggregations and navigations (joins) that touch large data volumes, which is typical in BI workloads, yet they depend heavily on graph functionality such as connectivity tests and path finding. We outline the motivation for this new benchmark, which we derived from many interactions with the graph database industry and its users, and situate it in a scenario of social network analysis. The workload was designed by taking into account technical "choke points" identified by database system architects from academia and industry, which we also describe and map to the queries. We present reference implementations in openCypher, PGQL, SPARQL, and SQL, and preliminary results of SNB BI on a number of graph data management systems.
In this paper we describe RDFSync, a methodology for efficient synchronization and merging of RDF models. RDFSync is based on decomposing a model into Minimum Self-Contained Graphs (MSGs). After illustrating the theory and deriving properties of MSGs, we show how an RDF model can be represented by a list of hashes of such information fragments. The synchronization procedure described here is based on the evaluation and remote comparison of these ordered lists. Experimental results show that the algorithm provides very significant savings in network traffic compared to the file-oriented synchronization of serialized RDF graphs. Finally, we provide the design and report the implementation of a protocol for executing the RDFSync algorithm over HTTP.

Remote synchronization of data files is a procedure by which local information (e.g., a data file) is updated over a network in order to be made identical with a remote one (or vice versa). Synchronization could be trivially achieved by copying the entire remote file locally and then comparing it with the local one, but this is largely undesirable due to the performance issues in comparing the entire data file and, most of all, due to the bandwidth cost of frequent full data transfers. In 1998, the rsync algorithm was developed [1] to efficiently synchronize remote binary files. rsync operates under the assumption that the changes will be significantly smaller in size than the data file itself and that they are likely to happen in "clusters", that is, in localized spots rather than distributed across the file. When this is the case, rsync can achieve synchronization by transferring only slightly more data than the size of the changes. As such, rsync and other comparable
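The hash-list comparison at the heart of RDFSync can be illustrated with a much-simplified Python sketch: hash each fragment canonically, keep the hashes in sorted order, and exchange only the fragments the other side is missing. Proper MSG hashing must canonicalise blank nodes, which this toy version omits; all names here are illustrative.

```python
import hashlib

def fragment_hash(fragment):
    """Hash one fragment (here: an iterable of triple strings) canonically
    by sorting its triples first. A stand-in for hashing an MSG; blank-node
    canonicalisation is deliberately left out."""
    return hashlib.sha1("\n".join(sorted(fragment)).encode("utf-8")).hexdigest()

def hash_list(fragments):
    """Ordered list of fragment hashes representing the whole model."""
    return sorted(fragment_hash(f) for f in fragments)

def plan_sync(local_fragments, remote_hash_list):
    """Compare the local fragments against the remote's ordered hash list.
    Returns (fragments to send to the remote, hashes to request from it);
    only these differences ever need to cross the network."""
    local = {fragment_hash(f): f for f in local_fragments}
    remote = set(remote_hash_list)
    to_send = [frag for h, frag in local.items() if h not in remote]
    to_request = [h for h in remote if h not in local]
    return to_send, to_request
```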