Scaling queries over big RDF graphs with semantic hash partitioning

Lee, Kisung; Liu, Ling

doi:10.14778/2556549.2556571

Cited by 110 publications (119 citation statements)

References 15 publications (21 reference statements)

Citation statements classified as mentioning: 118


“…Most existing works are based on scanning the data a-priori and either saving new pieces of information about it, or providing alternative data representations. The works in [7,9,13,14,16] are based on techniques that mainly focus on join optimizations by indexing the data. These works do not consider structured data and data typing.…”

Section: Related Work (mentioning)

confidence: 99%

Selectivity Estimation for SPARQL Triple Patterns with Shape Expressions

Abbas, Genevès, Roisin et al., 2018, Lecture Notes in Computer Science


ShEx (Shape Expressions) is a language for expressing constraints on RDF graphs. In this work we optimize the evaluation of conjunctive SPARQL queries on RDF graphs by taking advantage of ShEx constraints. Our optimization is based on computing and assigning ranks to query triple patterns, which dictate their order of execution. We first define a set of well-formed ShEx schemas that possess interesting characteristics for SPARQL query optimization. We then define our optimization method by exploiting information extracted from a ShEx schema. Finally, we report evaluation results showing the advantages of applying our optimization on top of an existing state-of-the-art query evaluation system.
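To illustrate what "assigning ranks to query triple patterns, dictating their order of execution" means in practice, here is a minimal sketch. It does NOT derive ranks from a ShEx schema as the paper does; instead it assumes a generic heuristic (fewer unbound positions first, then lower predicate frequency from hypothetical statistics). The names `TriplePattern`, `rank`, `order_patterns`, and `pred_counts` are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    s: str
    p: str
    o: str


def is_var(term: str) -> bool:
    """In this sketch, SPARQL variables are terms starting with '?'."""
    return term.startswith("?")


def rank(tp: TriplePattern, pred_counts: dict[str, int], total: int) -> tuple[int, int]:
    """Lower rank = assumed more selective, so evaluated earlier.

    Primary key: number of unbound positions (fewer variables, fewer matches).
    Secondary key: triples using the predicate (hypothetical statistics);
    a variable predicate could match anything, so it gets the total count.
    """
    unbound = sum(is_var(t) for t in (tp.s, tp.p, tp.o))
    card = total if is_var(tp.p) else pred_counts.get(tp.p, 0)
    return (unbound, card)


def order_patterns(bgp: list[TriplePattern], pred_counts: dict[str, int],
                   total: int) -> list[TriplePattern]:
    """Return the basic graph pattern sorted into an execution order."""
    return sorted(bgp, key=lambda tp: rank(tp, pred_counts, total))
```

For example, with `?x foaf:name "Alice"`, `?x rdf:type foaf:Person`, and `?x foaf:knows ?y`, the name lookup (one variable, rare predicate) runs first and the two-variable `foaf:knows` pattern runs last. A schema-aware method such as the one described above would replace these crude statistics with information extracted from the ShEx shapes.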


“…Lee and Liu present a novel semantic hash approach that utilizes access locality to partition big graphs across multiple computing nodes by maximizing the intrapartition processing capability and minimizing the interpartition communication cost [22]. Huang et al use a graph partitioning algorithm instead of simple hash partitioning by source vertex, destination vertex, and labeled or unlabeled edge [23].…”

Section: Related Work (mentioning)

confidence: 99%
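The quote contrasts semantic hash partitioning with simple hash partitioning by source vertex. The sketch below illustrates only the general idea under stated assumptions, not Lee and Liu's algorithm: triples are assigned by hashing their subject, and an optional forward hop expansion replicates triples reachable from each owned subject so that short path queries can be answered inside one partition. The CRC32 hash, the forward-only hop semantics, and the names `owner` and `hash_partition` are our assumptions.

```python
import zlib
from collections import defaultdict


def owner(vertex, n):
    """Partition that owns a vertex. CRC32 is used because Python's built-in
    str hash is randomized per process and would not be stable across runs."""
    return zlib.crc32(vertex.encode("utf-8")) % n


def hash_partition(triples, n, hops=1):
    """Assign (subject, predicate, object) triples to n partitions.

    hops=1: each triple goes only to the partition owning its subject
            (plain hashing by source vertex).
    hops>1: each partition also replicates triples reachable by following
            outgoing edges from its owned subjects, up to `hops` edges deep,
            trading storage for less inter-partition communication.
    """
    out_edges = defaultdict(list)
    for t in triples:
        out_edges[t[0]].append(t)

    parts = [set() for _ in range(n)]
    for anchor in list(out_edges):
        p = owner(anchor, n)
        frontier, seen = {anchor}, set()
        for _ in range(hops):
            nxt = set()
            for v in frontier - seen:
                seen.add(v)
                for t in out_edges.get(v, ()):
                    parts[p].add(t)
                    nxt.add(t[2])  # follow the edge to its object vertex
            frontier = nxt
    return parts
```

With `hops=2`, the partition owning `a` holds both `(a, p, b)` and `(b, p, c)`, so a two-edge path query starting at `a` needs no cross-node join; the cost is that `(b, p, c)` is now stored twice when `a` and `b` hash to different partitions.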

An Association-Oriented Partitioning Approach for Streaming Graph Query

Hao, Yuan et al., 2017, Scientific Programming


The volumes of real-world graphs such as knowledge graphs are increasing rapidly, which makes streaming graph processing a hot research area. Processing graphs in a streaming setting poses significant challenges from different perspectives, among which the graph partitioning method plays a key role. For graph queries, a well-designed partitioning method is essential for achieving good performance. Existing offline graph partitioning methods often require full knowledge of the graph, which is not available during streaming graph processing. To handle this problem, we propose an association-oriented streaming graph partitioning method named Assc. This approach first computes the rank values of vertices with a hybrid approximate PageRank algorithm. After splitting these vertices with an adapted variant of the affinity propagation algorithm, the processing order of the vertices in the sliding window can be determined. Finally, the partition to which each vertex should be assigned is decided according to the vertices' levels and their associations. We compare its performance with a set of streaming graph partitioning methods and with METIS, a widely adopted offline approach. The results show that our solution can partition graphs with hundreds of millions of vertices in a streaming setting on a large collection of graph datasets, and that it outperforms the other graph partitioning methods.
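The abstract's first step computes vertex rank values with PageRank and uses them to order vertex processing. As a rough illustration, the sketch below uses plain power-iteration PageRank, not the hybrid approximate variant Assc describes, and omits the affinity propagation step entirely. The function names `pagerank` and `processing_order` and the default damping and iteration count are assumptions.

```python
def pagerank(edges, damping=0.85, iters=50):
    """Plain power-iteration PageRank over a list of directed (u, v) edges.

    Rank held by dangling vertices (no out-edges) is spread uniformly,
    so the ranks always sum to 1.
    """
    nodes = sorted({x for e in edges for x in e})
    out = {v: [] for v in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        rank = new
    return rank


def processing_order(rank):
    """Hypothetical use of the ranks: process high-rank vertices first."""
    return sorted(rank, key=rank.get, reverse=True)
```

In a streaming setting the full edge list is not available up front, which is precisely why an approximate, incremental variant is needed; this batch version only shows what the rank values represent.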


“…And one more interesting thing is that the key range generated by YCSB is uniform, when turn to highly skewed key range distribution, a more carefully PreSplit design is critical to achieve load balance. In HConfig system, we allow external data partitioning algorithms such as [32] to be plugged into the PreSplit policy.…”

Section: Multi-databases With Variable Blocks (mentioning)

confidence: 99%
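The quote notes that evenly spaced pre-splits suit YCSB's uniform key range but that skewed key distributions need a more careful PreSplit design. One common alternative, shown here as an assumption and not as the HConfig PreSplit policy, is to pick split keys at quantiles of a sample of row keys. The resulting keys could then be supplied as split keys when creating an HBase table. The function name `split_points` is hypothetical.

```python
def split_points(sample_keys, regions):
    """Choose regions - 1 split keys at quantiles of a sampled key set.

    Unlike evenly spaced splits over the key range, quantile splits put more
    boundaries where keys are dense, so each region receives roughly the same
    share of the sampled keys. Duplicate boundaries are dropped, so a single
    hot key can still end up in one region.
    """
    if regions < 2:
        return []
    keys = sorted(sample_keys)
    if not keys:
        raise ValueError("need at least one sample key")
    points = []
    for i in range(1, regions):
        k = keys[i * len(keys) // regions]
        if not points or k > points[-1]:
            points.append(k)
    return points
```

For 100 sampled keys `b"user0000"` to `b"user0099"` and 4 regions, the split keys are `b"user0025"`, `b"user0050"`, and `b"user0075"`. If 90% of the sample is a single key, only one distinct boundary survives, which shows why extremely skewed workloads may also need key salting or another partitioning scheme.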

A Configuration Management Study to Fast Massive Writing for Distributed NoSQL System

Bao, Xiao et al., 2016, IEICE Trans. Inf. & Syst.


NoSQL systems have become vital components for delivering big data services due to their high horizontal scalability. However, existing NoSQL systems rely on experienced administrators to configure and tune a wide range of configurable parameters for optimized performance. In this work, we present a configuration management framework for NoSQL systems called xConfig. With xConfig, users can first identify performance-sensitive parameters and capture the tuned parameters for different workloads as configuration policies. Next, based on the tuned policies, xConfig can be implemented as the corresponding configuration optimization system for a specific NoSQL system. It can also be used to analyze the range of configurable parameters that may impact the runtime performance of NoSQL systems. We implement a prototype called HConfig based on HBase, and its parameter tuning strategies can generate tuned policies that enable HBase to run much more efficiently on both individual worker nodes and the entire cluster. The evaluation results for massive writing show that HBase under write-intensive policies outperforms both the default configuration and some existing configurations while offering significantly higher throughput.
