With the popularity of knowledge graphs growing rapidly, large amounts of RDF graphs have been released, which raises the need for addressing the challenge of distributed subgraph matching queries. In this paper, we propose an efficient distributed method to answer subgraph matching queries on big RDF graphs using MapReduce. In our method, query graphs are decomposed into a set of stars that utilize the semantic and structural information embedded RDF graphs as heuristics. Two optimization techniques are proposed to further improve the efficiency of our algorithms. One algorithm, called RDF property filtering, filters out invalid input data to reduce intermediate results; the other is to improve the query performance by postponing the Cartesian product operations. The extensive experiments on both synthetic and real-world datasets show that our method outperforms the close competitors S2X and SHARD by an order of magnitude on average.
With RDF becoming the de facto standard for representing knowledge graphs, it is indispensable to develop scalable subgraph matching algorithms over big RDF graphs stored in distributed clusters. In this paper, we propose a novel distributed subgraph matching method SP-Tree, using the Pregel model, to answer subgraph matching queries on big RDF graphs. In our method, the query graph is transformed to a variant spanning tree based on the shortest paths. Two optimization techniques are proposed to improve the efficiency of our algorithms. One employs RDF shapes to filter out local computations and messages passed, the other postpones the Cartesian product operations in the matching process to reduce intermediate results. The extensive experiments on both synthetic and real-world datasets show that our SP-Tree subgraph matching method outperforms the state-of-the-art methods by an order of magnitude.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.