Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the ClauseIteration approach comes from 1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and 2) our use of flagged keys to join the results of these incremental responses. The ClauseIteration algorithms form the basis of our scalable, SHARD graph-store built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.