Proceedings of the Fourth International Workshop on Data-Intensive Distributed Computing 2011
DOI: 10.1145/1996014.1996021
Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store

Abstract: Graph data processing is an emerging application area for cloud computing because few other information infrastructures cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to processing queries over graph data using the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration appr…

Cited by 59 publications (61 citation statements). References 13 publications (17 reference statements).
“…The first category generally partitions an RDF dataset across multiple servers using horizontal (random) partitioning, stores the partitions in a distributed file system such as the Hadoop Distributed File System (HDFS), and processes queries by parallel access to the clustered servers using a distributed programming model such as Hadoop MapReduce [20,12]. SHARD [20] stores RDF triples directly in HDFS as flat text files and runs one Hadoop job for each clause (triple pattern) of a SPARQL query.…”
Section: Related Work (mentioning)
confidence: 99%
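To make the per-clause processing concrete, here is a minimal, illustrative Python sketch of the clause-iteration idea: one pass per triple pattern filters matching triples into variable bindings, which are then joined with the bindings accumulated from earlier clauses, the role SHARD assigns to one Hadoop job per clause. The in-memory lists and helper names here stand in for the actual MapReduce phases and are not taken from SHARD's code.

def match_clause(triple, clause):
    """Return a {variable: value} binding if the triple matches the pattern, else None."""
    binding = {}
    for term, value in zip(clause, triple):
        if term.startswith('?'):
            binding[term] = value      # variable term: bind it
        elif term != value:
            return None                # constant term: must match exactly
    return binding

def join_bindings(left, right):
    """Merge two binding dicts if they agree on all shared variables."""
    for var in left.keys() & right.keys():
        if left[var] != right[var]:
            return None
    return {**left, **right}

def clause_iteration(triples, clauses):
    """One pass ("job") per clause, joining intermediate bindings each iteration."""
    results = [{}]  # start from a single empty binding
    for clause in clauses:
        # "map" phase: bindings produced by this clause alone
        clause_bindings = [b for t in triples
                           if (b := match_clause(t, clause)) is not None]
        # "reduce" phase: join with the bindings accumulated so far
        results = [m for old in results for new in clause_bindings
                   if (m := join_bindings(old, new)) is not None]
    return results

# Example query: ?person works at ?org, and ?org is located in "Boston"
triples = [
    ("alice", "worksAt",   "acme"),
    ("bob",   "worksAt",   "globex"),
    ("acme",  "locatedIn", "Boston"),
]
query = [("?person", "worksAt", "?org"), ("?org", "locatedIn", "Boston")]
print(clause_iteration(triples, query))  # [{'?person': 'alice', '?org': 'acme'}]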
“…SHARD [20] directly stores RDF triples in HDFS as flat text files and runs one Hadoop job for each clause (triple pattern) of a SPARQL query. [12] stores RDF triples in HDFS by hashing on predicates and runs one Hadoop job for each join of a SPARQL query.…”
Section: Related Work (mentioning)
confidence: 99%
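The contrast drawn in this statement is between one job per clause (SHARD) and one job per join ([12]). A hedged Python sketch of the latter style follows: each MapReduce job joins two sets of bindings by emitting the shared variable's value as the shuffle key, so a reducer sees exactly the bindings that can combine. This is a plain-Python stand-in for the Hadoop map/shuffle/reduce phases; the function and variable names are illustrative, not taken from [12].

from collections import defaultdict
from itertools import product

def reduce_side_join(left_bindings, right_bindings, join_var):
    shuffle = defaultdict(lambda: ([], []))
    # "map" phase: key every binding by its value for the join variable
    for b in left_bindings:
        shuffle[b[join_var]][0].append(b)
    for b in right_bindings:
        shuffle[b[join_var]][1].append(b)
    # "reduce" phase: combine co-keyed bindings from both sides
    return [{**l, **r}
            for lefts, rights in shuffle.values()
            for l, r in product(lefts, rights)]

left  = [{"?person": "alice", "?org": "acme"}]
right = [{"?org": "acme", "?city": "Boston"}]
print(reduce_side_join(left, right, "?org"))
# [{'?person': 'alice', '?org': 'acme', '?city': 'Boston'}]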
“…Hadoop-based RDF data systems such as [16], [23], [24] store RDF data directly as HDFS files and distribute those files using the file partitioning and placement policies of vanilla Hadoop. However, previous studies [9], [17] showed that, without carefully designed data partitioning algorithms and data localization strategies, these kinds of systems incur massive I/O costs and communication overhead.…”
Section: Related Work (mentioning)
confidence: 99%
“…A popular approach to partitioning RDF data is hash partitioning, which is adopted by a majority of existing distributed RDF engines [13], [14], [18], [24]. This approach distributes RDF triples across partitions by computing a hash key over either the subject or the object of each triple.…”
Section: Introduction, RDF (Resource Description Framework) (mentioning)
confidence: 99%
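A minimal sketch of the subject-hash partitioning described in this statement: each triple is routed to one of N partitions by hashing its subject, so all triples that share a subject land on the same server. The partition count and the choice of Python's zlib.crc32 are illustrative assumptions, not details taken from any of the cited systems.

import zlib

def partition_for(triple, num_partitions, key_index=0):
    # key_index=0 hashes on the subject; key_index=2 would hash on the object
    key = triple[key_index].encode("utf-8")
    return zlib.crc32(key) % num_partitions  # deterministic across runs

triples = [("alice", "worksAt", "acme"), ("alice", "livesIn", "Boston")]
for t in triples:
    print(t, "-> partition", partition_for(t, num_partitions=4))
# Both triples share the subject "alice", so they map to the same partition.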