Mulugeta Mammo scite author profile

Mulugeta Mammo

3 Publications

2 Citation Statements Received

30 Citation Statements Given


Affiliations

Arizona State University

Publications


Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce

Mammo

Bansal

2015


The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew to exponential proportions, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations with the query performance of these big data systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example. The results of the experiment show that Presto has a much higher performance than Hive and can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data.

DEDICATION

This thesis is dedicated to my parents for their love and support, and to my brother, who initiated the idea of coming to the US for my master's study and who sponsored my education.


Distributed SPARQL Querying Over Big RDF Data Using Presto-RDF

Mammo

Hassan

Bansal

2015

STBD


The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew to exponential proportions, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations with the query performance of these big data systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example. This paper proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data. We evaluate the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done with RDF datasets of 10, 20, and 30 million triples. The results of the experiments show that Presto-RDF has a much higher performance than Hive and can be used to process big RDF data.


Presto-RDF: SPARQL Querying over Big RDF Data

Mammo

Bansal

2015



