StreamMR: An Optimized MapReduce Framework for AMD GPUs

Elteir, Marwa K.; Lin, Heshan; Feng, Wu-chun; Scogland, Thomas R. W.

doi:10.1109/icpads.2011.131

Cited by 30 publications

(8 citation statements)

References 11 publications

“…Data processing using MapReduce has many advantages such as horizontal scalability, fault tolerance, high performance, high throughput and commodity hardware. Although it was primarily designed for index construction in search engines, it can be used in data analysis as well – quite a few algorithms can be expressed in MapReduce.…”

Section: Related Work and Background (mentioning)

confidence: 99%

Single-scan: a fast star-join query processing algorithm

PURDILĂ

Pentiuc

2015

Softw. Pract. Exper.

Summary A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query used in data warehouses is star‐join. In this paper, we present a fast and efficient star‐join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs a single scan of the fact table, which means a significant reduction of input/output operations and computational complexity. Also, the algorithm requires only two MapReduce iterations in total–one to build the filters against dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than the existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley & Sons, Ltd.

“…Recently many studies to reduce the execution time of MapReduce operation using a graphics processing unit (GPU) have been actively conducted [2,5,6,7,8]. Single-Instruction, Multiple-Data (SIMD) processors on a GPU can quickly evaluate applications (e.g., String Match, Word Count, Kmeans, etc.)…”

Section: Introduction (mentioning)

confidence: 99%

“…This data has to be stored as the intermediate data on a local storage. Although Mars [2], StreamMR [5], MapCG [6] implemented a MapReduce framework with atomic-free operations, a general model for analyzing big data was not considered. Using the computation characterization of map and reduce operation, the separating scheduling scheme of MapReduce tasks on a CPU and a GPU was also addressed on the limited size of input data.…”

Section: Introduction (mentioning)

confidence: 99%

DPM: Data Partitioning Method for pipelined MapReduce on GPU

2014

The 18th IEEE International Symposium on Consumer Electronics (ISCE 2014)

MapReduce frameworks using a modern graphics processor (GPU) have improved the performance of data-intensive applications. While prior research has enhanced the parallelism of MapReduce applications on a GPU, achieving an optimal distribution of big data on heterogeneous devices is still a challenging issue. We therefore propose a method to evenly separate the computing cost under limited memory size. To solve this problem, we design and propose DPM, a Data Partitioning Method, using a GPU to smartly distribute the workload of MapReduce. The proposed technique provides well-balanced processing cost for heterogeneous devices.

“…HadoopCL [15] was a seamless combination of OpenCL and Hadoop and provides an easy-to-learn and flexible API in a particular high-performance computing system. Last but not least, beyond the NVIDIA GPU-based systems, there were also optimized MapReduce frameworks for AMD GPUs (StreamMR [12]) and Intel Xeon Phi coprocessors (MrPhi [18]). However, no existing system provides a language-level easy-to-use programming model for GPU clusters as in Vispark.…”

Section: Related Work (mentioning)

confidence: 99%

Vispark: GPU-accelerated distributed visual computing using spark

Choi

Jeong

2015

2015 IEEE 5th Symposium on Large Data Analysis and Visualization (LDAV)

Abstract. With the growing need of big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. In order to address these problems, we propose Vispark, a novel extension of Spark for GPU-accelerated MapReduce processing on array-based scientific computing and image processing tasks. Vispark provides an easy-to-use, Python-like high-level language syntax and a novel data abstraction for MapReduce programming on a GPU cluster system. Vispark introduces a programming abstraction for accessing neighbor data in the mapper function, which greatly simplifies many image processing tasks using MapReduce by reducing memory footprints and bypassing the reduce stage. Vispark provides socket-based halo communication that synchronizes between data partitions transparently from the users, which is necessary for many scientific computing problems in distributed systems. Vispark also provides domain-specific functions and language supports specifically designed for high-performance computing and image processing applications. We demonstrate the performance of our prototype system on several visual computing tasks, such as image processing, volume rendering, K-means clustering, and heat transfer simulation.
