2008 16th International Symposium on Field-Programmable Custom Computing Machines 2008
DOI: 10.1109/fccm.2008.19
Map-reduce as a Programming Model for Custom Computing Machines

Cited by 70 publications (36 citation statements) · References 12 publications
“…Memory bandwidth and the number of hard multipliers embedded in the FPGA are the constraints on mapping the inner-level MapReduce pattern. Figure 4 shows results from applying our proposed piecewise GP model (1)–(13) to the innermost loop of MAT64. Given a memory bandwidth and a number of multipliers, the figure reveals the performance-optimal design: for example, when the memory bandwidth is 3 bytes/execution cycle and 3 multipliers are available, the performance-optimal design is (k = 3, ii = 2); when the memory bandwidth is 5 bytes/execution cycle and 5 multipliers are available, the performance-optimal design is (k = 5, ii = 2), as shown in Fig.…”
Section: Results
confidence: 99%
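The design-space search this statement describes can be sketched as a brute-force enumeration over the unroll factor k and the initiation interval ii. This is an illustrative stand-in, not the paper's piecewise geometric-programming model: the cost function, the bandwidth accounting (k operand bytes fetched every ii cycles), and the search bounds below are all assumptions, so its optima need not match the values reported in Figure 4.

```python
from itertools import product

def best_design(bandwidth, multipliers, N=64, max_k=8, max_ii=4):
    """Exhaustively search (k, ii) design points for an inner loop of N iterations.

    k  : loop-unroll factor (assumed to consume one hard multiplier per parallel multiply)
    ii : initiation interval (cycles between successive loop iterations)

    The cost model is a simplified placeholder for a real performance model.
    """
    best = None
    for k, ii in product(range(1, max_k + 1), range(1, max_ii + 1)):
        if k > multipliers:        # resource constraint: not enough multipliers
            continue
        if k / ii > bandwidth:     # assumed bandwidth constraint: k bytes per ii cycles
            continue
        cycles = ii * (N / k)      # execution-time estimate for the unrolled loop
        if best is None or cycles < best[0]:
            best = (cycles, k, ii)
    return best[1:] if best else None

print(best_design(bandwidth=3, multipliers=3))
print(best_design(bandwidth=5, multipliers=5))
```

Any feasible design returned by the search respects both constraints by construction; a real GP formulation would solve the same feasibility-plus-cost problem analytically rather than by enumeration.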
“…The Haskell functional language and Google's Sawzall are used to describe the Map and Reduce functions. Yeung et al. [3] apply the MapReduce programming model to design high-performance systems on FPGAs and GPUs. All these methods require designers to identify the MapReduce pattern and specify the Map and Reduce functions explicitly.…”
Section: Related Work
confidence: 99%
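The explicit Map-and-Reduce style these works describe can be illustrated with a short, generic word-count sketch (Python is used here purely for brevity; the cited works use Haskell, Sawzall, or hardware descriptions). The designer supplies only `map_fn` and `reduce_fn`; the framework handles distribution, grouping, and combining:

```python
from functools import reduce

def map_fn(document):
    """Map: emit (key, value) pairs -- here (word, 1) for each word."""
    return [(word, 1) for word in document.split()]

def reduce_fn(a, b):
    """Reduce: combine two values belonging to one key -- here by addition."""
    return a + b

def map_reduce(documents):
    # Map phase: apply map_fn to every input and collect all emitted pairs
    pairs = [kv for doc in documents for kv in map_fn(doc)]
    # Shuffle phase: group values by key
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    # Reduce phase: fold each key's values with reduce_fn
    return {key: reduce(reduce_fn, values) for key, values in groups.items()}

print(map_reduce(["a b a", "b a"]))  # {'a': 3, 'b': 2}
```

Because the Map and Reduce functions are stated explicitly and are side-effect free, the framework (or a hardware compiler) is free to parallelize both phases, which is the property the FPGA and GPU ports above exploit.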
“…Several related research efforts focus on porting MapReduce to prominent hardware platforms for high-performance computing, including multicore processors [8], [10], [11], GPUs [6], [19], the Cell processor [7], [9], and FPGAs via direct software-to-hardware translation [20]. Throughout this paper, we compare our runtime system design and implementation against the design and implementation of MapReduce for the Cell proposed by de Kruijf and Sankaralingam [7].…”
Section: Related Work
confidence: 99%
“…Fortunately, this situation is starting to change. Frameworks such as Map-Reduce [14] successfully exploit implicit parallelism on distributed systems and have also been extended to heterogeneous platforms such as GPUs [17] and FPGAs [26], but unfortunately have a restricted programming model. Other models, such as CUDA [21] and OpenCL [19], provide a restricted programming model to the users of GPU accelerators, but also expose a significant amount of hardware details.…”
Section: Introduction
confidence: 99%