2011
DOI: 10.1016/j.jpdc.2010.08.013
Transparent runtime parallelization of the R scripting language

Cited by 11 publications (7 citation statements); references 26 publications.
“…That could explain why SCBI MapReduce skeleton shows a speed-up of 31-fold for 32 cores and 59-fold for 64 cores, even with sequence data (Figure 2(a)). This performance is better than the one displayed by the R package pR, where 32 cores provide speedups of 20-27-fold, depending on the process [25]. Several design reasons can also be invoked to explain such an efficiency [34]: (i) disk I/O operations are reduced to minimum (data are read only at the beginning and results are saved only at the end); (ii) absence of asymmetry impact (Figure 1(b)); (iii) the manager overhead is limited when using more than 2 cores and chunks of sequences (Tables 2 and 3); and (iv) longer tasks increased the efficiency because the manager is on standby most of the time, while waiting for the workers to finish, avoiding relaunching of internal or external programs for brief executions.…”
Section: SCBI MapReduce Is an Efficient Task-Farm Skeleton
Confidence: 79%
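The task-farm design points quoted above (single read at the start, single write at the end, workers processing chunks while the manager stands by) can be sketched generically. The code below is an illustrative stand-in, not the actual implementation of SCBI MapReduce or pR; the function and parameter names are hypothetical, and Python's `multiprocessing.Pool` is used only to mimic the manager/worker pattern described.

```python
# Hypothetical sketch of a task-farm skeleton, mirroring the design points
# quoted above: input is split once, workers each process a chunk of work,
# and results are gathered once at the end. Not the real SCBI MapReduce or
# pR code; all names here are illustrative.
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for an expensive per-item computation (e.g. per-sequence work).
    return [x * x for x in chunk]

def task_farm(data, n_workers=4, chunk_size=8):
    # (i) "read" the input once and split it into chunks up front.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # (iii)/(iv) the manager merely dispatches chunks and waits; the workers
    # do the long-running computation in parallel.
    with Pool(n_workers) as pool:
        results = pool.map(process_chunk, chunks)
    # (i) flatten and "write" the results once at the end.
    return [y for chunk in results for y in chunk]

if __name__ == "__main__":
    out = task_farm(list(range(32)))
    print(out[:5])  # → [0, 1, 4, 9, 16]
```

Chunking keeps manager/worker communication coarse-grained, which is the property the quoted excerpt credits for limiting manager overhead.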
“…Parallelisation libraries for the R language, besides Rmpi, include the SPRINT [24] and pR [25] packages, whose main advantage is that they require very little modification to existing sequential R scripts and no expertise in parallel computing; however, the master/worker scheme suffers from communication overhead, and the authors recognise that their approach may not yield the optimal schedule [25]. Other parallelisation libraries are snow and nws, which provide coordination and parallel execution facilities.…”
Section: Related Work
Confidence: 99%
“…Neither approach involves compiler manipulation, thus differing from our approach. The recent work of Li et al. [16] on the scripting array language R also parallelizes its run-time routines, but it does use sophisticated compiler technology to do so.…”
Section: Discussion
Confidence: 99%
“…Targeting graph mining, PEGASUS [15] implements generalized iterated matrix-vector multiply efficiently on Hadoop. RIOT [24] and RevoScaleR focus on making statistical computing workloads in R I/O-efficient; pR [16] automatically parallelizes function calls and loops in R. Pig [17], Hive [20], and SciHadoop [4] are examples of higher-level languages and execution plan generators for MapReduce systems. Our work goes beyond these systems by addressing important usability issues of automatic hardware provisioning and configuration.…”
Section: Related Work
Confidence: 99%