Diego R. Llanos scite author profile

With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and/or memory system. Software schemes require no changes to the hardware of existing shared-memory systems, but can suffer from significant overheads involved with the speculative execution. In fact, the performance of software schemes is highly dependent on application characteristics, the design and implementation of the scheme, and the system configuration and size. This paper explores the design space of a recently proposed software speculative parallelization scheme. In the process, we gain insight into the most beneficial features of software schemes for speculative parallelization, as well as the most influential application characteristics. For instance, experimental results show that, contrary to intuition, checking for data dependence violations on every speculative store, as opposed to at commit time, leads to little performance degradation in the worst case and to significantly better performance with large configurations. Also, scheduling policies based on windows can perform very close to fully dynamic policies with a fraction of the memory overhead. Finally, experimental results show consistent speedups in the execution of loops that cannot be parallelized at compile time, both with and without RAW data dependences, for 4 to 32 processors.
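The difference between checking for violations on every speculative store and checking only at commit time can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not the scheme evaluated in the paper: the class `SpeculativeMemory`, its methods, and the per-iteration version tables are hypothetical names introduced here for illustration, and the check is deliberately conservative.

```python
class SpeculativeMemory:
    """Toy model of speculative loop execution (hypothetical, illustrative only).

    Each loop iteration keeps a private version of the locations it writes.
    A read that is not satisfied locally is "exposed": it consumes a value
    produced by a predecessor iteration or by the shared (committed) state.
    """

    def __init__(self, shared):
        self.shared = dict(shared)   # committed, non-speculative values
        self.versions = {}           # iteration -> {address: value}
        self.exposed_reads = {}      # iteration -> set of addresses

    def load(self, iteration, addr):
        local = self.versions.setdefault(iteration, {})
        if addr in local:
            return local[addr]
        self.exposed_reads.setdefault(iteration, set()).add(addr)
        # Forward from the closest predecessor that holds a version.
        for pred in sorted((p for p in self.versions if p < iteration), reverse=True):
            if addr in self.versions[pred]:
                return self.versions[pred][addr]
        return self.shared[addr]

    def store(self, iteration, addr, value):
        """Write speculatively and check for violations right away.

        Returns the successor iterations that already performed an exposed
        read of `addr` and may therefore hold a stale value (a RAW
        violation). The check is conservative: it does not track which
        predecessor a successor actually read from.
        """
        self.versions.setdefault(iteration, {})[addr] = value
        return sorted(
            succ for succ, reads in self.exposed_reads.items()
            if succ > iteration and addr in reads
        )


mem = SpeculativeMemory({"x": 0})
stale_value = mem.load(2, "x")      # iteration 2 reads x too early
violators = mem.store(1, "x", 5)    # iteration 1 writes x afterwards
print(stale_value, violators)
```

In this model the violation is detected at the moment iteration 1 stores to `x`, so iteration 2 could be squashed and restarted immediately instead of running to completion and failing at commit time.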


Toward efficient and robust software speculative parallelization on multiprocessors

Cintra

Llanos

2003

SIGPLAN Not.


With speculative parallelization, code sections that cannot be fully analyzed by the compiler are aggressively executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and memory system. Software schemes require no extra hardware but can be inefficient. This paper proposes a new software-only speculative parallelization scheme. The scheme is developed after a systematic evaluation of the design options available and is shown to be efficient and robust and to outperform previously proposed schemes. The novelty and performance advantage of the scheme stem from the use of carefully tuned data structures, synchronization policies, and scheduling mechanisms. Experimental results show that our scheme has small overheads and, for applications with few or no data dependence violations, realizes on average 71% of the speedup of a manually parallelized version of the code, outperforming two recently proposed software-only speculative parallelization schemes. For applications with many data dependence violations, our performance monitors and switches can effectively curb the performance degradation.


An Extensible System for Multilevel Automatic Data Partition and Mapping

González-Escribano

Torres

Fresno

et al. 2014

IEEE Trans. Parallel Distrib. Syst.


Automatic data distribution is a key feature for obtaining efficient implementations from abstract and portable parallel codes. We present a highly efficient and extensible runtime library that integrates techniques for automatic data partition and mapping. It uses a novel approach to define an abstract interface and a plug-in system to encapsulate different types of regular and irregular techniques, helping to generate codes that are independent of the exact mapping functions selected. Currently, it supports hierarchical tiling of arrays with dense and stride domains, which allows the implementation of both data and task parallelism using an SPMD model. It automatically computes appropriate domain partitions for a selected virtual topology, mapping them to available processors with static or dynamic load-balancing techniques. Our library also allows the construction of reusable communication patterns that efficiently exploit MPI communication capabilities. The use of our library greatly reduces the complexity of data distribution and communication, hiding the details of the underlying architecture. The library can also be used as an abstract layer for building generic tiling operations. Our experimental results show that codes using this library achieve performance similar to that of carefully implemented manual versions of several well-known parallel kernels and benchmarks on distributed and multicore systems, while substantially reducing programming effort.
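The idea of computing domain partitions for a virtual topology can be sketched with a plain block partition. This is an illustrative example only and does not reproduce the library's interface: the functions `block_partition` and `grid_partition` are hypothetical names, and a regular block layout is assumed as the simplest of the regular techniques the abstract mentions.

```python
from itertools import product


def block_partition(n, parts):
    """Split the index range [0, n) into `parts` contiguous blocks.

    Block sizes differ by at most one; the first `n % parts` blocks get
    the extra element. Returns a list of half-open (start, end) pairs.
    """
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def grid_partition(shape, topology):
    """Map each coordinate of a virtual processor grid to its array tile.

    `shape` is the array extent per dimension and `topology` the number of
    processors per dimension. Each tile is a tuple of (start, end) pairs,
    one per dimension.
    """
    per_dim = [block_partition(n, p) for n, p in zip(shape, topology)]
    return {
        coords: tuple(per_dim[d][c] for d, c in enumerate(coords))
        for coords in product(*(range(p) for p in topology))
    }


tiles = grid_partition((4, 6), (2, 2))
print(tiles[(1, 0)])  # tile owned by the processor at grid position (1, 0)
```

Code written against a function like `grid_partition` does not depend on how the bounds are computed, which is the kind of independence from the exact mapping function that the abstract describes.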


A new GPU-based approach to the Shortest Path problem

Ortega-Arranz

Torres

Llanos

et al. 2013


The Single-Source Shortest Path (SSSP) problem arises in many different fields. In this paper we present a GPU-based version of the Crauser et al. SSSP algorithm. Our work significantly speeds up the computation of the SSSP, not only with respect to the CPU-based version, but also with respect to another state-of-the-art GPU implementation based on Dijkstra's algorithm, due to Martín et al. Both GPU implementations have been evaluated on the latest Nvidia architecture (Kepler). Our experimental results show that the new GPU-Crauser algorithm leads to speed-ups from 13× to 220× with respect to the CPU version and a performance gain of up to 17% with respect to the GPU-Martín algorithm.
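To give a feel for why a Crauser-style algorithm exposes more parallelism than plain Dijkstra, the following sequential sketch settles several vertices per phase. It is not the GPU implementation from the paper. It assumes the commonly described "out" criterion (a queued vertex is safe to settle when its tentative distance does not exceed the smallest value of tentative distance plus minimum outgoing edge weight over all queued vertices), non-negative edge weights, and a graph given as a dictionary in which every vertex appears as a key. The function name `sssp_phases` is hypothetical.

```python
import math


def sssp_phases(graph, source):
    """Phase-based SSSP sketch; returns (distances, number_of_phases).

    graph: {u: {v: weight}} with non-negative weights; every vertex must
    be a key, even if it has no outgoing edges.
    """
    # Minimum outgoing edge weight of each vertex (inf if it has none).
    min_out = {u: min(nbrs.values(), default=math.inf) for u, nbrs in graph.items()}
    dist = {u: math.inf for u in graph}
    dist[source] = 0
    frontier = {source}   # reached but not yet settled
    settled = set()
    phases = 0
    while frontier:
        # Assumed "out" criterion: no path leaving a queued vertex can
        # produce a distance below this threshold.
        threshold = min(dist[u] + min_out[u] for u in frontier)
        batch = {v for v in frontier if dist[v] <= threshold}
        frontier -= batch
        settled |= batch
        # On a GPU the vertices of a batch could be relaxed concurrently;
        # here they are processed one after another.
        for u in batch:
            for v, w in graph[u].items():
                if v not in settled and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    frontier.add(v)
        phases += 1
    return dist, phases


graph = {"s": {"a": 2, "b": 3}, "a": {"t": 5}, "b": {"t": 1}, "t": {}}
dist, phases = sssp_phases(graph, "s")
print(dist, phases)
```

In this small graph, vertices `a` and `b` are settled in the same phase, whereas Dijkstra's algorithm would extract them one at a time.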


Design space exploration of a software speculative parallelization scheme