Picos: A hardware runtime architecture support for OmpSs

Yazdanpanah, Fahimeh; Álvarez, Carlos; Jiménez-González, Daniel; Badía, Rosa M.; Valero, Mateo

doi:10.1016/j.future.2014.12.010

Cited by 19 publications

(43 citation statements)

References 10 publications

“…The Picos design [4] manages dependence analysis and task scheduling in hardware. (1) It reads new tasks with dependences and inserts them as a node in the task dependence graph in hardware; (2) It determines if a task is ready-to-execute and schedules it to the threads; (3) It reads finished execution tasks and updates the task dependence graph.…”

Section: B Main Idea and Implementation (mentioning)

confidence: 99%
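The three steps quoted above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Picos hardware design: the class `TaskGraph`, its methods `submit`, `next_ready` and `finish`, and the address-based dependence tracking are hypothetical names chosen for illustration, and only dependences on the most recent writer of an address are tracked (write-after-read ordering is ignored for brevity).

```python
from collections import defaultdict, deque


class TaskGraph:
    """Toy model of a task dependence graph; all names are hypothetical."""

    def __init__(self):
        self.last_writer = {}               # address -> id of the most recent writer task
        self.pending = {}                   # unfinished task id -> count of unfinished predecessors
        self.successors = defaultdict(set)  # task id -> tasks waiting for it
        self.ready = deque()                # tasks whose dependences are all satisfied

    def submit(self, task, reads=(), writes=()):
        """Step 1: insert a new task as a node and link it to its producers."""
        producers = {self.last_writer[a] for a in (*reads, *writes)
                     if a in self.last_writer}
        unfinished = {p for p in producers if p in self.pending}
        self.pending[task] = len(unfinished)
        for p in unfinished:
            self.successors[p].add(task)
        for a in writes:
            self.last_writer[a] = task
        if not unfinished:
            self.ready.append(task)

    def next_ready(self):
        """Step 2: hand a ready-to-execute task to a worker, or None."""
        return self.ready.popleft() if self.ready else None

    def finish(self, task):
        """Step 3: retire a finished task and wake up its successors."""
        del self.pending[task]
        for s in sorted(self.successors.pop(task, ())):
            self.pending[s] -= 1
            if self.pending[s] == 0:
                self.ready.append(s)


# Diamond-shaped example: t2 and t3 consume A, t4 consumes B and C.
g = TaskGraph()
g.submit("t1", writes=["A"])
g.submit("t2", reads=["A"], writes=["B"])
g.submit("t3", reads=["A"], writes=["C"])
g.submit("t4", reads=["B", "C"])

order = []
while (t := g.next_ready()) is not None:
    order.append(t)   # a real runtime would run the task on a worker thread here
    g.finish(t)
```

In this model, `t2` and `t3` become ready together once `t1` finishes, which is the parallelism a dependence manager exposes. `t4` is released only after both of them have retired.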

“…Previous design exploration with 24 threads and no software integration, proved to have great scalability for up to 21x speedup for Cholesky [2]. Moreover, by using a software cycle-level simulator [4] it has been measured that the same design with a larger size is able to manage up to 256 workers.…”

Section: Scalability and Future Work (mentioning)

confidence: 99%

“…Our work on hardware task dependence graph management has showed great scalability and performance improvement over its software-only alternatives [2], [4], [5].…”

Section: Introduction (mentioning)

confidence: 99%

Picos, A Hardware Task-Dependence Manager for Task-Based Dataflow Programming Models

Tan

Bosch

Vidal

et al. 2017

2017 International Conference on High Performance Computing & Simulation (HPCS)

Self Cite

Abstract: Task-based programming models such as OpenMP, Intel TBB and OmpSs are widely used to extract a high level of parallelism from applications executed on multi-core and manycore platforms. These programming models allow applications to be expressed as a set of tasks with dependences to drive their execution at runtime. While managing these dependences for tasks with coarse granularity proves to be highly beneficial, it introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we propose Picos, a hardware/software co-design that manages inter-task dependences efficiently. In this paper we describe the main ideas of our proposal and a prototype implementation. This prototype is integrated with a parallel task-based programming model and evaluated with real executions on a Linux embedded system with two ARM Cortex-A9 cores and an FPGA. When compared with a software runtime, our solution results in more than 1.8x speedup and 40% energy savings with only 2 threads.

“…However, for fine-grained tasks, the overhead of a software-only implementation including task creation, dependence management, task scheduling, etc., is simply too high to allow it to maintain a scalable performance [7]. Figure 1 shows the Fig.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, its straightforward hardware implementation presented unresolved deadlocks due to queue saturation and memory capacity. A new design called Picos [7] was proposed and simulated with a C simulator to improve Task Superscalar by resolving these deadlocks and adding support for nested tasks. In this paper, we present a hardware accelerator for task and dependence management of fine-grained tasks for task-based programming models.…”

Section: Introduction (mentioning)

confidence: 99%

Performance analysis of a hardware accelerator of dependence management for task-based dataflow programming models

Tan

Bosch

Jiménez-González

et al. 2016

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Self Cite

Abstract: Along with the popularity of multicore and manycore, task-based dataflow programming models obtain great attention for being able to extract high parallelism from applications without exposing the complexity to programmers. One of these pioneers is the OpenMP Superscalar (OmpSs). By implementing dynamic task dependence analysis, dataflow scheduling and out-of-order execution in runtime, OmpSs achieves high performance using coarse and medium granularity tasks. In theory, for the same application, the more parallel tasks can be exposed, the higher possible speedup can be achieved. Yet this factor is limited by task granularity, up to a point where the runtime overhead outweighs the performance increase and slows down the application. To overcome this handicap, Picos was proposed to support task-based dataflow programming models like OmpSs as a fast hardware accelerator for fine-grained task and dependence management, and a simulator was developed to perform design space exploration. This paper presents the very first functional hardware prototype inspired by Picos. An embedded system based on a Zynq 7000 All-Programmable SoC is developed to study its capabilities and possible bottlenecks. Initial scalability and hardware consumption studies of different Picos designs are performed to find the one with the highest performance and lowest hardware cost. A further thorough performance study is employed on both the prototype with the most balanced configuration and the OmpSs software-only alternative. Results show that our OmpSs runtime hardware support significantly outperforms the software-only implementation currently available in the runtime system for fine-grained tasks.
