2015
DOI: 10.1016/j.future.2014.12.010

Picos: A hardware runtime architecture support for OmpSs

Abstract: OmpSs is a programming model that provides a simple and powerful way of annotating sequential programs to exploit heterogeneity and task parallelism based on runtime data dependency analysis, dataflow scheduling and out-of-order task execution; it has greatly influenced Version 4.0 of the OpenMP standard. The current implementation of OmpSs achieves those capabilities with a pure-software runtime library: Nanos++. Therefore, although powerful and easy to use, the performance benefits of exploiting fine-grained (…

Cited by 19 publications (43 citation statements)
References 10 publications
“…The Picos design [4] manages dependence analysis and task scheduling in hardware. (1) It reads new tasks with dependences and inserts them as a node in the task dependence graph in hardware; (2) It determines if a task is ready-to-execute and schedules it to the threads; (3) It reads finished execution tasks and updates the task dependence graph.…”
Section: B Main Idea and Implementation (mentioning)
confidence: 99%
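The three steps in the statement above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `TaskGraph`, its method names, and the use of string labels as dependence addresses are all hypothetical, and it does not reproduce the Picos hardware design or the Nanos++ API. It shows the general bookkeeping the quote describes: inserting a task as a node, deciding when it is ready, and updating the graph when a task finishes.

```python
from collections import defaultdict, deque


class TaskGraph:
    """Toy software model of the three quoted steps (all names are hypothetical)."""

    def __init__(self):
        self.last_writer = {}                # address -> latest task that writes it
        self.readers = defaultdict(list)     # address -> readers since that writer
        self.pending = {}                    # task -> count of unfinished predecessors
        self.successors = defaultdict(list)  # task -> tasks waiting on it
        self.live = set()                    # submitted but not yet finished
        self.ready = deque()                 # tasks that may be given to a worker

    def submit(self, task, inputs=(), outputs=()):
        """Step 1: insert a new task as a node and link it to its predecessors."""
        preds = set()
        for addr in inputs:                  # read-after-write
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
        for addr in outputs:                 # write-after-write and write-after-read
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
            preds.update(self.readers[addr])
        preds = {p for p in preds if p in self.live}  # finished tasks impose nothing
        for addr in inputs:
            self.readers[addr].append(task)
        for addr in outputs:
            self.last_writer[addr] = task
            self.readers[addr] = []
        self.live.add(task)
        for p in preds:
            self.successors[p].append(task)
        if preds:
            self.pending[task] = len(preds)
        else:
            self.ready.append(task)

    def next_ready(self):
        """Step 2: hand out a ready-to-execute task, or None if there is none."""
        return self.ready.popleft() if self.ready else None

    def finish(self, task):
        """Step 3: retire a finished task and release successors that became ready."""
        self.live.discard(task)
        for succ in self.successors.pop(task, []):
            self.pending[succ] -= 1
            if self.pending[succ] == 0:
                del self.pending[succ]
                self.ready.append(succ)
```

In this model a worker loop would repeatedly call `next_ready()`, run the returned task, and then call `finish()` on it. A task whose dependences are still outstanding never appears in the ready queue until its last predecessor is retired.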
“…Previous design exploration with 24 threads and no software integration, proved to have great scalability for up to 21x speedup for Cholesky [2]. Moreover, by using a software cycle-level simulator [4] it has been measured that the same design with a larger size is able to manage up to 256 workers.…”
Section: Scalability and Future Work (mentioning)
confidence: 99%
“…However, for fine-grained tasks, the overhead of a software-only implementation including task creation, dependence management, task scheduling, etc., is simply too high to allow it to maintain a scalable performance [7]. Figure 1 shows the Fig.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, its straightforward hardware implementation presented unresolved deadlocks due to queue saturation and memory capacity. A new design called Picos [7] was proposed and simulated with a C simulator to improve Task Superscalar by resolving these deadlocks and adding support for nested tasks. In this paper, we present a hardware accelerator for task and dependence management of fine-grained tasks for task-based programming models.…”
Section: Introduction (mentioning)
confidence: 99%