2019
DOI: 10.1109/tpds.2019.2907493

A Hardware Runtime for Task-Based Programming Models

Abstract: Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only runtimes introduce relevant overhead when targeting fine-grained tasks, resulting in performance losses. To overcome this drawback, we present a hardware runtime Picos++ that accelerates critical runtime functions such as task dependence analysis, nested task support, and heterogeneous task s…

Cited by 16 publications (26 citation statements)
References 22 publications
“…By encapsulating the loop body and loads/stores in different functions, the HLS compiler is able to schedule calls without dependencies in the same cycle. In the example of listing 3, the first loadStore function call of vectorAddTransformed is scheduled alongside the first loopBody call. The other two calls are also scheduled together after the first two.…”
Listing 3 (fragment): Proposal of OmpSs pragma syntax (vectorAdd) and generated Vivado HLS code (vectorAddTransformed) to pipeline loads/stores with computation
    … [8], a2 [8];
    float b1 [8], b2 [8];
    float c1 [8], c2 [8];
    load(a1,b1,a,b);
    //n is multiple of 2 and n >= 2
    for (int k = 0; k < n-2; ++k) {
        loadStore(a2,b2,c2,a+k * 8,b+k * 8,c+(k-1) * 8,k);
        loopBody(a1,b1,c1);
        ++k;
        loadStore(a1,b1,c1,a+k * 8,b+k * 8,c+(k-1) * 8,k);
        loopBody(a2,b2,c2);
    }
    int k = n-1;
    loadStore(a2,b2,c2,a+k * 8,b+k * 8,c+(k-1) * 8,k);
    loopBody(a1,b1,c1);
    store(c1,c+k * 8);
    loopBody(a2,b2,c2);
    store(c1,c+k * 8);
    }
Section: Compiler Transformations
Citation type: mentioning
confidence: 99%
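
The citation statement above describes overlapping loads/stores with computation by keeping them in separate functions that work on different buffers. A minimal sketch of that ping-pong (double-buffering) idea in plain C follows. It is not the citing paper's generated code: the names `vector_add_pingpong`, `load_chunk`, `store_chunk`, `loop_body` and `CHUNK` are hypothetical, and a single parity-indexed loop is assumed instead of the manually unrolled loop in the quoted listing. On a CPU the calls simply run one after another; the point is only that the transfer stage and the compute stage of each iteration touch disjoint buffers, which is the property that would let an HLS tool schedule them together.

```c
#include <assert.h>

#define CHUNK 8 /* elements per local buffer; 8 mirrors the quoted listing */

/* Copy one chunk of each input array into local buffers. */
static void load_chunk(float *la, float *lb, const float *a, const float *b) {
    for (int i = 0; i < CHUNK; ++i) {
        la[i] = a[i];
        lb[i] = b[i];
    }
}

/* Copy one chunk of results from a local buffer back to the output array. */
static void store_chunk(const float *lc, float *c) {
    for (int i = 0; i < CHUNK; ++i) {
        c[i] = lc[i];
    }
}

/* Compute stage: element-wise add on local buffers only. */
static void loop_body(const float *la, const float *lb, float *lc) {
    for (int i = 0; i < CHUNK; ++i) {
        lc[i] = la[i] + lb[i];
    }
}

/* c = a + b for n chunks of CHUNK floats, using two sets of local buffers. */
void vector_add_pingpong(const float *a, const float *b, float *c, int n) {
    float la[2][CHUNK], lb[2][CHUNK], lc[2][CHUNK];

    assert(n >= 1);
    load_chunk(la[0], lb[0], a, b); /* prologue: fetch chunk 0 */

    for (int k = 0; k < n; ++k) {
        int cur = k & 1;   /* buffers used by the compute stage */
        int nxt = cur ^ 1; /* buffers used by the transfer stage */

        /* Transfer stage: write back chunk k-1 and prefetch chunk k+1.
           It only touches the 'nxt' buffers. */
        if (k >= 1) {
            store_chunk(lc[nxt], c + (k - 1) * CHUNK);
        }
        if (k + 1 < n) {
            load_chunk(la[nxt], lb[nxt], a + (k + 1) * CHUNK, b + (k + 1) * CHUNK);
        }

        /* Compute stage: only touches the 'cur' buffers, so it has no
           data dependence on the transfer stage of this iteration. */
        loop_body(la[cur], lb[cur], lc[cur]);
    }

    /* Epilogue: write back the last computed chunk. */
    store_chunk(lc[(n - 1) & 1], c + (n - 1) * CHUNK);
}
```

Calling `vector_add_pingpong(a, b, c, 3)` on arrays of 24 floats fills `c` with the element-wise sums of `a` and `b`.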
“…Picos [18,20,24] is the module responsible for providing fast Task Scheduling functionality. Its communication interface includes queues for (1) receiving information about new tasks to be added to the task graph, called submission queue;…”
Section: Picos
Citation type: mentioning
confidence: 99%
“…As a result, several research groups have sought to improve the maximum throughput of Task Scheduling systems by resorting to hardware acceleration, leading to largely successful designs [8,18,20,24]. For example, the Picos [20] Task Scheduling accelerator was proven capable of significantly improving the performance of task parallel programs.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…More details about Picos might be found in related publications [Tan, 2018, Tan et al, 2017, Yazdanpanah et al, 2015].…”
Section: Picos
Citation type: mentioning
confidence: 99%
“…As a result, several research groups have sought to improve the maximum throughput of Task Scheduling systems by resorting to hardware accelerators (e.g. FPGA), leading to largely successful designs [Dallou and Juurlink, 2012, Tan, 2018, Tan et al, 2017, Yazdanpanah et al, 2015].…”
Section: Introduction
Citation type: mentioning
confidence: 99%