Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores 2018
DOI: 10.1145/3178442.3178444

Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution

Cited by 9 publications (13 citation statements)
References 9 publications

“…Specifically, two memory phases are considered: an acquisition (or load) phase that copies data and instructions from main memory into local memory, and a replication (or unload) phase that copies modified data back to main memory. While the computation phase is always executed on a processor, the memory phases can be either executed on the processor itself [5,6,13,22,26,49,50,53,56,71,72], or on another hardware component [30,31], such as a programmable Direct Memory Access (DMA) module [7,20,61,66]. Works that proposed using a DMA unit to perform the memory transfers [66] can efficiently hide the memory latency by overlapping the execution of a task with the DMA transfer of another task; this leads to considerable improvements in schedulability.…”
Section: Software Solutions
confidence: 99%
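The excerpt above describes PREM's acquisition (load), computation, and replication (unload) phases, and notes that delegating the memory phases to a DMA engine lets one task's transfer overlap another task's computation. A minimal C sketch of that double-buffering idea follows; dma_start/dma_wait and the tile size are illustrative assumptions (implemented here as synchronous memcpy stubs), not an API from any of the cited works.

```c
#include <stddef.h>
#include <string.h>

#define TILE 256   /* illustrative tile size, in floats */

/* Hypothetical DMA helpers, modelled here as synchronous memcpy stubs.
 * On real hardware, dma_start would launch an asynchronous transfer and
 * dma_wait would block until it completes. */
static void dma_start(float *dst, const float *src, size_t n) {
    memcpy(dst, src, n * sizeof(float));
}
static void dma_wait(void) { /* nothing to do: the stub copies eagerly */ }

/* Double buffering: while the core computes on one local buffer,
 * the next tile is being loaded into the other buffer. */
void process(const float *in, float *out, size_t n_tiles) {
    static float buf[2][TILE];
    if (n_tiles == 0)
        return;

    dma_start(buf[0], in, TILE);                        /* load phase, tile 0 */
    for (size_t k = 0; k < n_tiles; k++) {
        dma_wait();                                     /* tile k is now local */
        if (k + 1 < n_tiles)                            /* prefetch tile k+1   */
            dma_start(buf[(k + 1) % 2], in + (k + 1) * TILE, TILE);

        float *cur = buf[k % 2];
        for (size_t i = 0; i < TILE; i++)               /* compute phase: local data only */
            cur[i] *= 2.0f;

        memcpy(out + k * TILE, cur, TILE * sizeof(float));  /* unload phase */
    }
}
```

With a genuinely asynchronous DMA, the prefetch of tile k+1 proceeds in the background while the core computes on tile k, which is the latency-hiding effect the excerpt credits with the schedulability improvements.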
“…4.2). Then, we briefly summarize the ILP model from our previous work [4] (Sec. 4.3) and finally introduce the new heuristic (Sec.…”
Section: Scheduling
confidence: 99%
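For context, an ILP model of this kind typically assigns start times to each task's load, compute, and unload phases while keeping memory phases mutually exclusive on the shared memory. The formulation below is a generic illustration of that structure, not the actual model from [4]; s and d denote phase start times and durations, E the task-graph edges, and M a large constant.

\[
\begin{aligned}
\min\;& C_{\max} \\
\text{s.t.}\;& s_i^{C} \ge s_i^{L} + d_i^{L}, \qquad s_i^{U} \ge s_i^{C} + d_i^{C} && \forall i \\
& s_j^{L} \ge s_i^{U} + d_i^{U} && \forall (i,j) \in E \\
& s_i^{L} + d_i^{L} \le s_j^{L} + M(1 - y_{ij}), \qquad s_j^{L} + d_j^{L} \le s_i^{L} + M y_{ij} && \forall i < j \\
& C_{\max} \ge s_i^{U} + d_i^{U}, \qquad y_{ij} \in \{0,1\} && \forall i
\end{aligned}
\]

The disjunctive pair of constraints (written here for load phases only; a full model would cover all memory-phase pairs and the assignment of compute phases to cores) serializes conflicting memory accesses, and the objective minimizes the makespan \(C_{\max}\).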
“…In our previous work [4] we have presented a prototype compiler - capable of transforming regular loops into PREM-compliant code - coupled to a scheduling tool based on an ILP model and capable of optimally scheduling small task graphs. In this paper we significantly extend our previous work along several axes.…”
Section: Introduction
confidence: 99%
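The transformation mentioned in this excerpt can be pictured as tiling a regular loop and wrapping each tile in explicit load/compute/unload phases. The sketch below is a hand-written illustration of that idea, assuming a software-managed local buffer; it is not output of the paper's compiler.

```c
#include <string.h>

#define TILE 256   /* illustrative tile size */

/* Original loop: each iteration may access main memory directly. */
void scale(const float *a, float *b, int n) {
    for (int i = 0; i < n; i++)
        b[i] = 2.0f * a[i];
}

/* PREM-style version (illustrative): the loop is tiled; each tile is copied
 * into local buffers (load phase), processed from local memory only
 * (compute phase), and written back (unload phase). */
void scale_prem(const float *a, float *b, int n) {
    float la[TILE], lb[TILE];                /* stand-ins for scratchpad memory */
    for (int t = 0; t < n; t += TILE) {
        int len = (n - t < TILE) ? (n - t) : TILE;

        memcpy(la, a + t, (size_t)len * sizeof(float));   /* load phase    */

        for (int i = 0; i < len; i++)                     /* compute phase */
            lb[i] = 2.0f * la[i];

        memcpy(b + t, lb, (size_t)len * sizeof(float));   /* unload phase  */
    }
}
```

Separating the code this way gives each tile well-defined memory and compute intervals, which a task-graph scheduler (ILP-based or heuristic) can then arrange so that memory phases of different cores do not collide.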
“…The authors of [11] proposed a technique for compiling a GPU kernel into PREM-compliant code. In [17], authors present a compiler based on the LLVM infrastructure that refactors legacy code into PREM code.…”
Section: Related Work
confidence: 99%
“…Another solution is to rely on a compiler that automates phase separation. For instance, PREM-compliant compilation for the LLVM framework has been proposed in [11,17]. Our approach also tackles PREM-compliant C code generation but starts from a higher level of abstraction than previous approaches.…”
Section: Introduction
confidence: 99%