2020 IEEE Workshop on Signal Processing Systems (SiPS) 2020
DOI: 10.1109/sips50750.2020.9195250

Programming Heterogeneous CPU-GPU Systems by High-Level Dataflow Synthesis

Cited by 7 publications (10 citation statements)
References 14 publications
“…Due to the dynamic aspect of the second example, this specialization of the FIFO buffer could not be utilized as all buffers had to be HostFifo so that the dynamic switching between the actor and its shadow variant could occur. For more details on the differences between these FIFO buffer implementations, please refer to [8].…”
Section: Compile 4 Execute | Citation type: mentioning
confidence: 99%
“…To generalize the methodology for additional use with platforms including GPU hardware, the source-to-source compiler backend that generated specialized software code for dataflow application programs, as presented in [8][9][10], would require further expansion. According to the results of this study, the automatic generation of a platform-specific instrumented code aggregating clock-accurate profiling metrics for CUDA/C++ [11] from a dataflow model of the application software was completed at a granularity level that would be particularly suitable for TURNUS's analysis.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…The tool flow is presented in Figure 2. The high-level representation of the application program written in RVC-CAL, together with configuration files providing partitioning and buffer sizes information, are fed to the ORCC compiler, which uses the Exelixi CUDA backend [48], [49] to generate the C++/CUDA code that is then compiled with the Nvidia CUDA Compiler (NVCC) to obtain an executable of the heterogeneous program. Using a platform-specific compiler as the last layer of the tool-chain allows the methodology to be compatible with all Nvidia supported platforms (i.e., X86(_64), ARM, POWER9, and all Nvidia GPUs).…”
Section: B Partition and Mapping | Citation type: mentioning
confidence: 99%
“…Regarding performance, Table 1 summarizes two different sets of results. The first one is when the idct2d actor runs on the GPU sequentially (this corresponds to the methodology presented in [48]), all other actors are running on the CPU. The second one corresponds to the improved methodology where the idct2d actor runs in parallel on the GPU.…”
Section: RVC-CAL JPEG Decoder | Citation type: mentioning
confidence: 99%
“…The second is the extension of both the design space exploration model defined by the authors of this work and the extension of the open-source toolbox capable of synthesizing low-level code for heterogeneous CPU and GPU platforms. To this end, the methodology already defined in [8] was significantly extended allowing automatically synthesizing a C++/CUDA parallel version for every actors' actions, all taking full advantage of SIMD parallelization techniques. All the innovative contributions of this article can be summarized as follows:…”
Section: Introduction | Citation type: mentioning
confidence: 99%