2016
DOI: 10.1145/2851497

Parallelizing the Chambolle Algorithm for Performance-Optimized Mapping on FPGA Devices

Abstract: The performance and the efficiency of recent computing platforms have been deeply influenced by the widespread adoption of hardware accelerators, such as graphics processing units (GPUs) or field-programmable gate arrays (FPGAs), which are often employed to support the tasks of general-purpose processors (GPPs). One of the main advantages of these accelerators over their sequential counterparts (GPPs) is their ability to perform massive parallel computation. However, to exploit this competitive edge, it is nece…

Cited by 5 publications (3 citation statements)
References 27 publications (33 reference statements)
“…Our key insight is that the impact of a single input element can be computed by repeatedly applying the original stencil to an impulse signal.² Concretely, the impulse array has the value one at its middle position and zeros at all other positions. The length of the impulse array depends on the radius of the stencil pattern and the number of iterations to harvest parallelism from.…”
Section: Computing Dce Coefficients (mentioning; confidence: 99%)
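The impulse-based idea in the statement above can be illustrated for a one-dimensional stencil. This is a minimal sketch under stated assumptions, not the citing authors' implementation: the helper names `apply_stencil` and `composed_coefficients` are hypothetical, the stencil is assumed to have an odd number of weights (radius r), boundaries are zero-padded, and the impulse length 2·r·k + 1 for k iterations is our assumption for "depends on the radius of the stencil pattern and the number of iterations" (it is exactly the support of the k-fold stencil).

```python
def apply_stencil(signal, weights):
    """Apply a 1-D stencil once (hypothetical helper).

    out[i] = sum_k weights[k] * signal[i + k - r]; neighbours outside
    the array are treated as zero (assumed zero-padded boundary).
    """
    r = len(weights) // 2  # assumes an odd number of weights
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = i + k - r
            if 0 <= j < n:
                acc += w * signal[j]
        out[i] = acc
    return out


def composed_coefficients(weights, iterations):
    """Coefficients equivalent to applying `weights` `iterations` times.

    Builds an impulse array (one in the middle, zeros elsewhere), applies
    the original stencil repeatedly, and reads off the impulse response.
    """
    r = len(weights) // 2
    length = 2 * r * iterations + 1  # assumed length: radius * iterations on each side
    impulse = [0.0] * length
    impulse[length // 2] = 1.0
    response = impulse
    for _ in range(iterations):
        response = apply_stencil(response, weights)
    # The impulse response of this correlation-style stencil is the kernel
    # mirrored, so reverse it to get weights usable with apply_stencil.
    return response[::-1]


print(composed_coefficients([1, 2, 1], 2))  # [1.0, 4.0, 6.0, 4.0, 1.0]
```

Away from the boundaries, one pass of `apply_stencil` with the composed coefficients gives the same result as k passes of the original stencil, which is what lets several iterations be computed in a single wider step.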
“…The cone-based ISL acceleration approach is the most similar approach to DCMI described in the literature. We have chosen CA [2,29,40] as the representative of this class as they share our goal of providing an automatic design flow. Zohouri et al [60] takes an approach that is architecturally similar to CA but realized through OpenCL.…”
Section: FPGA-based ISL Accelerators (mentioning; confidence: 99%)
“…Different implementations exist on CPU namely the original [10] and improved [11] versions, a parallel OpenMP version [18] and a SIMD version [19]. FPGA implementations have also been developed [20] and have been optimised in memory allocation and power consumption [21]. Finally, GPU implementations have been developed for the original [10], improved [11] and further optimised TV-L1 versions [22], [23].…”
Section: Introduction (mentioning; confidence: 99%)