Proceedings of the 51st Annual Design Automation Conference 2014
DOI: 10.1145/2593069.2593198

Reduction Operator for Wide-SIMDs Reconsidered

Abstract: It has been shown that wide Single Instruction Multiple Data architectures (wide-SIMDs) can achieve high energy efficiency, especially in domains such as image and vision processing. In these and various other application domains, reduction is a frequently encountered operation, in which multiple input elements are combined into a single element by an associative operation, e.g. addition or multiplication. Many applications require reduction, such as partial histogram merging, matrix multipl…
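Associativity is what allows a reduction to be regrouped into the log-depth tree that parallel hardware can exploit. As a generic illustration (not taken from the paper; `tree_reduce` is a hypothetical name), the Python sketch below compares a pairwise tree reduction with a sequential left-to-right fold:

```python
import functools
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_reduce(xs: List[T], op: Callable[[T, T], T]) -> T:
    """Pairwise (tree-shaped) reduction: ceil(log2(n)) levels for n elements.

    Element order is preserved, so the result equals a left-to-right fold
    whenever op is associative; commutativity is not required.
    """
    if not xs:
        raise ValueError("an empty reduction needs an identity element")
    level = list(xs)
    while len(level) > 1:
        # All combines within one level are independent, so a SIMD could run them in parallel.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd trailing element moves up a level unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(tree_reduce(data, operator.add), functools.reduce(operator.add, data))  # 31 31
    print(tree_reduce(data, operator.mul))  # 6480
    print(tree_reduce(data, max))  # 9
    # Subtraction is not associative, so regrouping changes the answer:
    print(tree_reduce(data, operator.sub), functools.reduce(operator.sub, data))  # -1 -25
```

Addition, multiplication and max give the same result in either grouping. Subtraction does not, which is why the abstract restricts reduction to associative operations.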

Cited by 2 publications (4 citation statements). References: 4 publications.

“…For example, partial histogram merging, row projection (sub kernels in the FFoS application described in Section 3.2), Find Maximal Element in a Vector, and Sum of Vector Elements (categorized as global-to-point kernels in Section 3.2). To efficiently handle these kernels on a wide SIMD with only a circular neighborhood network, we introduced two novel reduction algorithms, pipelined reduction and diagonal access reduction, which do not rely on complex communication networks or any dedicated hardware [29]. The key idea of both approaches is to utilize inter-vector parallelism instead of intra-vector parallelism.…”
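The contrast between intra-vector and inter-vector parallelism can be illustrated with a toy model of a wide SIMD whose only communication is a one-lane circular rotation. The Python sketch below is not the pipelined or diagonal access reduction of [29]. It is a hypothetical scheme written for illustration under stated assumptions: N independent vectors of N lanes each must be reduced, the operation is associative and commutative, and the memory supports a per-lane-addressed "diagonal" load (`diagonal_load`). All function names are our own.

```python
import operator
from typing import Callable, List, Sequence, Tuple

# Toy model for illustration only; NOT the algorithms of [29].
Op = Callable[[int, int], int]


def rotate(v: Sequence[int]) -> List[int]:
    """Circular neighbour shift by one lane: lane i receives lane i-1 (lane 0 receives lane N-1)."""
    return list(v[-1:]) + list(v[:-1])


def lanewise(op: Op, a: Sequence[int], b: Sequence[int]) -> List[int]:
    """One SIMD instruction: apply op independently in every lane."""
    return [op(x, y) for x, y in zip(a, b)]


def intra_vector_reduce(v: Sequence[int], op: Op = operator.add) -> Tuple[int, int]:
    """Baseline: reduce one vector across its own lanes with N-1 rotate-and-combine steps.

    Afterwards every lane holds the same full result; returns (result, rotations used).
    """
    acc, moving, rotations = list(v), list(v), 0
    for _ in range(len(v) - 1):
        moving = rotate(moving)
        rotations += 1
        acc = lanewise(op, acc, moving)
    return acc[0], rotations


def diagonal_load(vectors: Sequence[Sequence[int]], t: int) -> List[int]:
    """HYPOTHETICAL per-lane-addressed load: lane l reads element l of vector (l - 1 - t) mod N."""
    n = len(vectors)
    return [vectors[(lane - 1 - t) % n][lane] for lane in range(n)]


def inter_vector_reduce(vectors: Sequence[Sequence[int]], op: Op = operator.add) -> Tuple[List[int], int]:
    """Reduce N vectors of N lanes together: lane i ends up holding the reduction of vectors[i].

    Each step rotates the partial results by one lane and combines them with a new diagonal.
    Lane i combines its elements in cyclic order, hence the commutativity assumption.
    """
    n = len(vectors)
    assert all(len(v) == n for v in vectors), "sketch assumes N vectors of N lanes"
    acc, rotations = diagonal_load(vectors, 0), 0
    for t in range(1, n):
        acc = lanewise(op, rotate(acc), diagonal_load(vectors, t))
        rotations += 1
    return acc, rotations


if __name__ == "__main__":
    vecs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    baseline = [intra_vector_reduce(v) for v in vecs]
    print([r for r, _ in baseline], sum(k for _, k in baseline))  # [10, 26, 42, 58] 12
    print(inter_vector_reduce(vecs))  # ([10, 26, 42, 58], 3)
```

In this model the N results cost N-1 rotations instead of N(N-1) for the per-vector baseline, because no lane spends work computing a redundant copy of another lane's result. The count covers rotations in a toy model only and says nothing about the cycle counts or hardware costs reported in [29].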
“…The experimental results show that using the proposed algorithms, the performance is comparable to the performance when dedicated reduction hardware is equipped. For details please refer to the work of L. Waeijen et al. [29].…”