2020
DOI: 10.1109/tpds.2020.2978045

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures

Abstract: As many-core accelerators keep integrating more processing units, it becomes increasingly difficult for a parallel application to make effective use of all available resources. An effective way to improve hardware utilization is to exploit spatial and temporal sharing of the heterogeneous processing units by multiplexing computation and communication tasks, a strategy known as heterogeneous streaming. Achieving effective heterogeneous streaming requires carefully partitioning hardware among tasks, and …
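As a rough illustration of the temporal-sharing idea in the abstract, the toy Python pipeline below overlaps the "communication" of one data chunk with the "computation" of the previous one. transfer() and compute() are hypothetical stand-ins for a host-to-device copy and an accelerator kernel; this is a conceptual sketch, not the paper's runtime, which partitions real hardware queues.

# Minimal sketch of heterogeneous streaming: overlap the transfer of the
# next chunk with the computation on the current one.
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(chunk):
    time.sleep(0.01)          # pretend host-to-device copy
    return chunk

def compute(chunk):
    time.sleep(0.02)          # pretend accelerator kernel
    return sum(chunk)

def streamed(chunks):
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = pool.submit(transfer, chunks[0])
        for nxt in chunks[1:]:
            ready = pending.result()
            pending = pool.submit(transfer, nxt)   # next copy runs...
            results.append(compute(ready))         # ...while this chunk computes
        results.append(compute(pending.result()))
    return results

print(streamed([[1, 2], [3, 4], [5, 6]]))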

Cited by 19 publications (12 citation statements)
References 68 publications
“…This approach can avoid the pitfalls of using a hard-wired heuristic that requires human modification every time the architecture evolves, where the number and type of cores are likely to change from one generation to the next. Experimental results on XeonPhi and GPGPUs have shown that this approach can achieve over 93% of the Oracle performance (Zhang et al. 2020).…”
Section: Fig (mentioning, confidence: 99%)
“…Researchers have also exploited machine learning techniques to automatically construct a predictive model that directly predicts the best configuration (Zhang et al. 2018a, 2020). This approach incurs minimal runtime overhead and has little development overhead when targeting a new many-core architecture.…”
Section: Machine-Learning Based Models (mentioning, confidence: 99%)
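A minimal sketch of the supervised approach described in this statement, assuming scikit-learn is available: a small decision tree maps program features to the index of the best configuration found during offline profiling. The feature names, configurations, and training rows are invented for illustration and are not the feature set or model used by the cited work.

# Hedged sketch: predict the best (partition, granularity) configuration
# from program features, using labels obtained by offline profiling.
from sklearn.tree import DecisionTreeClassifier

# each row: [transfer_bytes_per_item, flops_per_item, kernel_count, parallel_loops]
X_train = [
    [1e3, 50,  1, 2],
    [1e5, 10,  4, 8],
    [1e2, 900, 2, 1],
    [5e4, 20,  6, 4],
]
# label = index of the best configuration measured offline (hypothetical)
y_train = [0, 3, 1, 3]

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

configs = [(1, 64), (2, 128), (4, 256), (8, 512)]  # (device partitions, task granularity)
new_program = [[2e4, 35, 3, 4]]                    # features of an unseen program
print(configs[model.predict(new_program)[0]])      # predicted configuration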
“…On heterogeneous many-core architectures, [38] presents an automatic approach to quickly derive a good hardware resource partition and task granularity for task-based parallel applications, in order to exploit spatial and temporal sharing of the heterogeneous processing units. [28] presents a runtime system that automatically optimizes data management on SPM, achieving performance similar to a fast-memory-only system while using a much smaller capacity of fast memory.…”
Section: Data Transfer Optimization (mentioning, confidence: 99%)
“…Compared to supervised-learning methods [?], [23], [24], [25], [33], [34], [35], [36], [37], our RL-based solution has the benefit of not requiring a large number of labelled training samples to train the model. Obtaining sufficient and representative training samples to cover the diverse set of workloads seen in deployment has been shown to be difficult [38], [39], [40], [41].…”
Section: Introduction (mentioning, confidence: 99%)
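To illustrate why an RL-style method needs no labelled training set, the sketch below uses a toy epsilon-greedy bandit that learns the best configuration purely from runtimes it observes online. measure_runtime() is a hypothetical hook standing in for running the workload under a given configuration; the cited paper's actual reinforcement-learning formulation is more involved.

# Toy epsilon-greedy bandit: no labelled samples, only observed runtimes.
import random

def measure_runtime(config):
    # hypothetical stand-in for executing the workload under `config`
    base = {(1, 64): 1.0, (2, 128): 0.7, (4, 256): 0.5, (8, 512): 0.9}[config]
    return base + random.uniform(-0.05, 0.05)      # noisy measurement

configs = [(1, 64), (2, 128), (4, 256), (8, 512)]
value = {c: 0.0 for c in configs}                  # running reward estimates
count = {c: 0 for c in configs}

for step in range(200):
    if random.random() < 0.1:                      # explore
        c = random.choice(configs)
    else:                                          # exploit current estimate
        c = max(configs, key=lambda k: value[k])
    reward = -measure_runtime(c)                   # faster run = higher reward
    count[c] += 1
    value[c] += (reward - value[c]) / count[c]     # incremental mean update

print(max(configs, key=lambda k: value[k]))        # best configuration found online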