2020
DOI: 10.1109/tpds.2020.2978045

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures

Abstract: As many-core accelerators keep integrating more processing units, it becomes increasingly difficult for a parallel application to make effective use of all available resources. An effective way to improve hardware utilization is to exploit spatial and temporal sharing of the heterogeneous processing units by multiplexing computation and communication tasks, a strategy known as heterogeneous streaming. Achieving effective heterogeneous streaming requires carefully partitioning hardware among tasks, and …
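As a rough illustration of the temporal-sharing idea in the abstract, the toy Python pipeline below overlaps the "communication" of one data chunk with the "computation" of the previous one. transfer() and compute() are hypothetical stand-ins for a host-to-device copy and an accelerator kernel; this is a conceptual sketch, not the paper's runtime, which partitions real hardware queues.

# Minimal sketch of heterogeneous streaming: overlap the transfer of the
# next chunk with the computation on the current one.
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(chunk):
    time.sleep(0.01)          # pretend host-to-device copy
    return chunk

def compute(chunk):
    time.sleep(0.02)          # pretend accelerator kernel
    return sum(chunk)

def streamed(chunks):
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = pool.submit(transfer, chunks[0])
        for nxt in chunks[1:]:
            ready = pending.result()
            pending = pool.submit(transfer, nxt)   # next copy runs...
            results.append(compute(ready))         # ...while this chunk computes
        results.append(compute(pending.result()))
    return results

print(streamed([[1, 2], [3, 4], [5, 6]]))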

Cited by 19 publications (12 citation statements)
References 68 publications
“…This approach can avoid the pitfalls of using a hard-wired heuristic that requires human modification every time the architecture evolves, where the number and type of cores are likely to change from one generation to the next. Experimental results on XeonPhi and GPGPUs have shown that this approach can achieve over 93% of the Oracle performance (Zhang et al. 2020).…”
Section: Fig (mentioning, confidence: 99%)
“…Researchers have also exploited machine learning techniques to automatically construct a predictive model that directly predicts the best configuration (Zhang et al. 2018a, 2020). This approach incurs minimal runtime overhead and has little development overhead when targeting a new many-core architecture.…”
Section: Machine-Learning Based Models (mentioning, confidence: 99%)
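A minimal sketch of the supervised approach described in this statement, assuming scikit-learn is available: a small decision tree maps program features to the index of the best configuration found during offline profiling. The feature names, configurations, and training rows are invented for illustration and are not the feature set or model used by the cited work.

# Hedged sketch: predict the best (partition, granularity) configuration
# from program features, using labels obtained by offline profiling.
from sklearn.tree import DecisionTreeClassifier

# each row: [transfer_bytes_per_item, flops_per_item, kernel_count, parallel_loops]
X_train = [
    [1e3, 50,  1, 2],
    [1e5, 10,  4, 8],
    [1e2, 900, 2, 1],
    [5e4, 20,  6, 4],
]
# label = index of the best configuration measured offline (hypothetical)
y_train = [0, 3, 1, 3]

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

configs = [(1, 64), (2, 128), (4, 256), (8, 512)]  # (device partitions, task granularity)
new_program = [[2e4, 35, 3, 4]]                    # features of an unseen program
print(configs[model.predict(new_program)[0]])      # predicted configuration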
“…On heterogeneous many-core architectures, [38] presents an automatic approach to quickly derive a good hardware resource partition and task granularity for task-based parallel applications, in order to exploit spatial and temporal sharing of the heterogeneous processing units. [28] presents a runtime system that automatically optimizes data management on SPM, achieving performance similar to a fast-memory-only system while using a much smaller capacity of fast memory.…”
Section: Data Transfer Optimization (mentioning, confidence: 99%)
“…Compared to supervised-learning methods [?], [23], [24], [25], [33], [34], [35], [36], [37], our RL-based solution has the benefit of not requiring a large number of labelled training samples to train the model. Obtaining sufficient and representative training samples to cover the diverse set of workloads seen in deployment has been shown to be difficult [38], [39], [40], [41].…”
Section: Introduction (mentioning, confidence: 99%)
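To illustrate why an RL-style method needs no labelled training set, the sketch below uses a toy epsilon-greedy bandit that learns the best configuration purely from runtimes it observes online. measure_runtime() is a hypothetical hook standing in for running the workload under a given configuration; the cited paper's actual reinforcement-learning formulation is more involved.

# Toy epsilon-greedy bandit: no labelled samples, only observed runtimes.
import random

def measure_runtime(config):
    # hypothetical stand-in for executing the workload under `config`
    base = {(1, 64): 1.0, (2, 128): 0.7, (4, 256): 0.5, (8, 512): 0.9}[config]
    return base + random.uniform(-0.05, 0.05)      # noisy measurement

configs = [(1, 64), (2, 128), (4, 256), (8, 512)]
value = {c: 0.0 for c in configs}                  # running reward estimates
count = {c: 0 for c in configs}

for step in range(200):
    if random.random() < 0.1:                      # explore
        c = random.choice(configs)
    else:                                          # exploit current estimate
        c = max(configs, key=lambda k: value[k])
    reward = -measure_runtime(c)                   # faster run = higher reward
    count[c] += 1
    value[c] += (reward - value[c]) / count[c]     # incremental mean update

print(max(configs, key=lambda k: value[k]))        # best configuration found online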