2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2017
DOI: 10.1109/ipdps.2017.20

swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight

Cited by 71 publications (38 citation statements)
References 7 publications
“…Optimizations on Sunway architecture. A few works exploit architectural features of Sunway, e.g., heterogeneous computing cores, SIMD, register-level communication, and SPM; these are either hand-tuned application-specific implementations [3,8,19,61] or domain-specific frameworks [18,36,75]. In particular, [38,62,76] perform hand-tuned tiling for parallelism.…”
Section: Related Work (mentioning)
confidence: 99%
“…Each CPE has a 16 KB L1 instruction cache and a 64 KB local data memory (LDM), which can be configured as a user-controlled fast buffer. A performance model based on the three-level (REG-LDM-MEM) memory hierarchy was proposed (Fang et al., 2017). The CPE can either access the global memory directly, with a limited bandwidth of 8 GB/s, or go through the REG-LDM-MEM memory hierarchy to obtain much higher bandwidth.…”
Section: The SW26010 Processor (mentioning)
confidence: 99%
“…The Sunway TaihuLight is a Chinese home-grown supercomputing system with 40960 compute nodes, providing a peak performance of 125 PFlops and a sustained Linpack performance of 93 PFlops; it has ranked No. 1 on the TOP500 list of supercomputers for the past two years. There are abundant studies on this platform, covering atmospheric dynamics, earthquake simulation, deep learning, and molecular dynamics, and two of them won ACM Gordon Bell Prizes in 2016 and 2017 (Yang, 2016; Fu et al., 2017; Fang et al., 2017; Duan, 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Compared with recent research on regular problems on the Sunway architecture, such as stencil [3], DNN [12], GEMM [18,27], and a fully implicit solver for nonhydrostatic atmospheric dynamics [66], our work presented in this paper is more complicated, because the irregularities arising from various matrix sparsity structures are dynamic, and exploiting such irregularity is known to be more challenging [57,67].…”
Section: Related Work (mentioning)
confidence: 99%