2018
DOI: 10.1145/3177885

Optimizing Convolutional Neural Networks on the Sunway TaihuLight Supercomputer

Abstract: The Sunway TaihuLight supercomputer is powered by SW26010, a new 260-core processor designed with on-chip fusion of heterogeneous cores. In this article, we present our work on optimizing the training process of convolutional neural networks (CNNs) on the Sunway TaihuLight supercomputer. Specifically, a highly efficient library (swDNN) and a customized Caffe framework (swCaffe) are proposed. Architecture-oriented optimization methods targeting the many-core architecture of SW26010 are introduced and are able to…

Cited by 20 publications (15 citation statements)
References: 16 publications
“…In the Update step, we first accumulate the c_j^l and count_j^l of all CPEs by performing two AllReduce operations, so that all CPEs can obtain the assignment results of the whole input dataset. We use register communication [43] to implement intra-CG AllReduce operation and use MPI AllReduce for inter-CG AllReduce. After the accumulation, the Update step is performed to calculate new centroids, as shown in line 15.…”
Section: Level 1 - Dataflow Partition (mentioning)
confidence: 99%
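
The two-stage accumulation described in this statement can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the cited authors' implementation: plain Python lists stand in for CPEs and core groups (CGs), the helper names (`allreduce_sum`, `hierarchical_allreduce`, `update_step`) are hypothetical, and no real register communication or MPI call is made. Each worker is assumed to hold flat per-cluster partial sums and counts.

```python
from typing import List, Optional

def allreduce_sum(buffers: List[List[float]]) -> List[List[float]]:
    """Simulated AllReduce: every participant receives the element-wise sum."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]

def hierarchical_allreduce(groups: List[List[List[float]]]) -> List[List[List[float]]]:
    """Two-stage AllReduce: inside each group first, then across groups."""
    # Stage 1: intra-group sum (stands in for register communication inside a CG).
    intra = [allreduce_sum(workers) for workers in groups]
    # Stage 2: one representative per group joins the inter-group sum
    # (stands in for MPI AllReduce between CGs).
    inter = allreduce_sum([workers[0] for workers in intra])
    # Every worker in every group ends up with the global result.
    return [[list(inter[g]) for _ in workers] for g, workers in enumerate(groups)]

def update_step(group_sums, group_counts, k: int, d: int) -> List[Optional[List[float]]]:
    """Compute new centroids from per-worker partial sums and counts.

    group_sums[g][w] is a flat list of k*d partial coordinate sums,
    group_counts[g][w] is a list of k partial point counts.
    """
    # Two AllReduce operations, one for the sums and one for the counts.
    sums = hierarchical_allreduce(group_sums)[0][0]
    counts = hierarchical_allreduce(group_counts)[0][0]
    centroids: List[Optional[List[float]]] = []
    for j in range(k):
        if counts[j] > 0:
            centroids.append([sums[j * d + t] / counts[j] for t in range(d)])
        else:
            centroids.append(None)  # empty cluster: no points assigned
    return centroids

# Two groups of two workers, k = 2 clusters, d = 1 dimension.
example_sums = [[[1.0, 10.0], [2.0, 0.0]], [[3.0, 20.0], [0.0, 30.0]]]
example_counts = [[[1, 1], [1, 0]], [[1, 1], [0, 1]]]
new_centroids = update_step(example_sums, example_counts, k=2, d=1)
```

Because every worker receives the same global sums and counts, each one can compute the new centroids locally without a further broadcast.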
“…There're a few works exploiting architectural features on Sunway, e.g., heterogeneous computing cores, SIMD, register-level communication, SPM, and so on, which are either hand-tuned application-specific implementations [3,8,19,61], or domain-specific frameworks [18,36,75]. Specially, [38,62,76] perform hand-tuned tiling for parallelism.…”
Section: Related Work (mentioning)
confidence: 99%
“…Performing reduction tree operations is thus both more efficient and scalable than the traditional parameter server approach. Several prior works [8], [20]-[23] all implement 'allreduce' operations, customized by cluster interconnect features, to optimize the transmission process.…”
Section: Communication Optimization (mentioning)
confidence: 99%
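
To make the scaling argument in this statement concrete, the following sketch simulates a generic binary reduction tree followed by a broadcast. It is an assumption-laden illustration, not the method of any cited work: the function name `tree_allreduce` is hypothetical, workers are list entries, and interconnect-specific customization is ignored. In this sketch the reduce phase takes a number of sequential rounds that grows logarithmically with the number of workers, whereas a single parameter server would receive one message from each worker.

```python
from typing import List, Tuple

def tree_allreduce(values: List[float]) -> Tuple[List[float], int]:
    """Sum values over a binary reduction tree, then give every worker the total.

    Returns the per-worker results and the number of reduce rounds used.
    """
    vals = list(values)
    p = len(vals)
    if p == 0:
        raise ValueError("at least one worker is required")
    rounds = 0
    step = 1
    # Reduce phase: in each round, worker i absorbs the partial sum held by
    # worker i + step; all pairs in a round could communicate in parallel.
    while step < p:
        for i in range(0, p, 2 * step):
            if i + step < p:
                vals[i] += vals[i + step]
        step *= 2
        rounds += 1
    # Broadcast phase (simulated): worker 0 holds the total and shares it.
    total = vals[0]
    return [total] * p, rounds

result, reduce_rounds = tree_allreduce([1, 2, 3, 4])
```

With four workers the reduce phase finishes in two rounds; with five it needs three, since the tree depth is the ceiling of log2 of the worker count.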