2018 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/cluster.2018.00087
swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

Cited by 30 publications (14 citation statements)
References 12 publications
“…After initialization, the traces in a CDP are processed in sequence (line 9) and the halfpoints data is prefetched before a new trace is processed (lines 10-12). For the current trace, the memory addresses of the data accesses are calculated for each sample-NMO velocity pair and kept in the k1 array (lines 13-15). Then, the maximum and minimum memory addresses in the k1 array are identified (lines 16-18) and used to determine the memory range (length) of the data accesses (line 19).…”
Section: 3.2 (mentioning)
confidence: 99%
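The quoted passage describes collecting the per-(sample, NMO-velocity) addresses of one trace in the k1 array and using their minimum and maximum to derive the length of the region to prefetch. Below is a minimal C sketch of that address-range step only; compute_index, n_samples, and n_velocities are illustrative names assumed here, not taken from the cited work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address calculation for one (sample, velocity) pair;
 * the assumed layout is velocity-major over samples. */
static size_t compute_index(int sample, int vel, int n_samples) {
    return (size_t)vel * (size_t)n_samples + (size_t)sample;
}

/* Fill k1 with all access indices for the current trace and return the
 * contiguous range ("length") that a single prefetch would have to cover. */
size_t access_range(size_t *k1, int n_samples, int n_velocities,
                    size_t *min_out) {
    size_t min_idx = SIZE_MAX, max_idx = 0;
    for (int v = 0; v < n_velocities; v++) {
        for (int s = 0; s < n_samples; s++) {
            size_t idx = compute_index(s, v, n_samples);
            k1[(size_t)v * n_samples + s] = idx;
            if (idx < min_idx) min_idx = idx;
            if (idx > max_idx) max_idx = idx;
        }
    }
    *min_out = min_idx;
    return max_idx - min_idx + 1;  /* the "length" used to size the prefetch */
}
```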
“…Guided by a performance model, swDNN was designed to support efficient CNN implementations on Sunway TaihuLight by combining its computing and memory resources (Fang et al. 2017). The later swCaffe, which couples swDNN with Caffe, is the first deep learning framework for this supercomputer and obtains a 4X speedup for the complete training process of the VGG-16 network (Li et al. 2018). swCaffe on the SW26010 has nearly half the performance of a K40m in single precision and a 1.8× speedup over the K40m in double precision.…”
Section: Large Scale Deep Learning On Sunway TaihuLight (mentioning)
confidence: 99%
“…Performing reduction-tree operations is thus both more efficient and more scalable than the traditional parameter-server approach. Several prior works [8], [20]-[23] implement 'allreduce' operations, customized to cluster interconnect features, to optimize the transmission process.…”
Section: Communication Optimization (mentioning)
confidence: 99%
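As a point of reference for the allreduce-based gradient aggregation described above, here is a minimal sketch using a plain MPI_Allreduce; the interconnect-customized collectives of the cited works would replace this call with their own implementations. aggregate_gradients, grad, and n are assumed illustrative names.

```c
#include <mpi.h>

/* Sum each worker's local gradient across all ranks, then average,
 * so every rank ends up with the same aggregated gradient. */
void aggregate_gradients(float *grad, int n, MPI_Comm comm) {
    int world;
    MPI_Comm_size(comm, &world);

    MPI_Allreduce(MPI_IN_PLACE, grad, n, MPI_FLOAT, MPI_SUM, comm);
    for (int i = 0; i < n; i++)
        grad[i] /= (float)world;
}
```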
“…To reduce I/O overhead, S-Caffe [29] provides DL frameworks with parallel reading capabilities that take advantage of parallel file systems such as Lustre. swCaffe [20] improves the aggregated bandwidth of disk arrays by adjusting the data layout so that it better fits the hardware architecture.…”
Section: I/O Optimization (mentioning)
confidence: 99%
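The following is a hedged sketch of the kind of parallel, sharded reading from a shared file on a Lustre-like file system that the passage describes, written with standard MPI-IO rather than the actual S-Caffe or swCaffe code; read_shard and its parameters are illustrative assumptions.

```c
#include <mpi.h>

/* Each rank reads only its own contiguous shard of a shared training file,
 * so the aggregated bandwidth of the underlying disk arrays is exploited
 * instead of funneling all reads through a single process. */
int read_shard(const char *path, char *buf, MPI_Offset record_bytes,
               MPI_Offset records_per_rank, MPI_Comm comm) {
    int rank;
    MPI_File fh;
    MPI_Comm_rank(comm, &rank);

    if (MPI_File_open(comm, path, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh)
        != MPI_SUCCESS)
        return -1;

    /* Ranks start at disjoint offsets; no two ranks touch the same bytes. */
    MPI_Offset offset = (MPI_Offset)rank * records_per_rank * record_bytes;
    MPI_File_read_at_all(fh, offset, buf,
                         (int)(records_per_rank * record_bytes),
                         MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    return 0;
}
```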