Wenlai Zhao scite author profile

The Sunway TaihuLight supercomputer is powered by the SW26010, a new 260-core processor designed with on-chip fusion of heterogeneous cores. In this article, we present our work on optimizing the training process of convolutional neural networks (CNNs) on the Sunway TaihuLight supercomputer. Specifically, a highly efficient library (swDNN) and a customized Caffe framework (swCaffe) are proposed. Architecture-oriented optimization methods targeting the many-core architecture of the SW26010 are introduced; they achieve a 48× speedup for the convolution routine in swDNN and a 4× speedup for the complete training process of the VGG-16 network using swCaffe, compared to the unoptimized algorithm and framework. Compared to the cuDNN library and the Caffe framework on the NVIDIA K40m GPU, the proposed swDNN library and swCaffe framework on the SW26010 reach nearly half the performance of the K40m in single precision and achieve 3.6× and 1.8× speedups over the K40m in double precision, respectively. CCS Concepts: • Computing methodologies → Neural networks; • Computer systems organization → Multicore architectures;

swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

Fang et al. 2018

Large-Scale Automatic K-Means Clustering for Heterogeneous Many-Core Supercomputer

Zhao, Liu, et al. 2020

IEEE Trans. Parallel Distrib. Syst.

This paper presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering: it automatically generates and executes the clustering tasks with a set of candidate hyper-parameters, and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of target clusters, which can potentially extend the scope of the k-means algorithm to new application areas.
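
The partition by dimension mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each worker holds one contiguous slice of the dimensions, computes a partial squared distance on that slice, and that the partial results are summed before the nearest centroid is chosen. The function names (`split_dimensions`, `partial_sq_dist`, `assign_partitioned`, `assign_direct`) and the sequential loop standing in for parallel workers are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def split_dimensions(n_dims: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split range(n_dims) into n_parts contiguous [start, stop) slices."""
    base, extra = divmod(n_dims, n_parts)
    slices, start = [], 0
    for part in range(n_parts):
        # The first `extra` slices take one additional dimension each.
        stop = start + base + (1 if part < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices


def partial_sq_dist(x: Vector, c: Vector, start: int, stop: int) -> float:
    """Squared Euclidean distance restricted to dimensions [start, stop)."""
    return sum((x[d] - c[d]) ** 2 for d in range(start, stop))


def assign_partitioned(x: Vector, centroids: Sequence[Vector], n_parts: int) -> int:
    """Return the index of the nearest centroid using per-slice partial sums.

    Each (start, stop) slice stands in for one hypothetical worker; the
    sum over slices plays the role of the reduction across workers.
    """
    slices = split_dimensions(len(x), n_parts)
    totals = [
        sum(partial_sq_dist(x, c, start, stop) for start, stop in slices)
        for c in centroids
    ]
    return min(range(len(centroids)), key=totals.__getitem__)


def assign_direct(x: Vector, centroids: Sequence[Vector]) -> int:
    """Reference assignment computed over all dimensions at once."""
    totals = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    return min(range(len(centroids)), key=totals.__getitem__)
```

Because squared Euclidean distance is a sum over dimensions, the partitioned assignment agrees with the direct one for any number of slices (up to floating-point rounding order); that additivity is what would let the work for a very high-dimensional sample be spread across several cores.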

Wenlai Zhao

swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight

Optimizing Convolutional Neural Networks on the Sunway TaihuLight Supercomputer
