Research and optimization of intra-node communication operator performance based on domestic heterogeneous platforms
Zhe Ming,
Guangyu Ding,
Mengzhi Han
Abstract: With the increasing demand for computing power in machine learning tasks, the training of deep neural network models has moved to multi-GPU training and even larger-scale distributed training. However, the speedup and scalability of model training are largely limited by the communication efficiency between GPUs. To improve the communication efficiency of domestic GPU accelerators, this paper studies and analyzes the communication performance of the Allreduce operator widely used in deep le…