Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems 2016
DOI: 10.1145/2968455.2968510

Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms

Abstract: In recent years, designing specialized manycore heterogeneous architectures for deep learning kernels has become an area of great interest. However, the typical on-chip communication infrastructures employed on conventional manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. Hence, in this paper, our aim is to enhance the performance of heterogeneous manycore architectures through the design of a hybrid NoC consisting of both wireline and wireless links. To this end…

Cited by 29 publications (11 citation statements)
References 35 publications (66 reference statements)
“…8, it is evident that even in this optimized NoC, a few links are more heavily utilized when compared to the rest of the links. In a previous work [49], it was demonstrated that links associated with MCs have nearly 500% higher traffic density than the overall average link utilization for the Rodinia backpropagation benchmark [50]. However, this backpropagation benchmark is much simpler than the workloads addressed in this work and only employs a single NN layer.…”
Section: GPUs
confidence: 92%
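The comparison quoted above can be made concrete with a small sketch. The snippet below is purely illustrative and not taken from either cited paper: the link names and flit counts are hypothetical placeholders, and it simply expresses each link's traffic density as a multiple of the network-wide average, the kind of ratio behind the "nearly 500% higher than average" observation for MC-attached links.

```python
# Illustrative only: given per-link flit counts from an NoC simulation trace,
# compute each link's traffic density relative to the network-wide average.
# Link names and counts are hypothetical placeholders, not data from [49]/[50].

def relative_link_utilization(flits_per_link):
    """Return each link's traffic as a multiple of the average link traffic."""
    avg = sum(flits_per_link.values()) / len(flits_per_link)
    return {link: count / avg for link, count in flits_per_link.items()}

if __name__ == "__main__":
    # Hypothetical trace: MC-attached links carry far more traffic than
    # ordinary mesh links under a backpropagation-style workload.
    trace = {"MC0-r5": 50_000, "MC1-r12": 48_000, "r0-r1": 9_000, "r1-r2": 8_500}
    for link, ratio in relative_link_utilization(trace).items():
        print(f"{link}: {ratio:.1f}x average")
```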
“…To handle this challenge, Choi et al. [42] proposed a hybrid (wired + wireless) NoC architecture for heterogeneous CMPs that specifically targets the training phase of DNNs. In the proposed architecture, because CPU-to-memory-controller (MC) communications are latency-sensitive, this type of data exchange is carried out over single-hop wireless interconnects.…”
Section: B. Wireless Interconnects
confidence: 99%
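A minimal sketch of the traffic-class-based medium selection described in this statement, assuming only what the excerpt states: latency-sensitive CPU-to-MC exchanges take a single-hop wireless shortcut, while the remaining traffic stays on the wireline mesh. All type and function names below are illustrative, not the paper's implementation.

```python
# Sketch of medium selection in a hybrid (wired + wireless) NoC, assuming only
# that CPU<->MC traffic is latency-sensitive and that a single-hop wireless
# interface (WI) is reachable from the source router. Names are illustrative.
from dataclasses import dataclass
from enum import Enum, auto

class NodeType(Enum):
    CPU = auto()
    GPU = auto()
    MC = auto()   # memory controller

class Medium(Enum):
    WIRELESS_SINGLE_HOP = auto()
    WIRELINE_MESH = auto()

@dataclass
class Packet:
    src_type: NodeType
    dst_type: NodeType

def select_medium(pkt: Packet, wi_available: bool) -> Medium:
    """Send latency-sensitive CPU<->MC traffic over the wireless shortcut."""
    latency_sensitive = {pkt.src_type, pkt.dst_type} == {NodeType.CPU, NodeType.MC}
    if latency_sensitive and wi_available:
        return Medium.WIRELESS_SINGLE_HOP
    return Medium.WIRELINE_MESH   # bulk GPU/MC traffic stays on wired links

# Example: a CPU request to a memory controller takes the wireless path.
assert select_medium(Packet(NodeType.CPU, NodeType.MC), wi_available=True) \
       == Medium.WIRELESS_SINGLE_HOP
```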
“…In recent years, new interconnection techniques, such as 3D vertical on-chip interconnection [36]-[41], wireless interconnection [42]-[44], and optical interconnection [45], [46], have brought a performance revolution to DNN computing. As mentioned before, memory access latency dominates overall DNN performance, which motivates research on in/near-memory processing techniques.…”
Section: Introduction
confidence: 99%
“…NoC is usually used for specifically designed multi-chip solutions. Choi et al. [31] proposed a hybrid NoC architecture that combines CPUs and GPUs on a single chip for DL. The hybrid network-on-chip architecture also introduces wireless links for CPU-GPU communication.…”
Section: Distributed System
confidence: 99%