2018
DOI: 10.1109/tc.2017.2777863
On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems

Abstract: Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient tr…

Cited by 73 publications (37 citation statements); references 42 publications.
“…Due to their heterogeneity, CPU-GPU based systems exhibit several interesting traffic characteristics; for instance, GPUs typically communicate only with a few shared last-level caches (LLCs), which results in many-to-few traffic patterns (i.e., many GPUs communicate with a few LLCs) with negligible inter-GPU communication [7], [13], [14]. This can cause the LLCs to become bandwidth bottlenecks under heavy network loads and lead to significant performance degradation [7]. In addition, since heterogeneous systems share memory resources, the GPUs can monopolize the memory and cause high CPU memory access latency [15].…”
Section: D. Heterogeneous NoCs
confidence: 99%
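The many-to-few pattern described in this citation statement can be illustrated with a small synthetic traffic generator. This is a minimal sketch: the node counts, the uniform destination choice, and the name `many_to_few_traffic` are assumptions for illustration, not details taken from the cited papers.

```python
import random

def many_to_few_traffic(num_gpus, num_llcs, num_packets, seed=0):
    """Generate a synthetic many-to-few traffic trace: every packet
    originates at one of many GPU nodes and targets one of a few shared
    last-level cache (LLC) nodes, with no inter-GPU packets.
    (Illustrative sketch; uniform selection is an assumption.)"""
    rng = random.Random(seed)
    trace = []
    for _ in range(num_packets):
        src = rng.randrange(num_gpus)   # many possible sources (GPUs)
        dst = rng.randrange(num_llcs)   # few possible destinations (LLCs)
        trace.append((f"gpu{src}", f"llc{dst}"))
    return trace

# Each LLC aggregates traffic from many GPUs, so per-LLC load grows with
# the GPU count and the LLC links are the first to saturate under load.
trace = many_to_few_traffic(num_gpus=32, num_llcs=4, num_packets=10_000)
per_llc = {}
for _, dst in trace:
    per_llc[dst] = per_llc.get(dst, 0) + 1
print(per_llc)  # roughly 2,500 packets per LLC across the 4 LLCs
```

Counting packets per destination makes the bottleneck visible: with 32 sources funneling into 4 sinks, each LLC sees about eight times the per-GPU injection load.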
“…Due to the heterogeneity of the cores integrated on a single chip, the communication requirements for each core can vary significantly. For example, in a CPU-GPU based heterogeneous system, CPUs require low memory latency while GPUs need high-throughput data transfers [7]. In addition to the individual core requirements, 3D ICs allow dense circuit integration but have much higher power density than their 2D counterparts.…”
Section: Introduction
confidence: 99%
“…This does not invalidate the potential of the WNoC paradigm, but it leads to erroneous assumptions about the achievable speed and power. For instance, many WNoC architectures assume rates over 10 Gb/s [12], [27], [28], which may not be achievable due to multipath effects. Other works obtain power consumption estimates by assuming path losses between 25 and 30 dB [36]–[39], values that are far from realistic in standard chip packages.…”
Section: Introduction
confidence: 99%
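The sensitivity to the path-loss assumption criticized in this citation statement can be checked with a simple Shannon-capacity link budget. This is an illustrative sketch: the transmit power, bandwidth, noise figure, and the pessimistic 70 dB loss are assumed values for comparison, not figures from the cited works.

```python
import math

def achievable_rate_gbps(pt_dbm, path_loss_db, bandwidth_hz, noise_figure_db=10.0):
    """Shannon-capacity estimate for an on-chip wireless link.
    pt_dbm: transmit power; path_loss_db: assumed channel loss.
    All numeric inputs are illustrative assumptions, not measurements."""
    pr_dbm = pt_dbm - path_loss_db                                 # received power
    noise_dbm = -174 + 10 * math.log10(bandwidth_hz) + noise_figure_db  # thermal noise floor
    snr = 10 ** ((pr_dbm - noise_dbm) / 10)                        # linear SNR
    return bandwidth_hz * math.log2(1 + snr) / 1e9                 # capacity in Gb/s

# With an optimistic 30 dB path loss, a 20 GHz channel clears 10 Gb/s easily...
print(achievable_rate_gbps(pt_dbm=0, path_loss_db=30, bandwidth_hz=20e9))
# ...but a far lossier in-package channel drops below the 10 Gb/s target.
print(achievable_rate_gbps(pt_dbm=0, path_loss_db=70, bandwidth_hz=20e9))
```

The point of the sketch is the gap between the two calls: the same radio that looks comfortable under a 25-30 dB loss assumption falls short of 10 Gb/s once the loss grows by a few tens of dB, which is the kind of error the quoted passage warns about.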
“…Figures 12, 13, and 14 show the achieved network throughput for all three traffic patterns. As with latency, the network throughput also tracks the packet injection rate.…”
confidence: 99%
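The throughput-tracks-injection behavior noted in this citation statement can be captured by a toy saturation model: below the network's capacity, accepted throughput equals the injection rate; beyond it, throughput flattens. The `capacity` value here is a hypothetical per-node limit, not a number from the paper.

```python
def accepted_throughput(injection_rate, capacity):
    """Toy model of NoC throughput in flits/node/cycle: delivery tracks
    injection until the (assumed) saturation point, then plateaus."""
    return min(injection_rate, capacity)

# Sweep injection rates past an assumed saturation point of 0.25.
for rate in [0.05, 0.10, 0.20, 0.40]:
    print(rate, accepted_throughput(rate, capacity=0.25))
```

Real networks degrade more gracefully near saturation (and can even lose throughput past it due to congestion), but the linear-then-flat shape is the baseline against which the figures' curves are read.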