2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2018.00016
Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-Based Deep Learning

Cited by 51 publications (25 citation statements)
References 17 publications
“…We observe that stereo DNNs make heavy use of the deconvolution operation 1 that exposes specific kernel sparsity, making conventional DNN accelerators inefficient. While prior work proposed specialized hardware to exploit deconvolution sparsity [60,76], we demonstrate that static software optimizations achieve better results without unnecessary hardware modifications.…”
Section: Introductionmentioning
confidence: 81%
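The citation above rests on a mechanical fact worth making concrete: a strided deconvolution (transposed convolution) is equivalent to inserting zeros between input elements and then running an ordinary convolution, so a conventional dense accelerator spends many multiply-accumulates on operands that are known to be zero. The following is an illustrative 1-D sketch of that equivalence; the function names and sizes are assumptions for illustration, not code from any of the cited works.

```python
import numpy as np

def zero_insert(x, stride=2):
    """Insert (stride - 1) zeros between consecutive elements of a 1-D input."""
    out = np.zeros(stride * (len(x) - 1) + 1)
    out[::stride] = x
    return out

def deconv1d(x, kernel, stride=2):
    """1-D transposed convolution computed as zero-insertion + dense convolution."""
    return np.convolve(zero_insert(x, stride), kernel, mode="full")

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 1.0, 1.0])
print(zero_insert(x))   # [1. 0. 2. 0. 3.] -- 2 of 5 operands are inserted zeros
print(deconv1d(x, k))   # [1. 1. 3. 2. 5. 3. 3.]
```

With stride 2, roughly half the input operands fed to the dense convolution are inserted zeros, which is the "specific kernel sparsity" the citing authors observe.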
“…Stereo vision DNNs make use of deconvolution layers, which expose structured sparsity patterns. Recent work has proposed specialized hardware specifically for exploiting sparsity in deconvolution layers [60,76]. Our observation, however, is that mitigating sparsity-induced inefficiencies in deconvolution does not necessarily require hardware support.…”
Section: Related Workmentioning
confidence: 98%
“…Deconvolution is an operation that is also adopted in generative adversarial networks (GANs), and there have been many studies on deconvolution accelerators for GANs. Wang et al. [37], Song et al. [38], GANAX [39], and LerGAN [40] accelerated 16-bit networks, while GNA [41] implemented a deconvolution accelerator that supports flexible bit-widths of 8 and 16 bits.…”
Section: Related Workmentioning
confidence: 99%
“…References [22]- [24] consider the padding-zero operations when designing an accelerator, but they do not eliminate the operations. Reference [25] reshapes the input data to skip the zero-data, but it is customized for the GAN used in its work; it may need the filling-zero operations when it computes conventional CNNs with different kernel sizes.…”
Section: A Padding-zero Operationsmentioning
confidence: 99%
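The input-reshaping idea mentioned above can be sketched in software alone: a stride-2 transposed convolution can be split into dense sub-convolutions over the even and odd kernel taps, with their outputs interleaved, so no zero operands are ever materialized. This is a hedged illustration of that general class of optimization under assumed 1-D shapes, not the exact scheme of reference [25].

```python
import numpy as np

def deconv1d_dense(x, kernel, stride=2):
    """1-D transposed convolution with no zero-insertion.

    Since y[n] = sum_m x[m] * kernel[n - stride*m], grouping kernel taps
    by (n mod stride) yields `stride` dense convolutions whose outputs
    interleave to form the full result.
    """
    subs = [np.convolve(x, kernel[p::stride], mode="full") for p in range(stride)]
    out = np.zeros(stride * (len(x) - 1) + len(kernel))
    for p, sub in enumerate(subs):
        out[p::stride][:len(sub)] = sub
    return out

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 1.0, 1.0])
print(deconv1d_dense(x, k))   # [1. 1. 3. 2. 5. 3. 3.] -- matches the zero-insert form
```

Every multiply here touches a real input value, which is why such static reshaping can recover the efficiency that zero-padded deconvolution otherwise loses, at the cost of being specialized to a given stride and kernel size.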