Abstract: Deep Neural Networks (DNN) have achieved state-of-the-art results in a wide range of tasks, with the best results obtained with large training sets and large models. In the past, GPUs enabled these breakthroughs because of their greater computational speed. In the future, faster computation at both training and test time is likely to be crucial for further progress and for consumer applications on low-power devices. As a result, there is much interest in research and development of dedicated hardware for Deep …
“…Model compression has been extensively studied especially for image-classification tasks, see e.g., [38], [39], [40], [41], [42]. The typical model compression techniques include weight quantization/sparsification [38], [41], [42], network pruning [43], [44], KD [40], [45], and lightweight neural network architecture/operation design [46], [47], [48]. In this work, we mainly focus on model compression for vanilla GANs, i.e., noise-to-image task.…”
Generative Adversarial Networks (GANs) with high computation costs, e.g., BigGAN and StyleGAN2, have achieved remarkable results in synthesizing high-resolution, diverse images with high fidelity from random noise. Reducing the computation cost of GANs while maintaining photo-realistic image generation is an urgent and challenging task for their broad deployment on computationally resource-limited devices. In this work, we propose a novel yet simple Discriminator-Guided Learning approach for compressing vanilla GANs, dubbed DGL-GAN. Motivated by the observation that the teacher discriminator may contain meaningful information, we transfer knowledge solely from the teacher discriminator via the adversarial function. We show that DGL-GAN is valid since, empirically, learning from the teacher discriminator improves the performance of student GANs, as verified by extensive experiments. Furthermore, we propose a two-stage training strategy for DGL-GAN, which largely stabilizes the training process and achieves superior performance when we apply DGL-GAN to compress the two most representative large-scale vanilla GANs, i.e., StyleGAN2 and BigGAN. Experiments show that DGL-GAN achieves state-of-the-art (SOTA) results on both StyleGAN2 (FID 2.92 on FFHQ with nearly 1/3 of the parameters of StyleGAN2) and BigGAN (IS 93.29 and FID 9.92 on ImageNet with nearly 1/4 of the parameters of BigGAN) and also outperforms several existing vanilla GAN compression techniques. Moreover, DGL-GAN is also effective in boosting the performance of the original, uncompressed GANs: the original, uncompressed StyleGAN2 boosted with DGL-GAN achieves FID 2.65 on FFHQ, a new state-of-the-art result. Code and models are available at https://github.com/yuesongtian/DGL-GAN.
“…The concept of the Binary Neural Network (BNN) originated from the binary weight neural network (BWNN) [18], which quantizes only the weight representation to binary values. However, for FPGA devices with small on-chip memory, the intermediate activations of a BWNN are still too large to be stored in on-chip SRAM, and external memory is required.…”
Section: B. Binary Complex Neural Network
“…4) Binarization: There are two types of widely used binarization [18]: deterministic binarization and stochastic binarization. The equation for deterministic binarization is given in Eq.…”
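The two schemes described in [18] are the sign function (deterministic) and sampling ±1 with a "hard sigmoid" probability (stochastic). A minimal Python sketch of both, with function names of our own choosing, might look like:

```python
import random

def binarize_det(x):
    # Deterministic binarization: the sign function, mapping a
    # real-valued weight to +1 or -1 (with sign(0) taken as +1).
    return 1.0 if x >= 0 else -1.0

def hard_sigmoid(x):
    # Piecewise-linear "hard sigmoid" clip((x + 1) / 2, 0, 1),
    # used as the probability that x is binarized to +1.
    return max(0.0, min(1.0, (x + 1.0) / 2.0))

def binarize_stoch(x, rng=random):
    # Stochastic binarization: sample +1 with probability
    # hard_sigmoid(x), otherwise -1.
    return 1.0 if rng.random() < hard_sigmoid(x) else -1.0
```

For |x| ≥ 1 the hard sigmoid saturates at 0 or 1, so the stochastic rule degenerates into the deterministic one; the stochastic variant mainly matters for small-magnitude weights, where it acts as a regularizer.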
Section: B. Building Blocks and Operations
“…7) is non-differentiable at 0, so direct back-propagation is not feasible for weight-quantization training. The Straight-Through Estimator (STE) was proposed in the prior literature [18], [19] for back-propagation. The complex version of the STE is proposed in [22], and the equation can be found in Eq.…”
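The STE idea is simple: the forward pass binarizes the weight with the non-differentiable sign function, while the backward pass passes the gradient straight through to the underlying real-valued weight, typically cancelling it where the weight has saturated (|x| > 1). A scalar sketch, assuming the saturating variant (names are illustrative):

```python
def sign_forward(x):
    # Forward pass: non-differentiable sign binarization.
    return 1.0 if x >= 0 else -1.0

def ste_backward(grad_out, x, clip=1.0):
    # Backward pass: the Straight-Through Estimator treats sign()
    # as the identity inside the clip range, so the incoming
    # gradient flows through unchanged where |x| <= clip and is
    # zeroed outside (the saturating STE).
    return grad_out if abs(x) <= clip else 0.0
```

In a full training loop, `sign_forward` produces the binary weight used in the convolution, while `ste_backward` decides how much of the loss gradient updates the stored full-precision weight.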
Being able to learn from complex data with phase information is imperative for many signal processing applications. Today's real-valued deep neural networks (DNNs) have shown efficiency in latent information analysis but fall short when applied to the complex domain. Deep complex networks (DCNs), in contrast, can learn from complex data, but have high computational costs; therefore, they cannot satisfy the instant decision-making requirements of many deployable systems dealing with short observations or short signal bursts. Recently, the Binarized Complex Neural Network (BCNN), which integrates DCNs with binarized neural networks (BNNs), has shown great potential in classifying complex data in real time. In this paper, we propose a structural-pruning-based accelerator for BCNNs, which is able to provide more than 5000 frames/s of inference throughput on edge devices. The high performance comes from both the algorithm and hardware sides. On the algorithm side, we apply structural pruning to the original BCNN models and obtain a 20× pruning rate with negligible accuracy loss; on the hardware side, we propose a novel 2D convolution accelerator for the binary complex neural network. Experimental results show that the proposed design runs with over 90% utilization and is able to achieve inference throughput of 5882 frames/s and 4938 frames/s for complex NIN-Net and ResNet-18, respectively, on the CIFAR-10 dataset using an Alveo U280 board.
“…Quantization-aware training, which directly trains the network with lower precisions [6]. These approaches progressively enabled DNNs to first be quantized to 16-bit fixed point [7], 8-bit fixed point [8], and all the way down to binary precision [9]. The best precision of DNN parameters, however, varies across different NN models, and even across different layers within one model [5], [10].…”
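For concreteness, the fixed-point quantization this progression steps through (16-bit, 8-bit, down to binary) can be sketched as saturating round-to-nearest with a fixed number of fractional bits; the function name and saturation choice below are illustrative, not taken from [6]–[9]:

```python
def quantize_fixed_point(x, n_bits, frac_bits):
    # Round x to an n_bits-wide signed fixed-point value with
    # frac_bits fractional bits, saturating at the representable
    # integer range [qmin, qmax]; return the dequantized value.
    scale = 2 ** frac_bits
    qmin = -(2 ** (n_bits - 1))
    qmax = 2 ** (n_bits - 1) - 1
    q = max(qmin, min(qmax, round(x * scale)))
    return q / scale
```

With 8-bit storage and 6 fractional bits, 0.3 becomes 19/64 ≈ 0.297 and values beyond ±2 saturate; in quantization-aware training, such quantized values are used in the forward pass while gradients update the underlying full-precision weights. The per-layer choice of `n_bits` and `frac_bits` is exactly the precision that, as noted above, varies across models and layers.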
Reduced-precision and variable-precision multiply-accumulate (MAC) operations provide opportunities to significantly improve the energy efficiency and throughput of DNN accelerators with no or limited algorithmic performance loss, paving the way towards deploying AI applications on resource-constrained edge devices. Accordingly, various precision-scalable MAC array (PSMA) architectures were recently proposed. However, it is difficult to make a fair comparison between those alternatives, as each proposed PSMA is demonstrated in a different system with a different technology. This work aims to provide a clear view of the design space of PSMAs and offer insights for selecting the optimal architecture based on designers' needs. First, we introduce a precision-enhanced for-loop representation for DNN dataflows. Next, we use this new representation to build a comprehensive PSMA taxonomy, capable of systematically covering the most prominent state-of-the-art PSMAs as well as uncovering new PSMA architectures. Following that, we build a highly parameterized PSMA template that can be configured at design time into a huge subset of the design space spanned by the taxonomy. This allows us to fairly and thoroughly benchmark 72 different PSMA architectures. We perform such studies in 28nm technology, targeting run-time precision scalability from 8 to 2 bits and operating at 200 MHz and 1 GHz. Analyzing the resulting energy-efficiency and area breakdowns reveals key design guidelines for PSMA architectures.