Liu Liu scite author profile

L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks

Batch Normalization (BN) has been proven to be quite effective at accelerating and improving the training of deep neural networks (DNNs). However, BN brings additional computation, consumes more memory, and generally slows down the training process by a large margin, which aggravates the training effort. Furthermore, the nonlinear square and root operations in BN also impede low bit-width quantization techniques, which draw much attention in the deep learning hardware community. In this work, we propose an L1-norm BN (L1BN) with only linear operations in both the forward and the backward propagations during training. L1BN is shown to be approximately equivalent to the original L2-norm BN (L2BN) by multiplying a scaling factor equal to $\sqrt{\pi/2}$. Experiments on various convolutional neural networks (CNNs) and generative adversarial networks (GANs) reveal that L1BN maintains almost the same accuracies and convergence rates compared to L2BN but with higher computational efficiency. On an FPGA platform, the proposed signum and absolute operations in L1BN achieve a 1.5× speedup and save 50% power consumption compared with the original costly square and root operations, respectively. This hardware-friendly normalization method not only surpasses L2BN in speed but also simplifies the hardware design of ASIC accelerators with higher energy efficiency. Last but not least, L1BN promises a fully quantized training of DNNs, which is crucial to future adaptive terminal devices.

Index Terms: L1-norm, batch normalization (BN), deep neural network (DNN), discrete online learning.

S. Wu and G. Li contribute equally to this work.
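The relationship between the two normalizations can be checked numerically. The sketch below is illustrative only and is not the authors' implementation: it assumes Gaussian inputs (for which the standard deviation equals $\sqrt{\pi/2}$ times the mean absolute deviation), works on a plain list of floats rather than a tensor, and omits the learnable scale and shift. The function names `l1_batch_norm` and `l2_batch_norm` and the `rescale` flag are hypothetical.

```python
import math
import random


def l1_batch_norm(batch, eps=1e-5, rescale=True):
    """Normalize by the mean absolute deviation (L1) instead of the std (L2)."""
    n = len(batch)
    mean = sum(batch) / n
    # Only absolute values and additions here: no squares, no square root of data.
    l1_dev = sum(abs(x - mean) for x in batch) / n
    # Assumption: the constant sqrt(pi/2) is applied explicitly so the result can
    # be compared with L2 normalization; in practice it could be folded elsewhere.
    scale = math.sqrt(math.pi / 2) if rescale else 1.0
    denom = scale * l1_dev + eps
    return [(x - mean) / denom for x in batch]


def l2_batch_norm(batch, eps=1e-5):
    """Conventional normalization by the standard deviation."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


rng = random.Random(0)
batch = [rng.gauss(3.0, 2.0) for _ in range(20000)]  # synthetic Gaussian batch

l1_out = l1_batch_norm(batch)
l2_out = l2_batch_norm(batch)
max_gap = max(abs(a - b) for a, b in zip(l1_out, l2_out))
print("largest elementwise difference between L1BN and L2BN outputs:", max_gap)
```

On Gaussian data the two outputs should differ only slightly; for inputs that are far from Gaussian the $\sqrt{\pi/2}$ factor is an approximation and the gap can be larger.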

DOTA: detect and omit weak attentions for scalable transformer acceleration

Liu et al. 2022

Transformer neural networks have demonstrated leading performance in many applications spanning language understanding, image processing, and generative modeling. Despite this impressive performance, long-sequence Transformer processing is expensive due to the quadratic computation complexity and memory consumption of self-attention. In this paper, we present DOTA, an algorithm-architecture co-design that effectively addresses the challenges of scalable Transformer inference. Based on the insight that not all connections in an attention graph are equally important, we propose to jointly optimize a lightweight Detector with the Transformer model to accurately detect and omit weak connections during runtime. Furthermore, we design a specialized system architecture for end-to-end Transformer acceleration using the proposed attention detection mechanism. Experiments on a wide range of benchmarks demonstrate the superior performance of DOTA over other solutions. In summary, DOTA achieves 152.6× and 4.5× performance speedup and orders of magnitude energy-efficiency improvements over GPU and customized hardware, respectively.

CCS Concepts: • Computer systems organization → Neural networks; • Computing methodologies → Machine learning approaches.
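The general idea of detecting and omitting weak attention connections can be sketched in a few lines. This is a toy illustration, not DOTA's Detector or its hardware architecture: the abstract does not specify how the Detector works, so the sketch assumes a stand-in that estimates scores from only the first few feature dimensions and keeps a fixed number of keys per query. The names `sparse_attention`, `detect_dims`, and `keep` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(queries, keys, values, detect_dims=2, keep=2):
    """Estimate scores cheaply, keep the `keep` strongest keys per query,
    then run exact softmax attention over the kept keys only."""
    d = len(queries[0])
    outputs, kept_sets = [], []
    for q in queries:
        # Stand-in detector (assumption): a cheap estimate from a feature prefix.
        approx = [dot(q[:detect_dims], k[:detect_dims]) for k in keys]
        ranked = sorted(range(len(keys)), key=lambda j: approx[j], reverse=True)
        kept = sorted(ranked[:keep])
        # Exact scaled dot-product scores, computed only for surviving connections.
        exact = [dot(q, keys[j]) / math.sqrt(d) for j in kept]
        weights = softmax(exact)
        out = [
            sum(w * values[j][c] for w, j in zip(weights, kept))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        kept_sets.append(kept)
    return outputs, kept_sets


queries = [[1.0, 0.0, 0.0, 0.0]]
keys = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]

outputs, kept_sets = sparse_attention(queries, keys, values)
print("kept key indices per query:", kept_sets)
print("attention output:", outputs)
```

In this toy input the last key scores lowest under the cheap estimate, so its value row never contributes to the output. The exact-score work scales with `keep` rather than with the full sequence length, which is the saving the abstract describes; how well a real detector preserves accuracy depends on how it is trained.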

Leveraging 3D Technologies for Hardware Security

Stow et al. 2016

DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture

Liu, Deng et al. 2020

SemiMap: A Semi-Folded Convolution Mapping for Speed-Overhead Balance on Crossbars

Deng, Xie, Liang et al. 2020

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Efficient tensor core-based GPU kernels for structured sparsity under reduced precision

Chen, Liu et al. 2021

Dynamic Sparse Attention for Scalable Transformer Acceleration

Liu, Chen et al. 2022

IEEE Trans. Comput.

Building energy-efficient multi-level cell STT-RAM caches with data compression

Liu, Chi et al. 2017

