Fengbin Tu scite author profile

Transformer neural networks have demonstrated leading performance in many applications spanning language understanding, image processing, and generative modeling. Despite this impressive performance, long-sequence Transformer processing is expensive because the computation complexity and memory consumption of self-attention grow quadratically with sequence length. In this paper, we present DOTA, an algorithm-architecture co-design that effectively addresses the challenges of scalable Transformer inference. Based on the insight that not all connections in an attention graph are equally important, we propose to jointly optimize a lightweight Detector with the Transformer model to accurately detect and omit weak connections at runtime. Furthermore, we design a specialized system architecture for end-to-end Transformer acceleration using the proposed attention detection mechanism. Experiments on a wide range of benchmarks demonstrate the superior performance of DOTA over other solutions. In summary, DOTA achieves 152.6× and 4.5× performance speedup and orders of magnitude energy-efficiency improvements over GPU and customized hardware, respectively.

CCS Concepts: • Computer systems organization → Neural networks; • Computing methodologies → Machine learning approaches.
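The idea of detecting and omitting weak attention connections can be illustrated with a minimal sketch. This is not the DOTA implementation: the abstract does not specify how the Detector is built, so the sketch assumes a fixed low-dimensional projection (`proj`) as a cheap score estimator, and it does not model the joint training of the Detector with the Transformer. The names `sparse_attention`, `proj`, and `keep` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, proj):
    """Map a d-dimensional vector to k dimensions; proj is a list of k rows."""
    return [dot(vec, row) for row in proj]


def sparse_attention(Q, K, V, proj, keep):
    """Attention that evaluates only the `keep` strongest connections per query.

    Assumption: a low-dimensional projection stands in for the detector.
    It produces cheap estimated scores, the weakest connections are dropped,
    and exact scaled dot-product attention runs on the survivors only.
    """
    d = len(Q[0])
    K_small = [project(k, proj) for k in K]  # projected once, reused per query
    outputs, kept = [], []
    for q in Q:
        q_small = project(q, proj)
        estimate = [dot(q_small, k_small) for k_small in K_small]
        # Keep the indices of the `keep` largest estimated scores.
        top = sorted(range(len(K)), key=lambda j: estimate[j], reverse=True)[:keep]
        top.sort()
        # Exact attention restricted to the kept connections.
        scores = [dot(q, K[j]) / math.sqrt(d) for j in top]
        weights = softmax(scores)
        out = [sum(w * V[j][c] for w, j in zip(weights, top))
               for c in range(len(V[0]))]
        outputs.append(out)
        kept.append(top)
    return outputs, kept
```

With `keep` equal to the number of keys, the function reduces to ordinary dense attention. Smaller values of `keep` cut the exact score and weighted-sum work per query roughly in proportion, at the cost of running the cheaper estimator over every key.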




Fengbin Tu

Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns

A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications

A 1.06-to-5.09 TOPS/W reconfigurable hybrid-neural-network processor for deep learning applications

Evolver: A Deep Learning Processor With On-Device Quantization–Voltage–Frequency Tuning

GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM

AEPE: An area and power efficient RRAM crossbar-based accelerator for deep CNNs

DOTA: detect and omit weak attentions for scalable transformer acceleration
