ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9415117
AttentionLite: Towards Efficient Self-Attention Models for Vision

Abstract: We propose a novel framework for producing a class of parameter- and compute-efficient models, called AttentionLite, suitable for resource-constrained applications. Prior work has primarily focused on optimizing models via either knowledge distillation or pruning. In addition to fusing these two mechanisms, our joint optimization framework also leverages recent advances in self-attention as a substitute for convolutions. We can simultaneously distill knowledge from a compute-heavy teacher while also pruning the s…
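The joint optimization the abstract describes can be pictured as a single training loop that combines a distillation loss with a pruning mask. The sketch below is a minimal PyTorch illustration of that idea under generic assumptions (magnitude pruning, a Hinton-style KD loss, and the weighting `alpha` are illustrative choices, not the paper's exact recipe).

```python
# Minimal sketch of jointly pruning a student while distilling from a teacher.
# The pruning criterion, loss weighting, and helper names are assumptions.
import torch
import torch.nn.functional as F

def magnitude_masks(model, sparsity=0.5):
    """Per-layer binary masks that keep the largest-magnitude weights."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune only weight tensors, not biases
            k = max(1, int(p.numel() * sparsity))
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
    return masks

def distill_step(student, teacher, masks, x, y, T=4.0, alpha=0.9):
    """One step: CE on labels + KL to the teacher, with pruned weights masked out."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, y)
    loss = alpha * kd + (1 - alpha) * ce
    loss.backward()
    with torch.no_grad():
        for name, p in student.named_parameters():
            if name in masks:
                p.mul_(masks[name])          # keep pruned weights at zero
                if p.grad is not None:
                    p.grad.mul_(masks[name])  # and stop gradients reviving them
    return loss
```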

Cited by 19 publications (5 citation statements). References 8 publications.
“…In particular, we initialize a PR model with the weights and mask of the best PR model of stage 2 and allow only the parameters to train. We train the PR model with distillation via a KL-divergence loss (Hinton et al., 2015; Kundu & Sundaresan, 2021) from a pre-trained AR model along with a CE loss. Moreover, we introduce an AR-PR post-ReLU activation mismatch (PRAM) penalty into the loss function.…”
Section: Maximizing Activation Similarity Via Distillation
confidence: 99%
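The loss in the statement above combines three terms: a temperature-softened KL divergence to the pre-trained AR teacher, a standard CE loss on the labels, and a penalty on the mismatch between post-ReLU activations of the AR and PR models. A hedged PyTorch sketch follows; the L2 form of the PRAM term and the weights `lambda_kd` / `lambda_pram` are assumptions for illustration, not the citing paper's exact formulation.

```python
# Sketch of KD (KL) + CE + post-ReLU activation mismatch (PRAM) loss.
# The mean-squared form of PRAM and the loss weights are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(pr_logits, ar_logits, labels, pr_acts, ar_acts,
                      T=4.0, lambda_kd=1.0, lambda_pram=1.0):
    # Hinton-style KD: soften both distributions with temperature T.
    kd = F.kl_div(F.log_softmax(pr_logits / T, dim=1),
                  F.softmax(ar_logits / T, dim=1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(pr_logits, labels)
    # PRAM: penalize mismatch between post-ReLU feature maps of the two models.
    pram = sum(F.mse_loss(F.relu(p), F.relu(a))
               for p, a in zip(pr_acts, ar_acts))
    return ce + lambda_kd * kd + lambda_pram * pram
```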
“…Various attention mechanisms have been successfully used in computer vision, especially in the field of semantic segmentation. Broadly, there are two kinds of attention mechanism: the soft-attention mechanism [19][20][21][22] and the self-attention mechanism [23][24][25][26]. In the soft-attention mechanism, channel attention and spatial attention are often used for the task of semantic segmentation.…”
Section: Attention Mechanism
confidence: 99%
“…As a result, the computational load of the model is greatly reduced, and the efficiency of the model is improved without losing too much accuracy. Recently, various attention mechanisms [19][20][21][22][23][24][25][26] have been successfully applied in many computer vision tasks. For example, SENet [19] and CBAM [20] show that weighting along the spatial and channel dimensions helps improve feature extraction.…”
Section: Introduction
confidence: 99%
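The channel weighting that SENet-style soft attention performs, mentioned in the statements above, is easy to sketch: pool each channel to a scalar, pass the result through a small bottleneck MLP with a sigmoid gate, and rescale the feature map. The layer sizes and reduction ratio below are illustrative, not taken from any cited paper.

```python
# Minimal squeeze-and-excitation style channel attention (soft attention).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # "squeeze": global context per channel
        self.fc = nn.Sequential(                   # "excitation": per-channel gate
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # reweight channels

# Usage: gate = ChannelAttention(64); y = gate(torch.randn(2, 64, 32, 32))
```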
“…Similarly, when processing information, the attention mechanism focuses only on the regional information that helps accomplish the task, which not only describes the focus of the model but also improves the representation of features. To address the position insensitivity of convolution, AttentionLite [41] uses a self-attention mechanism instead of convolution, generating attention weights from trainable queries, keys, and values. It requires only a small number of parameters and can outperform comparable models in accuracy.…”
Section: Related Work
confidence: 99%
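The substitution described in that last statement, attention weights computed from trainable query, key, and value projections in place of a convolution, can be sketched as follows. This version attends over all spatial positions for simplicity; AttentionLite-style layers typically restrict attention to local windows, which this sketch does not reproduce.

```python
# Hedged sketch: a self-attention layer standing in for a convolution,
# with attention weights produced from learned query/key/value projections.
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, kernel_size=1)
        self.k = nn.Conv2d(channels, channels, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)          # (b, h*w, c)
        k = self.k(x).flatten(2)                           # (b, c, h*w)
        v = self.v(x).flatten(2).transpose(1, 2)           # (b, h*w, c)
        attn = torch.softmax(q @ k * self.scale, dim=-1)   # (b, h*w, h*w)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out

# Usage: layer = SpatialSelfAttention(64); y = layer(torch.randn(1, 64, 16, 16))
```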