2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018
DOI: 10.1109/cvprw.2018.00221
Efficient Deep Learning Inference Based on Model Compression

Cited by 11 publications (6 citation statements)
References 14 publications
“…Our high fps is a consequence of our linear runtime complexity, and we validate our theoretical claims in Section V. We further hypothesize that prior deep learning-based methods [37], [14] are less optimal in terms of runtime due to the intensive computation requirements of deep neural networks [51], [45]. For example, ResNet [18] needs more than 25 MB to store the computed model in memory, and more than 4 billion floating-point operations (FLOPs) to process a single image of size 224×224 [51].…”
Section: Discussion (supporting)
confidence: 72%
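To make the scale of such costs concrete, here is a minimal back-of-the-envelope sketch (an illustration added for this summary, not a computation from the cited papers) of how parameter storage and FLOP counts are typically estimated for a single convolutional layer; all layer dimensions below are assumptions chosen for illustration.

```python
# Minimal sketch: per-layer storage and compute cost of a 2D convolution.
# Illustrative only; not a measurement from the cited works.

def conv2d_cost(c_in, c_out, k, h_out, w_out):
    """Return (num_params, num_flops) for a k x k convolution layer."""
    params = c_out * c_in * k * k          # weight count (bias omitted)
    flops = 2 * params * h_out * w_out     # one multiply + one add per weight per output pixel
    return params, flops

if __name__ == "__main__":
    # First 7x7, stride-2 conv of a ResNet-style network on a 224x224 RGB image
    # (output resolution 112x112); numbers are purely illustrative.
    p, f = conv2d_cost(c_in=3, c_out=64, k=7, h_out=112, w_out=112)
    print(f"params: {p:,} ({p * 4 / 1e6:.2f} MB at float32)")
    print(f"FLOPs : {f / 1e9:.2f} GFLOPs for this single layer")
```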
“…In contrast, structured pruning is more hardware-friendly and efficient on various off-the-shelf deployment platforms, simultaneously speeding up network inference and reducing the memory overhead of CNNs. It can be further categorized into greedy-based pruning [25], [27], [30], [44], [52], [58], [60], [81], search-based pruning [17], [26], [54], [57], dynamic pruning [7], [12], [47], [63], [73], [76], and sparsity regularization-based pruning [31], [42], [45], [46], [51], [53], [56], [75], [78], [80], [84].…”
Section: Related Work, A. Network Pruning (mentioning)
confidence: 99%
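As a concrete illustration of the first of these families, below is a minimal sketch of greedy, L1-norm-based structured (filter-level) pruning. It is a toy example under assumed shapes and a made-up pruning ratio, not the specific method of any cited reference.

```python
# Toy structured (filter-level) pruning by L1 norm of each output filter.
import numpy as np

def prune_filters_l1(weights, prune_ratio=0.5):
    """weights: (c_out, c_in, k, k) conv kernel. Returns kept weights and kept indices."""
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)  # L1 norm per filter
    n_keep = max(1, int(round(weights.shape[0] * (1.0 - prune_ratio))))
    keep = np.sort(np.argsort(scores)[-n_keep:])  # keep filters with the largest norms
    return weights[keep], keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)
    w_pruned, kept = prune_filters_l1(w, prune_ratio=0.5)
    print(w.shape, "->", w_pruned.shape)  # (64, 32, 3, 3) -> (32, 32, 3, 3)
```

Removing whole filters in this way shrinks the layer's output channels, so the next layer's input channels shrink too, which is what makes structured pruning directly usable on off-the-shelf hardware.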
“…The pruning results of ResNeXt-29 are shown in Table 5. By adding edge sparsity regularization, edge-level pruning [84] achieves an increase in error of 0.16% with 55.4% and 28.4% pruning rates in terms of the FLOPs and parameters, respectively. Compared to edge-level pruning, our OED removes 5 out of 9 residual blocks, achieving a higher parameter pruning rate of 58.5% (vs. 28.4%), with a slightly lower classification error of 4.08% (vs. 4.11%).…”
Section: B. ResNeXt-29 (mentioning)
confidence: 99%
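For readers unfamiliar with how such percentages are reported, the FLOPs and parameter "pruning rates" quoted above are simply one minus the remaining cost over the original cost; the sketch below illustrates this with made-up numbers, not figures from the cited papers.

```python
# Pruning rate = 1 - (cost after pruning / cost before pruning).
# The operand values below are illustrative placeholders only.
def pruning_rate(original, remaining):
    return 1.0 - remaining / original

if __name__ == "__main__":
    print(f"{pruning_rate(original=1.0e9, remaining=0.45e9):.1%} of FLOPs pruned")
    print(f"{pruning_rate(original=34.4e6, remaining=14.3e6):.1%} of parameters pruned")
```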
“…As edge inferencing [43] is as important as model training, research has also focused on reducing inference latency locally rather than connecting edge devices to a cloud server for inferencing. To enable a model to run efficiently on an edge device such as an embedded device, model compression [44] is used to reduce model size and complexity. Vanhoucke et al.…”
Section: Implementation on Hardware Acceleration (mentioning)
confidence: 99%
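One widely used compression technique for edge inference is post-training weight quantization. The sketch below shows symmetric per-tensor 8-bit quantization as a generic illustration; it is not the method of reference [44], and the tensor shape and names are assumptions.

```python
# Minimal sketch: symmetric 8-bit post-training quantization of a weight tensor.
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 plus a per-tensor scale factor."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 256)).astype(np.float32)
    q, s = quantize_int8(w)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"storage: {w.nbytes} B -> {q.nbytes} B (4x smaller), max abs error {err:.4f}")
```

The 4x storage reduction comes purely from replacing 32-bit floats with 8-bit integers; on hardware with int8 arithmetic support it can also reduce inference latency and energy use.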