2018 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/iiswc.2018.8573503

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks

Abstract: Convolutional Neural Networks (CNNs) are extremely computationally demanding, presenting a large barrier to their deployment on resource-constrained devices. Since such systems are where some of their most useful applications lie (e.g. obstacle detection for mobile robots, vision-based medical assistive technology), significant bodies of work from both machine learning and systems communities have attempted to provide optimisations that will make CNNs available to edge devices. In this paper we unify the two v…

Cited by 29 publications (22 citation statements). References 21 publications.
“…Previous work shows that combining compression methods can achieve superior performance compared with using them in isolation, e.g., combining pruning and knowledge distillation [46]. The approach in [47] shows that distilling knowledge to shallower quantised architectures can achieve accuracy comparable with state-of-the-art full-precision models.…”
Section: Combining Knowledge Distillation With Quantisation
confidence: 99%
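As a sketch of the combined approach this statement describes, the following minimal Python example blends a temperature-softened KL term against a teacher's outputs with the usual hard-label cross-entropy. The function names, temperature, and blending weight are illustrative assumptions, not the cited papers' exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T gives softer targets.
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Blend a soft-target KL term (teacher || student) with hard-label
    cross-entropy; T**2 rescales gradients as in standard distillation."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    hard = -np.log(softmax(student_logits)[label])
    return alpha * (T ** 2) * kl + (1 - alpha) * hard
```

A quantised student would simply feed its (dequantised) logits through the same loss; the distillation term is agnostic to how the student computes them.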
“…In this paper, we propose grouped spatial pack convolutions (GSPC), for the common NCHW data layout 1 . We modify and extend the spatial pack convolutions (SPC) algorithm described in [11], which does not cover grouped convolutions.…”
Section: A Motivation
confidence: 99%
“…Figure 3 illustrates the GSPC algorithm with an example. We use tile sizes T_O = T_I = 2, as these are the maximum values allowed by the constraints (1). The initial data layout is shown on the left, with the channels split by group for clarity.…”
Section: B General Description
confidence: 99%
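The packed input layout that this tiling implies can be sketched in NumPy: channels are first split by group, then by tiles of T_I, with the tile dimension moved innermost for contiguous access. The function name and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pack_input(x, groups, T_I):
    """Repack an NCHW input group-by-group into channel tiles of T_I,
    yielding shape (N, G, C_g // T_I, H, W, T_I) with the tile innermost."""
    N, C, H, W = x.shape
    Cg = C // groups                      # channels per group
    assert Cg % T_I == 0, "tile size must divide channels per group"
    # Split the channel dim: (N, G, Cg // T_I, T_I, H, W) ...
    x = x.reshape(N, groups, Cg // T_I, T_I, H, W)
    # ... then move the tile dimension innermost.
    return x.transpose(0, 1, 2, 4, 5, 3)
```

With groups=2 and T_I=2, an 8-channel input becomes two groups of two 2-channel tiles, matching the example tile sizes above.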
“…DNN benchmarking. Turner et al. [57] implemented several common DNN compression techniques (weight pruning, channel pruning, and quantization) and evaluated the accuracy, execution time, and memory space on both CPU and GPU. They found that channel pruning can greatly reduce the execution time while weight pruning cannot.…”
Section: Related Work
confidence: 99%
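The distinction drawn here can be illustrated with a minimal sketch: weight pruning zeroes individual values but keeps the tensor's shape, so dense kernels do the same work, while channel pruning removes whole filters and physically shrinks the tensor. The function names and magnitude-based heuristics are assumptions for illustration, not the evaluated implementations.

```python
import numpy as np

def weight_prune(w, sparsity):
    """Zero the smallest-magnitude weights; shape (and hence dense
    compute cost) is unchanged — only sparse kernels could exploit this."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w).ravel())[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def channel_prune(w, keep):
    """Keep the `keep` output channels with the largest L1 norm; the
    tensor physically shrinks, so unmodified dense kernels run faster."""
    norms = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
    idx = np.argsort(norms)[-keep:]
    return w[np.sort(idx)]
```

This is one plausible reason for the finding above: channel pruning directly reduces dense FLOPs, whereas weight pruning needs sparse-aware kernels to translate zeros into speedup.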