2021
DOI: 10.1016/j.sysarc.2020.101896
Benchmarking vision kernels and neural network inference accelerators on embedded platforms

Abstract: Developing efficient embedded vision applications requires exploring various algorithmic optimization trade-offs and a broad spectrum of hardware architecture choices. This makes navigating the solution space and finding the design points with optimal performance trade-offs a challenge for developers. To help provide a fair baseline comparison, we conducted comprehensive benchmarks of accuracy, run-time, and energy efficiency of a wide range of vision kernels and neural networks on multiple embedded platforms:…

Cited by 29 publications (10 citation statements)
References 17 publications (18 reference statements)
“…Therefore, it is necessary to use the model and pre-trained weights together with the most appropriate software to work with the GPU. Studies were conducted on software environments such as TensorFlow, TensorFlow Lite, NVIDIA TensorRT, and the OpenCV DNN Module [35]. These software frameworks allow real-time operation of models via 8-bit integer and 16-bit floating-point optimizations.…”
Section: Methods
confidence: 99%
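The 8-bit integer optimization mentioned in the statement above can be illustrated with a minimal, framework-agnostic sketch of symmetric per-tensor quantization. The function names and toy weights here are hypothetical for illustration only; they are not the actual TensorRT or TensorFlow Lite APIs, which perform this (plus calibration) internally.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127].

    The scale is chosen so the largest-magnitude weight maps to +/-127;
    every weight is then rounded to the nearest integer step of `scale`.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

# Toy weight tensor (hypothetical values, for illustration).
w = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(w)
recovered = dequantize_int8(q, scale)
```

Storing `q` instead of `w` cuts memory traffic 4x versus float32, and the rounding error per weight is bounded by half the scale, which is why such quantized models usually stay close to full-precision accuracy.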
“…Table I shows the calculation results of different models, where ResNet-18, a common backbone network in computer vision, is used as a benchmark for comparison. Qasaimeh et al [28] measured the performance of ResNet-18 on embedded platforms and showed that ResNet-18 could achieve 5.17 frames/s on ARM Cortex A57 CPU and 145 frames/s on Jetson TX2 GPU. Compared to ResNet-18, the AMagPoseNet has only 24% of its NPs and 0.98% of its computation (FLOPs).…”
Section: Table I NPs and FLOPs for Different Models
confidence: 99%
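Throughput figures like the 5.17 and 145 frames/s quoted above are typically obtained by timing repeated forward passes after a warm-up phase. The sketch below shows that measurement pattern with no ML dependencies; `dummy_infer` is a hypothetical placeholder for a real model's forward pass, not part of the cited benchmark.

```python
import time

def measure_fps(infer, n_frames=100, warmup=10):
    """Average inference throughput (frames/s) of a zero-argument callable.

    Warm-up iterations are run first and excluded from timing, so one-time
    costs (allocation, JIT, cache fill) do not skew the average.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_frames):
        infer()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

# Placeholder "model": a fixed busy-loop standing in for one forward pass.
def dummy_infer():
    sum(i * i for i in range(10_000))

fps = measure_fps(dummy_infer, n_frames=50)
```

Under this scheme, the roughly 28x CPU-to-GPU gap reported for ResNet-18 (145 / 5.17) falls directly out of comparing two such measurements on the same input pipeline.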
“…Reference [20] investigates the on-the-edge inference of DNNs in terms of latency, energy consumption, and temperature, on five different hardware platforms; unlike the proposed method, this work does not take advantage of the optimization frameworks we have investigated. In [21], an in-depth benchmark analysis of three embedded platforms is performed for image vision applications including MobileNet and InceptionV2; in [22], EDLAB is delivered, an end-to-end benchmark to evaluate the overall performance of three image classification and one object detection models across Intel NCS2, Edge TPU and Jetson Xavier NX. In [23], a performance analysis of the edge TPU board is provided for object classification.…”
Section: Related Work
confidence: 99%