Demystifying TensorRT: Characterizing Neural Network Inference Engine on Nvidia Edge Devices

Shafi, Omais; Rai, Chinmay; Sen, Rijurekha; Ananthanarayanan, Gayathri

doi:10.1109/iiswc53511.2021.00030

Cited by 32 publications

(12 citation statements)

References 31 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The proposed architecture was deployed, in real‐time, on the Nvidia tensor‐RT framework to check its potential applications for a real‐time EEG‐BCI. It is a C++ library that helps to boost the inference time of the NVIDIA GPUs (Shafi et al, 2021). Figure 6 shows a flow chart of the architecture inferencing in the Tensor RT framework.…”

Section: Resultsmentioning

confidence: 99%

One‐dimensional atrous conv‐net based architecture for automatic diagnosis of epilepsy using electroencephalography signals and its brain–computer interface applications

Handa,

Gupta,

Gupta

et al. 2023

Expert Systems

View full text Add to dashboard Cite

Precise monitoring and diagnosis of epilepsy by manual analysis of EEG signals are challenging due to the low doctor‐to‐patient ratio, and shortage of medical resources. To automate this diagnosis in real‐time, EEG based Brain–Computer Interface (BCI) system with integration of artificial intelligence techniques will prove to be propitious. This work proposes an end‐to‐end, one‐dimensional atrous conv‐net‐based architecture for automatic epilepsy diagnosis using EEG signals with a conceptual framework of the EEG‐BCI system for routine monitoring and clinical use. The proposed architecture has a robust backbone of six blocks of atrous convolutional layers activated with exponential linear unit functions. The six blocks are followed by the addition of a long short‐term memory layer for automatic feature extraction and sequential EEG data analysis. The efficacy of the proposed architecture has been verified on three publicly available EEG datasets using various evaluation metrics, feature maps, test set evaluation, and ablation studies. An average training and validation accuracy of 96.16% and 90.80% has been achieved upon multiple runs for the three datasets. Ablation experiments indicate that the addition of each block contributed to increasing 17%–25% accuracy scores during the classification of epileptic and non‐epileptic EEG signals. The real‐time EEG‐BCI has been analyzed using weight optimization of the proposed architecture through the NVIDIA Tensor RT framework on a 40 GB DGX A100 NVIDIA workstation. The proposed architecture has generalized well in comparison with the existing techniques for the three EEG datasets and achieved a low training and validation loss with optimum evaluation metrics. This makes the proposed architecture suitable for future EEG‐BCI system deployment in the automatic diagnosis of epilepsy.

show abstract

Section: Resultsmentioning

confidence: 99%

One‐dimensional atrous conv‐net based architecture for automatic diagnosis of epilepsy using electroencephalography signals and its brain–computer interface applications

Handa,

Gupta,

Gupta

et al. 2023

Expert Systems

View full text Add to dashboard Cite

show abstract

“…However, not all objects need to be detected, so we focused on the 17 most common categories. The process involves inputting color images to YOLOV5, accelerated with TensorRT [29], accelerating to obtain semantic labels of the COCO categories. Bounding boxes help approximate the region of dynamic objects in the images, and feature points within these boxes are assigned semantic labels.…”

Section: Semantic Label Incremental Updating With Bayes' Rulementioning

confidence: 99%

D-VINS: Dynamic Adaptive Visual–Inertial SLAM with IMU Prior and Semantic Constraints in Dynamic Scenes

et al. 2023

View full text Add to dashboard Cite

Visual–inertial SLAM algorithms empower robots to autonomously explore and navigate unknown scenes. However, most existing SLAM systems heavily rely on the assumption of static environments, making them ineffective when confronted with dynamic objects in the real world. To enhance the robustness and localization accuracy of SLAM systems in dynamic scenes, this paper introduces a visual–inertial SLAM framework that integrates semantic and geometric information, called D-VINS. This paper begins by presenting a method for dynamic object classification based on the current motion state of features, enabling the identification of temporary static features within the environment. Subsequently, a feature dynamic check module is devised, which utilizes inertial measurement unit (IMU) prior information and geometric constraints from adjacent frames to calculate dynamic factors. This module also validates the classification outcomes of the temporary static features. Finally, a dynamic adaptive bundle adjustment module is developed, utilizing the dynamic factors of the features to adjust their weights during the nonlinear optimization process. The proposed methodology is evaluated using both public datasets and a dataset created specifically for this study. The experimental results demonstrate that D-VINS stands as one of the most real-time, accurate, and robust systems for dynamic scenes, showcasing its effectiveness in challenging real-world scenes.

show abstract

“…This requires the design of accelerators to speed up neural network inference in edge scenarios. The mainstream choice for accelerating neural network inference in edge scenarios is through the design of specific hardware accelerators such as NVDLA and other NPUs [29]. However, hardware accelerators with general architectures implemented through ASICs not only have a high design difficulty and a long development cycle but also may not be sufficient to meet the real-time requirements in terms of acceleration ratio.…”

Section: Introductionmentioning

confidence: 99%

HISP: Heterogeneous Image Signal Processor Pipeline Combining Traditional and Deep Learning Algorithms Implemented on FPGA

Chen,

Wang,

et al. 2023

Electronics

View full text Add to dashboard Cite

To tackle the challenges of edge image processing scenarios, we have developed a novel heterogeneous image signal processor (HISP) pipeline combining the advantages of traditional image signal processors and deep learning ISP (DLISP). Through a multi-dimensional image quality assessment (IQA) system integrating deep learning and traditional methods like RankIQA, BRISQUE, and SSIM, various partitioning schemes were compared to explore the highest-quality imaging heterogeneous processing scheme. The UNet-specific deep-learning processing unit (DPU) based on a field programmable gate array (FPGA) provided a 14.67× acceleration ratio for the total network and for deconvolution and max pool, the calculation latency was as low as 2.46 ms and 97.10 ms, achieving an impressive speedup ratio of 46.30× and 36.49× with only 4.04 W power consumption. The HISP consisting of a DPU and the FPGA-implemented traditional image signal processor (ISP) submodules, which scored highly in the image quality assessment system, with a single processing time of 524.93 ms and power consumption of only 8.56 W, provided a low-cost and fully replicable solution for edge image processing in extremely low illumination and high noise environments.

show abstract

Demystifying TensorRT: Characterizing Neural Network Inference Engine on Nvidia Edge Devices

Cited by 32 publications

References 31 publications

One‐dimensional atrous conv‐net based architecture for automatic diagnosis of epilepsy using electroencephalography signals and its brain–computer interface applications

One‐dimensional atrous conv‐net based architecture for automatic diagnosis of epilepsy using electroencephalography signals and its brain–computer interface applications

D-VINS: Dynamic Adaptive Visual–Inertial SLAM with IMU Prior and Semantic Constraints in Dynamic Scenes

HISP: Heterogeneous Image Signal Processor Pipeline Combining Traditional and Deep Learning Algorithms Implemented on FPGA

Contact Info

Product

Resources

About