2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)
DOI: 10.1109/rtas48715.2020.000-8
Real-Time Object Detection System with Multi-Path Neural Networks

Cited by 41 publications (30 citation statements). References 35 publications.
“…Wang et al [60] navigate the performance-power tradeoff space of mobile SoCs equipped with heterogeneous processors when they perform ML inferences. Heo et al [28] propose an ML inference latency prediction model for GPUs and devise multipath neural networks, which enable the runtime to choose which path to take to meet real-time latency constraints. AutoScale [36] is an execution scaling engine that leverages Reinforcement Learning to adaptively determine which platform to pick for performing inference to improve energy efficiency in edge-cloud systems.…”
Section: Inference Task Scheduling For Heterogeneous-Platform Edge Device
confidence: 99%
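
The citation above summarizes the paper's core mechanism: a latency predictor guides the runtime in choosing one of several execution paths so that each inference meets its deadline. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation; the path names, the accuracy scores, and the predict_latency_ms model are hypothetical placeholders for an offline-profiled or learned GPU latency predictor.

from dataclasses import dataclass

@dataclass
class Path:
    name: str            # hypothetical identifier for one execution path
    est_accuracy: float  # assumed offline-profiled accuracy proxy

def predict_latency_ms(path: Path, gpu_load: float) -> float:
    """Stand-in for a learned GPU latency predictor (assumption)."""
    base = {"tiny": 8.0, "medium": 15.0, "full": 28.0}[path.name]
    return base * (1.0 + gpu_load)  # toy model: latency grows with contention

def choose_path(paths: list[Path], deadline_ms: float, gpu_load: float) -> Path:
    """Pick the most accurate path whose predicted latency meets the deadline."""
    feasible = [p for p in paths
                if predict_latency_ms(p, gpu_load) <= deadline_ms]
    if not feasible:                 # no path fits: fall back to the fastest one
        return min(paths, key=lambda p: predict_latency_ms(p, gpu_load))
    return max(feasible, key=lambda p: p.est_accuracy)

if __name__ == "__main__":
    paths = [Path("tiny", 0.62), Path("medium", 0.71), Path("full", 0.78)]
    print(choose_path(paths, deadline_ms=33.0, gpu_load=0.4).name)

The design point illustrated here is that the scheduler never commits to a single network: it re-evaluates the predicted latency of each path at dispatch time and takes the most accurate path that still fits the remaining slack.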
“…Estimating the worst-case execution time is also discussed in some other works [24], [25]. In addition, some works [27], [28] concern query time estimation in the database context.…”
Section: Related Work
confidence: 99%
“…Some of these [26], [27], [29] develop (pure) analytical models and assess their validity. Many of the existing studies [20]-[24], [28] build neural-network-based learning models and use them to derive estimated times; many others [2], [10], [16]-[19], [25], [30] use tree- and linear-regression-based machine learning models. Some other works [11], [12], [14], [15] use hybrid methods that combine these tools: analytical models, machine learning, and deep learning.…”
Section: Related Work
confidence: 99%
“…To address the issue, recent works, e.g., [24,25], have investigated how to dynamically skip or add layers to meet timing constraints. Unlike the non-adaptive baseline, we support methodical trade-offs between inference time and accuracy.…”
Section: Introduction
confidence: 99%
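
This statement refers to dynamically skipping or adding layers so that inference finishes within a timing constraint. The sketch below is an illustrative assumption of that general idea, not code from the cited papers: the split into mandatory and optional layers and the per-layer cost estimate est_cost_s are hypothetical, standing in for profiling data the real systems would use.

import time
from typing import Callable, List

def run_with_budget(x, mandatory: List[Callable], optional: List[Callable],
                    budget_s: float, est_cost_s: float):
    """Run all mandatory layers, then add optional layers while time remains.

    est_cost_s is an assumed per-layer cost estimate (e.g., from profiling).
    """
    start = time.monotonic()
    for layer in mandatory:
        x = layer(x)
    for layer in optional:
        elapsed = time.monotonic() - start
        if elapsed + est_cost_s > budget_s:   # next layer would miss the deadline
            break                             # skip the remaining refinement layers
        x = layer(x)
    return x

if __name__ == "__main__":
    double = lambda v: v * 2
    out = run_with_budget(1, mandatory=[double], optional=[double, double],
                          budget_s=0.033, est_cost_s=0.001)
    print(out)

Skipping the optional tail of the network lowers accuracy but bounds latency, which is the time-accuracy trade-off the citing paper contrasts with a non-adaptive baseline.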