“…For example, the performance measure can be accuracy for a classification task and bounding-box IoU for a detection task. Following previous efforts on optimizing inference efficiency (Li et al. 2020; Yuan et al. 2020a), the performance measure is the consistency between the obtained results and the exact inference outputs, rather than ground-truth labels. The multi-model inference under cost budget problem is formalized as:…”
Section: Problem Statement
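The snippet above measures performance against exact inference outputs rather than ground-truth labels. A minimal sketch of what such consistency measures could look like, for the two example tasks it names (all function names here are illustrative, not from the paper):

```python
# Sketch: performance measured as consistency with exact inference
# outputs rather than ground-truth labels.

def classification_consistency(obtained, exact):
    """Fraction of inputs where the obtained label matches the label
    that full (exact) model inference would have produced."""
    assert len(obtained) == len(exact)
    matches = sum(o == e for o, e in zip(obtained, exact))
    return matches / len(obtained)

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes — the
    analogous consistency measure for detection outputs."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```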
“…Compared with intermediate representations, downstream black-box outputs do have weaker representational capability for general learning tasks. But recent work (Yuan et al. 2020a) shows that, given the same (or aligned) inputs, the executed models' outputs are effective hints for scheduling unexecuted models. The insight is that, for multiple tasks on the same input, the correlation between black-box outputs is more explicit, and often even stronger, than that between intermediate features.…”
Section: Black-box Output vs. Intermediate Representation
“…We tested four types of low-level features as proposed in Reducto and selected the one with the best performance. (4) DRLS (Deep Reinforcement Learning-based Scheduler) (Yuan et al. 2020a): a multi-model scheduling approach. DRLS trains a deep reinforcement learning agent that, based on observations of the executed models' outputs, predicts the next model to execute on the given data.…”
Section: Evaluation Implementation and Experiments Setup
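The DRLS baseline described above drives a loop in which each executed model's output informs the choice of the next model. A schematic of that scheduling loop, with the trained deep RL policy stubbed by a trivial scorer (everything here is our illustration, not the DRLS implementation):

```python
# Schematic of an output-driven scheduling loop: run one model at a
# time until the cost budget is spent, letting a policy choose each
# next model from the outputs observed so far.

def stub_policy(observation, remaining):
    # Stand-in for the learned RL policy: score each remaining model
    # index against the running sum of observed outputs.
    return max(remaining, key=lambda m: -abs(sum(observation) - m))

def schedule(models, budget, policy=stub_policy):
    """models: list of callables returning (output, cost).
    Returns the indices of the models executed, in order."""
    executed, observation, cost = [], [], 0
    remaining = set(range(len(models)))
    while remaining and cost < budget:
        nxt = policy(observation, remaining)
        out, c = models[nxt]()      # run the chosen model
        observation.append(out)
        cost += c
        executed.append(nxt)
        remaining.remove(nxt)
    return executed
```

In DRLS the policy is a deep network trained with reinforcement learning; the loop structure and the (observation, remaining models, budget) interface are the part this sketch conveys.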
“…We evaluated MLink on two real-world video analytics systems, one for a smart building and the other for city traffic monitoring, covering six visual models and 3,264 hours of video from 58 cameras. Under a GPU memory budget, MLink outperforms the baselines (multi-task learning (Crawshaw 2020), a deep reinforcement learning-based scheduler (Yuan et al. 2020a) and frame filtering (Li et al. 2020)) and saves 66.7% of inference computation while preserving 94% output accuracy.…”
Section: Introduction
“…Multi-task learning and zipping (He, Zhou, and Thiele 2018; Sanh, Wolf, and Ruder 2019; Crawshaw 2020; Zhang and Yang 2021) can reduce computing overheads by sharing neurons among different tasks; model compression (Hinton, Vinyals, and Dean 2015; Liu et al. 2018; Goldblum et al. 2020; Bai et al. 2020) techniques attempt to eliminate parameters and connections not related to inference accuracy; inference reusing (Guo et al. 2018; Ning, Guan, and Shen 2019) approaches aim to avoid repeating the same or similar computations; source filtering (Li et al. 2020) methods transmit only the necessary input data to backend ML models. Adaptive configuration (Jiang et al. 2018) and multi-model scheduling (Yuan et al. 2020a) were proposed to make inference workloads adaptive to the dynamics of input content. We summarize them as answers to an interesting question:…”
The cost efficiency of model inference is critical to real-world machine learning (ML) applications, especially for delay-sensitive tasks and resource-limited devices. A typical dilemma is: in order to provide complex intelligent services (e.g., a smart city), we need the inference results of multiple ML models, but the cost budget (e.g., GPU memory) is not enough to run all of them. In this work, we study the underlying relationships among black-box ML models and propose a novel learning task: model linking. Model linking aims to bridge the knowledge of different black-box models by learning mappings (dubbed model links) between their output spaces. Based on model links, we developed a scheduling algorithm, named MLink. Through collaborative multi-model inference enabled by model links, MLink can improve the accuracy of the obtained inference results under the cost budget. We evaluated MLink on a multi-modal dataset with seven different ML models and on two real-world video analytics systems with six ML models and 3,264 hours of video. Experimental results show that the proposed model links can be effectively built among various black-box models. Under a GPU memory budget, MLink saves 66.7% of inference computation while preserving 94% inference accuracy, outperforming multi-task learning, a deep reinforcement learning-based scheduler, and frame filtering baselines.
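The abstract's central object is a model link: a learned mapping from one black-box model's output space to another's, fitted on paired outputs collected by running both models on the same inputs. The paper learns richer mappings; a linear least-squares map is the simplest illustrative instance (all names below are ours, not the paper's API):

```python
# Minimal sketch of a "model link": approximate a target model's
# outputs from a source model's outputs, so the target model can be
# skipped at inference time under a tight cost budget.
import numpy as np

def fit_model_link(src_outputs, tgt_outputs):
    """Fit W minimizing ||src_outputs @ W - tgt_outputs||^2 over
    paired outputs collected offline from both models."""
    W, *_ = np.linalg.lstsq(src_outputs, tgt_outputs, rcond=None)
    return W

def predict_via_link(W, src_outputs):
    """Stand in for the (unexecuted) target model's outputs."""
    return src_outputs @ W
```

Once links are fitted offline, a scheduler only needs to run a subset of models within budget and fill in the rest through their links, which is the mechanism behind the reported computation savings.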
In machine learning, continuously retraining a model guarantees accurate predictions based on the latest data as training input. But retrieving the latest data from a database requires time-consuming extraction, as database systems have rarely been used for operations such as matrix algebra and gradient descent. In this work, we demonstrate that SQL with recursive tables makes it possible to express a complete machine learning pipeline, from data preprocessing through model training to validation. To facilitate the specification of loss functions, we extend the code-generating database system Umbra with an operator for automatic differentiation for use within recursive tables: with the loss function expressed in SQL as a lambda function, Umbra generates machine code for each partial derivative. We further use automatic differentiation for a dedicated gradient descent operator, which generates LLVM code to train a user-specified model on GPUs. We fine-tune GPU kernels at the hardware level to allow higher throughput and propose non-blocking synchronisation of multiple units. In our evaluation, automatic differentiation accelerated the runtime by a factor of the number of cached subexpressions, compared to compiling each derivative separately. Our GPU kernels with independent models allowed maximal throughput even for small batch sizes, making machine learning pipelines within SQL more competitive.
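The speedup claimed above comes from reverse-mode automatic differentiation evaluating shared subexpressions once: a single backward pass yields every partial derivative, whereas compiling each derivative separately repeats the shared work. A toy tape-based sketch of that mechanism (this is our illustration, not Umbra's operator):

```python
# Toy reverse-mode autodiff: one backward pass over a shared
# computation graph produces all partial derivatives at once.

class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0
    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])
    def __add__(self, other):
        return Var(self.value + other.value,
                   [(self, 1.0), (other, 1.0)])

def backward(out):
    # Topologically order the graph so each node's gradient is
    # complete before being propagated to its parents.
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local
```

For f = x*y + x, one `backward` call fills in both df/dx and df/dy while traversing the shared graph exactly once.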