DJEnsemble: a Cost-Based Selection and Allocation of a Disjoint Ensemble of Spatio-temporal Models

Pereira, Rafael Silva; Souto, Yania Molina; Chaves, Anderson S.; Zorilla, Rocio; Tsan, Brian; Rusu, Florin; Ogasawara, Eduardo; Ziviani, Artur; Porto, Fábio

doi:10.1145/3468791.3468806

Cited by 6 publications (7 citation statements)

References 16 publications (10 reference statements)

“…Clipper [14] proposes a strategy to select models for ensemble inference by modeling the scenario as a multi-armed bandit problem. DJEnsemble [36] presents a cost-based approach for the automatic selection of black-box models to answer spatio-temporal queries. We do not consider such approaches in this work, mainly because they all require labeled data from the target domain for model selection, which could become a significant burden that complicates the model deployment phase.…”

Section: Related Work (mentioning)

confidence: 99%

Benchmark of DNN Model Search at Deployment Time

Zhou, Jain, Wang et al. 2022. Preprint.


Deep learning has become the most popular direction in machine learning and artificial intelligence. However, the preparation of training data, as well as model training, are often time-consuming and become the bottleneck of the end-to-end machine learning lifecycle. Reusing models for inferring a dataset can avoid the costs of retraining. However, when there are multiple candidate models, it is challenging to discover the right model for reuse. Although there exist a number of model sharing platforms such as ModelDB, TensorFlow Hub, PyTorch Hub, and DLHub, most of these systems require model uploaders to manually specify the details of each model and model downloaders to screen keyword search results for selecting a model. We are lacking a highly productive model search tool that selects models for deployment without the need for any manual inspection and/or labeled data from the target domain. This paper proposes multiple model search strategies including various similarity-based approaches and non-similarity-based approaches. We design, implement and evaluate these approaches on multiple model inference scenarios, including activity recognition, image recognition, text classification, natural language processing, and entity matching. The experimental evaluation showed that our proposed asymmetric similarity-based measurement, adaptivity, outperformed symmetric similarity-based measurements and non-similarity-based measurements in most of the workloads.

CCS Concepts: Information systems → Clustering and classification; Evaluation of retrieval results. Computing methodologies → Neural networks.


“…The assumption is that by combining different models, the weaknesses of each one are compensated by the strengths of the others. However, DJEnsemble takes a slightly different approach [Pereira et al 2021]. As the traditional ensemble approach, it considers a set of available trained models M = {M 1 , M 2 , .…”

Section: DJEnsemble Approach (mentioning)

confidence: 99%

“…To analyze the algorithm performance when integrated to SAVIME, we performed a series of experiments evaluating the execution time of the different steps. For reference, we also measured the offline step execution time (already presented in [Pereira et al 2021]). For our experiments, we built a dataset from rain data from the city of Rio de Janeiro, provided by 33 pluviometrical stations.…”

Section: Experimental Evaluation (mentioning)

confidence: 99%

“…In this paper, we present initial results on the integration of machine learning models ensembles into the SAVIME database system [L. S. Lustosa et al 2021]. We consider the DJEnsemble approach [Pereira et al 2021] that combines a set of spatio-temporal deep learning models automatically selected by a cost-based model. As in other works, we argue that database systems already offer a declarative query language to which model invocation can be easily integrated, as in SQL user defined functions [Duta and Grust 2020].…”

Section: Introduction (mentioning)

confidence: 99%


Integrating Machine Learning Model Ensembles to the SAVIME Database System

Silva, Valduriez, Porto. 2022. Anais Estendidos Do XXXVII Simpósio Brasileiro De Banco De Dados (SBBD Estendido 2022).

Self Cite


The integration of machine learning algorithms into database systems has brought new opportunities in different areas, from indexing to query optimization. In this paper, we describe the integration of an approach for the automatic computation of model ensembles to answer a predictive query. We have extended the SAVIME multi-dimensional array DBMS by adding a new function to its query language and implementing the ensemble model selection and allocation dataflow in the query processing component of SAVIME. We show some initial experimental results depicting its performance against a pure Python implementation of the ensemble approach. Interestingly, the C++ implementation within SAVIME is up to 4 times faster than the Python implementation.


“…Regarding massive data processing and model training, in [Mirzasoleiman 2021] are discussed techniques for dataset characterization in a reduced number of representatives elements, with data-efficient methods to extract representative subsets that generalize the full data. Finally, DJEnsemble [Pereira et al 2021] investigates the prediction of spatio-temporal phenomena using deep-learning models; leveraging statistical properties of the t.s. to generate tiles in contrast of our shape-based approach.…”

Section: Evaluation of the Classifier for Model Selection (mentioning)

confidence: 99%

A Data-Driven Model Selection Approach to Spatio-Temporal Prediction

Zorrilla, Ogasawara, Valduriez et al. 2022. Anais Do XXXVII Simpósio Brasileiro De Banco De Dados (SBBD 2022).

Self Cite


Spatio-temporal Predictive Queries encompass a spatio-temporal constraint, defining a region, a target variable, and an evaluation metric. The output of such queries presents the future values for the target variable computed by predictive models at each point of the spatio-temporal region. Unfortunately, especially for large spatio-temporal domains with millions of points, training temporal models at each spatial domain point is prohibitive. In this work, we propose a data-driven approach for selecting pre-trained temporal models to be applied at each query point. The chosen approach applies a model to a point according to the training and input time series similarity. The approach avoids training a different model for each domain point, saving model training time. Moreover, it provides a technique to decide on the best-trained model to be applied to a point for prediction. In order to assess the applicability of the proposed strategy, we evaluate a case study for temperature forecasting using historical data and auto-regressive models. Computational experiments show that the proposed approach, compared to the baseline, achieves equivalent predictive performance using a composition of pre-trained models at a fraction of the total computational cost.

