2019
DOI: 10.1109/access.2019.2899985

Using Machine Learning Ensemble Methods to Predict Execution Time of e-Science Workflows in Heterogeneous Distributed Systems

Abstract: Effective planning and optimized execution of the e-Science workflows in distributed systems, such as the Grid, need predictions of execution times of the workflows. However, predicting the execution times of e-Science workflows in heterogeneous distributed systems is a challenging job due to the complex structure of workflows, variations due to input problem-sizes, and heterogeneous and dynamic nature of the shared resources. To this end, we propose two novel workflow execution time-prediction methods based o…

citations
Cited by 21 publications
(14 citation statements)
references
References 58 publications
“…Dynamic selection approaches [9], [12], [13], [17] for time series forecasting have been proposed aiming to improve the accuracy of MPS. These approaches are based on [23] that seminally proposed a dynamic selection approach for regression tasks named herein to as Dynamic Selection by Local Accuracy (DS-LA).…”
Section: Problem Definition (mentioning)
confidence: 99%
“…For each $w_t$ of the out-of-sample, two regions of competence $\Delta^{LT}_t$ and $\Delta^{PR}_t$ are generated. Each region of competence is composed of 10 windows ($k = 10$), where $\Delta^{LT}_t = (w_{m1}, w_{n2}, \ldots, w_{ik})$ represent the windows from in-sample selected by the literature assumption [9], [12], [13], [17] (using Euclidean distance), and $\Delta^{PR}_t = (w_{t-1}, \ldots, w_{t-k})$ are the $k$ closest windows to $w_t$ selected by the proposed assumption.…”
Section: Problem Definition (mentioning)
confidence: 99%
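
The two ways of building a region of competence described in the statement above can be illustrated with a minimal Python sketch. It is not the citing authors' implementation: the function names (`sliding_windows`, `region_by_distance`, `region_by_recency`), the window size, and the toy series are all hypothetical, and the only details taken from the quote are the Euclidean distance, the default k = 10, and the idea of taking the k windows that immediately precede the query window.

```python
import math


def sliding_windows(series, size):
    """Split a series into overlapping windows of a fixed size (hypothetical helper)."""
    return [tuple(series[i:i + size]) for i in range(len(series) - size + 1)]


def region_by_distance(query, in_sample_windows, k=10):
    """Indices of the k in-sample windows closest to `query` by Euclidean distance.

    This mirrors the 'literature assumption' in the quote: similarity in
    value space decides which windows form the region of competence.
    """
    ranked = sorted(
        range(len(in_sample_windows)),
        key=lambda i: math.dist(query, in_sample_windows[i]),
    )
    return ranked[:k]


def region_by_recency(t, k=10):
    """Indices t-1, ..., t-k of the windows that immediately precede window t.

    This mirrors the 'proposed assumption' in the quote: closeness in time
    decides which windows form the region of competence. Indices below 0
    are dropped when fewer than k earlier windows exist.
    """
    return [t - j for j in range(1, k + 1) if t - j >= 0]


# Toy data, assumed for illustration only.
windows = sliding_windows([0, 1, 2, 3, 4, 5, 6, 7], size=2)
nearest = region_by_distance((3, 4), windows, k=3)
recent = region_by_recency(5, k=3)
```

In a dynamic selection scheme of the kind the quote describes, the returned indices would then be used to score each candidate model on those windows only; that scoring step is left out here because the quote does not specify it.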
“…Table 1 summarizes major similarities and differences between our work and the existing studies. First, the examined papers concern various application domains: load sharing facility (LSF) [8], parallel program [9], [11], [25], cloud [10], [29], HPC [2], [14], [15], [19], location-based services [20]- [23], databases [26]- [28], big data applications [29], [30], and scientific workloads [12], [13], [16]- [19]. The runtime estimation problem addressed in this paper applies to the scientific workloads domain.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this paper we present a novel "ensemble" of clustering, regression, and classification to construct a runtime estimation model. Unlike previous works [13], [18] which take a similar approach, the biggest difference in our work is to apply clustering to the input data in the early training process, resulting in estimation quality enhancement.…”
Section: Our Approach (mentioning)
confidence: 99%