2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) 2019
DOI: 10.1109/sbac-pad.2019.00035

AI Gauge: Runtime Estimation for Deep Learning in the Cloud

Cited by 14 publications (7 citation statements)
References 9 publications
“…Indeed, black box approaches can derive performance models from data to make predictions without a priori knowledge about the internals of the target system. On the other hand, ML models Dao et al (2015); Barnes et al (2008); Bitirgen et al (2008); Kerr et al (2010); Lu et al (2017); Gupta et al (2018); Peng et al (2018); Dube et al (2019) require to perform an initial profiling campaign to gather training data to learn the mapping among the features of an application and its execution time. An overview and quantitative comparison among recent analytical and ML-based model proposals is reported in Madougou et al (2016).…”
Section: Related Work (mentioning)
confidence: 99%
“…In a more recent work Gianniti et al (2019), we compared our per layer model in Gianniti et al (2018b) with a pure black box ML end-to-end model but the study was still limited to a single DL framework and could not generalize prediction to different GPU hardware. Finally, the work in Dube et al (2019) proposed AI Gauge, a framework based on ML where models are continuously calibrated processing job traces. The proposed models achieve less than 10% relative error on average, but, however, are limited to single GPU deployments.…”
Section: Related Work (mentioning)
confidence: 99%
“…Monitoring. Monitoring is the key to application aware optimization [10], [26], [53], [62], [17], [63]. In order to obtain a fine-grained view of the infrastructure, Horus leverages cAdvisor, a container monitoring framework.…”
Section: System Implementation (mentioning)
confidence: 99%
“…Part of the research focuses on the prediction of task execution time [6][7][8][9][10], some of the research focuses on the prediction of cluster resources such as CPU load [11][12][13], and some of the research focuses on the scheduling of workflow. In addition, some researchers use reinforcement learning [14], [15] to perform workflow scheduling, which have high computational complexity and are not sensitive to changes in workflow.…”
Section: Related Work (mentioning)
confidence: 99%