AI Gauge: Runtime Estimation for Deep Learning in the Cloud

Dube, Parijat; Suk, Tonghoon; Wang, Chen

doi:10.1109/sbac-pad.2019.00035

Cited by 14 publications

(7 citation statements)

References 9 publications


“…Indeed, black box approaches can derive performance models from data to make predictions without a priori knowledge about the internals of the target system. On the other hand, ML models Dao et al (2015); Barnes et al (2008); Bitirgen et al (2008); Kerr et al (2010); Lu et al (2017); Gupta et al (2018); Peng et al (2018); Dube et al (2019) require an initial profiling campaign to gather training data and learn the mapping between the features of an application and its execution time. An overview and quantitative comparison of recent analytical and ML-based model proposals is reported in Madougou et al (2016).…”

Section: Related Work (mentioning)

confidence: 99%

“…In a more recent work, Gianniti et al (2019), we compared our per-layer model in Gianniti et al (2018b) with a pure black box ML end-to-end model, but the study was still limited to a single DL framework and could not generalize prediction to different GPU hardware. Finally, the work in Dube et al (2019) proposed AI Gauge, a framework based on ML where models are continuously calibrated by processing job traces. The proposed models achieve less than 10% relative error on average but are limited to single GPU deployments.…”

Section: Related Work (mentioning)

confidence: 99%


Performance prediction of deep learning applications training in GPU as a service systems

et al. 2022


growth rate of over 38% to support 3D models, animated video processing, and gaming. GPUaaS adoption will also be boosted by the use of graphics processing units (GPUs) to support deep learning (DL) model training. Indeed, the main cloud providers already offer in their catalogs GPU-based virtual machines pre-installed with popular DL frameworks (like Torch, PyTorch, TensorFlow, and Caffe), simplifying DL model programming operations. Motivated by these considerations, this paper studies GPU-deployed neural networks (NNs) and tackles the issue of performance prediction, particularly with respect to NN training times. The proposed approach is based on machine learning and exploits two main sets of features, which describe, on one hand, the network architecture and the hyper-parameters and, on the other, the hardware characteristics of the target deployment. Such data enable the learning of multiple linear regression models, which, coupled with an established feature selection technique, become accurate prediction tools, with errors below 11% on average. An extensive experimental campaign, performed on both public and in-house private cloud deployments, considers popular deep NNs used for image classification and speech transcription and shows that prediction errors remain small even when extrapolating outside the range spanned by the input data. This has important implications for the models' applicability: it becomes possible to investigate the performance impact of different GPUaaS deployments or hardware upgrades without conducting an empirical investigation on the specific target device, or to evaluate the changes in training time when the number of inner modules in the deep neural networks varies.
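The regression idea in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three features (batch size, layer count, inverse GPU throughput), the synthetic data, and the coefficients are invented, and the feature selection step mentioned in the abstract is omitted. It is not the authors' model, only an ordinary least-squares fit evaluated with a mean relative error, including one point outside the training range.

```python
import numpy as np


def fit_linear_model(features, times):
    """Ordinary least squares with an intercept column prepended."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, times, rcond=None)
    return coef


def predict(coef, features):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef


def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / actual."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / actual))


# Synthetic, made-up training data (hypothetical features):
# columns are [batch size, number of layers, 1 / GPU TFLOPS].
rng = np.random.default_rng(0)
train_x = np.column_stack([
    rng.integers(16, 257, size=40),
    rng.integers(5, 51, size=40),
    1.0 / rng.uniform(5.0, 15.0, size=40),
])
# Hypothetical ground-truth coefficients: intercept, then one per feature.
true_coef = np.array([2.0, 0.05, 1.5, 30.0])
train_y = true_coef[0] + train_x @ true_coef[1:] + rng.normal(0.0, 0.1, size=40)

coef = fit_linear_model(train_x, train_y)
train_error = mean_relative_error(train_y, predict(coef, train_x))

# One configuration outside the range covered by the training data.
test_x = np.array([[512.0, 60.0, 1.0 / 20.0]])
test_y = true_coef[0] + test_x @ true_coef[1:]
extrapolation_error = mean_relative_error(test_y, predict(coef, test_x))

print(f"training mean relative error: {train_error:.4f}")
print(f"extrapolation relative error: {extrapolation_error:.4f}")
```

Because the synthetic times are generated by a linear rule, extrapolation works well here by construction; real training times need not behave this way.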


“…Monitoring. Monitoring is the key to application-aware optimization [10], [26], [53], [62], [17], [63]. In order to obtain a fine-grained view of the infrastructure, Horus leverages cAdvisor, a container monitoring framework.…”

Section: System Implementation (mentioning)

confidence: 99%

Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems

Yeung, Borowiec, Yang, et al. 2022

IEEE Trans. Parallel Distrib. Syst.


To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the same GPU has been shown to be effective, this can incur interference causing slowdown. In this paper we propose Horus: an interference-aware and prediction-based resource manager for DL systems. Horus proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model's computation graph features, removing the need for online profiling and isolated reserved GPUs. Through micro-benchmarks and job co-location combinations across heterogeneous GPU hardware, we identify GPU utilization as a general proxy metric to determine good placement decisions, in contrast to current approaches which reserve isolated GPUs to perform online profiling and directly measure GPU utilization for each unique submitted job. Our approach promotes high resource utilization and makespan reduction; via real-world experimentation and large-scale trace driven simulation, we demonstrate that Horus outperforms other DL resource managers by up to 61.5% for GPU resource utilization, 23.7-30.7% for makespan reduction and 68.3% in job wait time reduction.


“…Part of the research focuses on the prediction of task execution time, [6][7][8][9][10] some of the research focuses on the prediction of cluster resources such as CPU load, [11][12][13] and some of the research focuses on the scheduling of workflow. In addition, some researchers use reinforcement learning [14][15] to perform workflow scheduling, which has high computational complexity and is not sensitive to changes in workflow.…”

Section: Related Work (mentioning)

confidence: 99%

An online workflow scheduling algorithm considering license limitation in heterogeneous environment

Qiao, Chen, et al. 2022

Concurrency and Computation


With the development of the IC industry, electronic design automation (EDA) tools are also developing. Currently, EDA tools are migrating to high-performance computing (HPC) clusters or the cloud to meet increasing computing and storage requirements. EDA tasks are scientific workflows, whose scheduling is a well-known NP-hard problem. In this paper, we propose a novel workflow scheduling algorithm, HEWS, which achieves better performance through a novel hierarchical sorting method. We conducted a series of simulation experiments in different environments, and the experimental results show that our HEWS scheduling algorithm achieves better scheduling performance than several conventional scheduling methods: the waiting time and makespan of workflows are significantly reduced.

