2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps47924.2020.00042

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

Cited by 25 publications (17 citation statements, all classified as mentioning). References 2 publications.

“…Recent research can be largely categorized into two kinds. The first deeply studies software and hardware issues that may affect operator execution time and takes these issues into account for estimation [7,17], while Talos [26] and DUET [32] mainly concentrate on hardware speedup issues.…”
Section: Related Work (mentioning)
confidence: 99%

“…Researchers and engineers often have to implement this functionality for each new project from scratch, which can become very complicated, particularly when targeting new hardware, embedded devices, TinyML and IoT. Machine learning benchmarking initiatives such as MLPerf [16], MLModelScope [17] and Deep500 [18] attempt to standardize machine learning model benchmarking and make it more reproducible. However, production deployment, integration with complex systems and adaptation to continuously changing user environments, platforms, tools and data are currently out of their scope.…”
Section: Collective Knowledge Concept (mentioning)
confidence: 99%

“…They are very useful for data scientists but do not yet provide a universal mechanism to automatically build and run algorithms across different platforms, environments, libraries, tools, models and datasets. Researchers and engineers often have to implement this functionality for each new project from scratch, which can become very complicated, particularly when targeting new hardware, embedded devices, TinyML and IoT. Machine learning benchmarking initiatives such as MLPerf [16], MLModelScope [17] and Deep500 [18] attempt to standardize machine learning model benchmarking and make it more reproducible. However, production deployment, integration with complex systems and adaptation to continuously changing user environments, platforms, tools and data are currently out of their scope. Package managers such as Spack [19] and EasyBuild [20] are very useful for rebuilding and fixing the whole software environment.…”
Section: Collective Knowledge Concept (mentioning)
confidence: 99%

“…MLModelScope supports different combinations of DL models, frameworks, and hardware devices, allows scalable evaluation, and reports informative benchmarking results. On GPU devices, MLModelScope further provides an automated analysis tool, called XSP [99], to build an integrated view of various performance-related metrics of workloads across the entire stack. Another recent benchmarking platform is called MLCommons [100], which provides benchmarking, datasets, and practical innovative ML models.…”
Section: Scalable Benchmarking of Models, SW and HW (mentioning)
confidence: 99%
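
The quote above describes XSP as building an integrated, hierarchical view of performance metrics across the stack (model, layer, and GPU kernel levels). As a rough illustration of that idea only, and not XSP's actual implementation or API, the sketch below uses PyTorch forward hooks to record per-layer latencies nested inside a model-level span; the class name LayerTimer and all other identifiers are hypothetical.

    # Minimal sketch of across-stack profiling at two levels (model, layer).
    # Illustrative only; names are hypothetical, PyTorch is assumed.
    import time
    import torch
    import torch.nn as nn

    class LayerTimer:
        """Record per-layer latencies via framework-level forward hooks."""
        def __init__(self, model):
            self.records = []   # (layer_name, seconds), in execution order
            self._starts = {}
            for name, module in model.named_modules():
                if not list(module.children()):  # instrument leaf layers only
                    module.register_forward_pre_hook(self._pre(name))
                    module.register_forward_hook(self._post(name))

        def _pre(self, name):
            def hook(module, inputs):
                self._starts[name] = time.perf_counter()
            return hook

        def _post(self, name):
            def hook(module, inputs, output):
                self.records.append((name, time.perf_counter() - self._starts[name]))
            return hook

    model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))
    timer = LayerTimer(model)

    x = torch.randn(32, 256)
    t0 = time.perf_counter()
    with torch.no_grad():
        model(x)  # the model-level span encloses all layer-level spans
    model_s = time.perf_counter() - t0

    print(f"model: {model_s * 1e3:.3f} ms")
    for name, secs in timer.records:
        print(f"  layer {name}: {secs * 1e3:.3f} ms")
    # A real across-stack profiler such as XSP additionally aligns GPU kernel
    # traces (e.g., captured via CUPTI) under each layer span; that third
    # level is omitted here.

Because the layer spans are strictly contained within the model span, the records can be assembled into the kind of hierarchical timeline the citing paper attributes to XSP; the GPU kernel level would be merged in the same way.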