2018
DOI: 10.14778/3229863.3229865

On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

Abstract: Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by means of linear algebra programs, and then automatically generate efficient execution plans. In this context, optimization opportunities for fused operators (in terms of fused chains of basic operators) are ubiquitous. These opportunities include (1) fewer materialized intermediates, (2) fewer scans of input data, and (3) the exploitation of sparsity across chains of operators. Automatic operator fusion eliminates the need fo…
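
The three fusion benefits named in the abstract can be made concrete with a minimal Python/NumPy sketch (an illustration only, not SystemML's generated code): evaluating sum(X * Y * Z) operator by operator materializes two intermediates and rescans them, while a fused kernel produces the result in a single pass over the inputs.

```python
import numpy as np

def unfused_sum_product(X, Y, Z):
    # Each operator materializes a full intermediate and rescans data.
    T1 = X * Y        # intermediate 1: scans X and Y, writes T1
    T2 = T1 * Z       # intermediate 2: scans T1 and Z, writes T2
    return T2.sum()   # one more scan over T2

def fused_sum_product(X, Y, Z):
    # A fused operator evaluates the whole chain in a single pass:
    # no materialized intermediates, one scan of the inputs. If X is
    # sparse, the loop can visit only X's nonzeros, skipping the
    # matching reads of Y and Z (sparsity exploitation across the chain).
    acc = 0.0
    rows, cols = X.shape
    for i in range(rows):
        for j in range(cols):
            acc += X[i, j] * Y[i, j] * Z[i, j]
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X, Y, Z = (rng.random((200, 100)) for _ in range(3))
    assert np.isclose(unfused_sum_product(X, Y, Z), fused_sum_product(X, Y, Z))
```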

Citation types: 1 supporting, 21 mentioning, 0 contrasting

Years published: 2019–2024

Cited by 42 publications (22 citation statements). References 74 publications.

“…Lara currently does not apply fusion of linear algebra operators and UDF applications, as our current dense (BLAS) and sparse (Breeze) backends do not support fused operators. Future work could extend our optimizations on data layout access patterns to generate kernels for sparse linear algebra operations with UDF support and hardware-efficient code by integrating ideas from recent work [37,12,43,16]. Furthermore, one could extend the combinator view by integrating more data representations (e.g., block-wise or compressed [24]).…”
Section: Results (mentioning; confidence: 99%)

“…The operator fusion over loops, presented in Section 4.2, detects independent tasks (e.g., encoding of distinct columns), but fuses them instead of executing them in parallel. SystemML also performs operator fusion [23,12] and generates linear algebra kernels based on skeleton classes. During a cost-based selection, the best plan with regard to fusion and caching for pipeline breakers is chosen.…”
Section: Related Work (mentioning; confidence: 99%)

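The cost-based plan selection this statement refers to can be pictured with a toy enumeration over a chain of elementwise operators: partition the chain into contiguous fused groups, estimate each plan's cost, and keep the cheapest feasible plan. The cost model and the group-size constraint below are simplified assumptions for illustration, not SystemML's actual optimizer or cost model.

```python
def partitions(ops):
    # All ways to split a chain of operators into contiguous fused groups.
    if not ops:
        yield []
        return
    for k in range(1, len(ops) + 1):
        for rest in partitions(ops[k:]):
            yield [ops[:k]] + rest

def plan_cost(plan, size=1.0):
    # Toy cost model (an assumption): each fused group costs one scan of
    # its inputs, and every group except the last materializes one
    # intermediate of the same size.
    scans = len(plan)
    intermediates = len(plan) - 1
    return (scans + intermediates) * size

chain = ["X*Y", "*Z", "exp", "sum"]
# Hypothetical feasibility constraint: a fused kernel holds at most
# three operators (e.g., a register or template limit).
feasible = [p for p in partitions(chain) if all(len(g) <= 3 for g in p)]
best = min(feasible, key=plan_cost)
print(best)  # a two-group plan, e.g. [['X*Y'], ['*Z', 'exp', 'sum']]: 2 scans, 1 intermediate
```

Under this toy model, fewer groups always cost less, so the optimizer fuses as aggressively as the feasibility constraint allows; a richer cost model (sparsity, caching, pipeline breakers) is what makes the real selection non-trivial.
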
“…Optimal fusion plan generation requires exploring a large search space [8,22] and has been shown to be NP-complete [15,32]. To keep the process at manageable cost, DNNFusion explores fusion plans by employing a new light-weight (greedy) approach based on our proposed Extended Computational Graph (ECG) IR and our classification of operations into mapping types.…”
Section: Light-weight Profile-driven Fusion Plan Exploration (mentioning; confidence: 99%)

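In contrast to exhaustive cost-based search, the light-weight greedy exploration described in this statement can be sketched as a single pass over the operator chain. The mapping types and the one fusion rule below (a one-to-one operator is appended to the preceding group as an elementwise epilogue) are simplified assumptions, not DNNFusion's actual ECG-based algorithm.

```python
ONE_TO_ONE = "one-to-one"  # e.g., bias_add, relu: each output element depends on one input element
SHUFFLE = "shuffle"        # e.g., transpose: reorders elements without computing on them
REDUCE = "many-to-one"     # e.g., conv, sum: aggregates many input elements

def greedy_fuse(chain):
    # Single linear pass, no global search: a one-to-one operator is
    # always appended to the preceding fused group as an elementwise
    # epilogue; any other mapping type starts a new group.
    groups = []
    for op, kind in chain:
        if groups and kind == ONE_TO_ONE:
            groups[-1].append((op, kind))
        else:
            groups.append([(op, kind)])
    return groups

chain = [("conv", REDUCE), ("bias_add", ONE_TO_ONE),
         ("relu", ONE_TO_ONE), ("transpose", SHUFFLE), ("sum", REDUCE)]
for group in greedy_fuse(chain):
    print([op for op, _ in group])
# ['conv', 'bias_add', 'relu']
# ['transpose']
# ['sum']
```
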
“…VM types. Considering that there are more than one hundred VM types in today's public clouds, such as Amazon, Google, and Microsoft, we choose 120 enterprise-level VM types of x86 architecture from Amazon EC2. Note that, in Amazon EC2, there are VM Category and VM Family on top of VM type to identify the resource characteristics.…”
Section: Evaluation, 5.1 Experiment Setup (mentioning; confidence: 99%)

“…To address this challenge, existing performance modeling efforts [21,25,29] and machine learning approaches [4,18,28] have to tolerate huge offline training overhead to build an accurate online model for each framework, since they consider only low-level metrics (such as resource utilization) within a framework. Sadly, they have to spend a lot of time training new models for similar applications on new frameworks, although recent works [3,5,10] have shown that these similar applications, in both Hadoop and Spark, cover a wide range of use cases (micro-benchmarks, machine learning, stream processing, etc.). Figure 1 shows an example of why we need to tolerate huge offline overhead for a new framework.…”
Section: Introduction (mentioning; confidence: 99%)