2017
DOI: 10.1177/1094342017712976

Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink

Abstract: With the ever-increasing need to analyze large amounts of data to get useful insights, it is essential to develop complex parallel machine learning algorithms that can scale with data and number of parallel processes. These algorithms need to run on large data sets as well as they need to be executed with minimal time in order to extract useful information in a time-constrained environment. Message passing interface (MPI) is a widely used model for developing such algorithms in high-performance computing parad…

Cited by 28 publications (23 citation statements)
References 18 publications
“…With our previous work 9 we have observed that various decisions made at different components of a big data runtime determine the type of applications that can be executed efficiently. The layered architecture proposed in this work will eliminate the monolithic designs and empower components to be developed independently and efficiently.…”
Section: Discussion (mentioning)
confidence: 95%
“…AMT systems mostly focus on computationally intensive applications, and there is ongoing research to make them more efficient and productive. We find that big data systems developed according to a dataflow model are inefficient in computationally intensive applications with tightly synchronized parallel operations 9 , while AMT systems are not optimized for data processing.…”
Section: Introduction (mentioning)
confidence: 97%
“…Flink uses thread-based worker model for executing the data flow graphs. It can chain consecutive tasks in the workflow in a single node to make the run more efficient by reducing data serializations and communications [37]. Flink and Spark are designed to make Hadoop scalable and fault-tolerant, and to analyze intensive data applications with distributed memory framework.…”
Section: Flink (mentioning)
confidence: 99%
“…Frameworks for parallel data analysis have been created by the High Performance Computing (HPC) and Big Data communities [17]. MPI is the most used programming model for HPC resources.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, the MapReduce [7] abstraction makes it easy to exploit data-parallelism as required by many analysis applications. Several recent publications applied HPC techniques to advance traditional Big Data applications and Big Data frameworks [17].…”
Section: Introduction (mentioning)
confidence: 99%