2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)
DOI: 10.1109/aike.2018.00045
MapReduce Tuning to Improve Distributed Machine Learning Performance

Cited by 9 publications (5 citation statements). References 4 publications.
“…A distributed real-time optimization method for MapReduce frameworks on emerging cloud platforms that support dynamic speed scaling is presented in [47]; it dynamically schedules input data of sufficient size and synthesizes intermediate processing results according to the state of the application and the data center, significantly improving throughput. It is shown in [48] how MapReduce parameters affect the distributed processing of machine learning programs supported by the Hadoop Mahout and Spark MLlib machine learning libraries: a virtualized cluster is built on Docker containers, and Hadoop parameters such as the number of replicas and the data block size are varied to measure DML performance.…”
Section: Return Results (mentioning, confidence: 99%)
“…It is another method for processing massive data that can efficiently partition and exploit large-scale resources. Jeon et al. [19] also proposed a Hadoop performance tuning method that reduces the amount of data transmitted over the network and minimizes disk I/O. Spam filtering methods are largely divided into reputation-based filtering methods and content-based filtering methods.…”
Section: Related Work (mentioning, confidence: 99%)
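The internals of the tuning method attributed to Jeon et al. are not given in this excerpt; only its two goals are named (less network traffic, less disk I/O). As a hedged illustration, two standard Hadoop MapReduce levers achieve exactly those goals: a combiner that pre-aggregates map output before the shuffle, and compression of intermediate map output. The SumReducer class below is a hypothetical example; the configuration keys are standard Hadoop properties.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class ShuffleTuning {

    // An associative, commutative sum reducer, so the same class can also
    // serve as a combiner: partial sums computed on each mapper shrink the
    // volume of data that crosses the network during the shuffle.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output: smaller spill files on local
        // disk and a smaller shuffle over the network.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                      SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "shuffle-tuned-job");
        job.setCombinerClass(SumReducer.class); // pre-aggregate before the shuffle
        job.setReducerClass(SumReducer.class);
        // ... mapper, key/value classes, and input/output paths as usual
    }
}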
“…Hence, the name node forwards jobs directly to a particular data node without knowledge of the entire cluster. Jeon et al. [25] show the effect of MapReduce parameters on the distributed processing of machine learning programs. Chung and Nah [26] showed how different virtualization methods affect the processing performance of distributed processing over a massive volume of data.…”
Section: Literature Review (mentioning, confidence: 99%)