2017
DOI: 10.1002/cpe.4015

A parallel C4.5 decision tree algorithm based on MapReduce

Abstract: In supervised classification, large training data sets are very common, and decision trees are widely used. However, due to bottlenecks such as memory restrictions, time complexity, or data complexity, many supervised classifiers, including the classical C4.5 tree, cannot directly handle big data. One solution to this problem is to design a highly parallelized learning algorithm. Motivated by this, we propose a parallelized C4.5 decision tree algorithm based on MapReduce (MR‐C4.5‐Tree) with 2 parallelized m…

Cited by 27 publications (17 citation statements)
References 16 publications
“…33 Each Map phase takes a small file divided from the original file as its input; meanwhile, each Map phase contains a map function. 20 The two types of functions (map and reduce) take <key, value> pairs as their inputs and outputs.…”
Section: The Mechanics of MapReduce (mentioning)
confidence: 99%
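As a rough illustration of the <key, value> mechanics described in the excerpt above (a minimal sketch, not the paper's MR-C4.5-Tree code), a map function can scan one input split and emit one <key, value> pair per attribute of each record; the record layout and split contents below are assumed for illustration.

# Minimal sketch of a map function over one input split.
# Assumed record layout: comma-separated attribute values followed by a class label.
def map_split(split_lines):
    """Emit <key, value> pairs: key = (attribute index, attribute value), value = class label."""
    for line in split_lines:
        *attributes, label = line.strip().split(",")
        for idx, value in enumerate(attributes):
            yield (idx, value), label

# Toy split standing in for one of the small files a Map task reads.
split0 = ["sunny,hot,no", "rainy,mild,yes"]
intermediate = list(map_split(split0))
print(intermediate)  # [((0, 'sunny'), 'no'), ((1, 'hot'), 'no'), ((0, 'rainy'), 'yes'), ((1, 'mild'), 'yes')]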
“…Then, these intermediate outputs are combined in the Reduce phase via a reduce function to produce the final results. The corresponding figure presents the detailed processing procedure of the MapReduce framework.…”
Section: Related Work (mentioning)
confidence: 99%
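Continuing the same minimal sketch (again an illustration under assumed data, not the cited implementation), the Reduce phase can be pictured as grouping the intermediate <key, value> pairs by key and letting a reduce function combine each group, here into class-label counts, the kind of statistic a decision tree induction step needs.

from collections import defaultdict, Counter

# Hypothetical intermediate <key, value> pairs produced by the Map phase.
intermediate = [((0, "sunny"), "no"), ((0, "sunny"), "yes"), ((1, "hot"), "no")]

# Shuffle/group step: collect all values that share a key.
grouped = defaultdict(list)
for key, value in intermediate:
    grouped[key].append(value)

# Reduce function: combine the values of one key into a final result.
def reduce_key(key, values):
    return key, Counter(values)

final = [reduce_key(k, v) for k, v in grouped.items()]
print(final)  # [((0, 'sunny'), Counter({'no': 1, 'yes': 1})), ((1, 'hot'), Counter({'no': 1}))]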
“…When the best splitting attribute A_k and the splitting point c_k are confirmed by Algorithm 4, we need to split the data set into subsets in parallel for large‐scale data sets, which can follow the MR‐D‐S algorithm. In the following, how to construct an MR‐FRMIDT is introduced.…”
Section: The Parallel Fast Rank Mutual Information Based Decision Tree (mentioning)
confidence: 99%
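The MR-D-S details are not given in this excerpt; the sketch below is only a hypothetical illustration of the idea that, once the best splitting attribute A_k and split point c_k are known, each data block can be partitioned independently, so the subsets are formed in parallel. The block contents, the thread pool standing in for separate map tasks, and the names split_block, k, and c_k are all assumptions.

from concurrent.futures import ThreadPoolExecutor

def split_block(block, k, c_k):
    """Partition one block of numeric records into (<= c_k, > c_k) subsets on attribute k."""
    left = [row for row in block if row[k] <= c_k]
    right = [row for row in block if row[k] > c_k]
    return left, right

# Toy data blocks; in MapReduce each block would be handled by its own map task.
blocks = [[(2.5, 1.0), (4.0, 0.5)], [(1.0, 3.0), (5.5, 2.0)]]
k, c_k = 0, 3.0  # assumed best splitting attribute index and split point

with ThreadPoolExecutor() as pool:
    parts = list(pool.map(lambda b: split_block(b, k, c_k), blocks))

left_subset = [row for left, _ in parts for row in left]
right_subset = [row for _, right in parts for row in right]
print(left_subset)   # rows with attribute k <= c_k
print(right_subset)  # rows with attribute k > c_k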
“…4,18,19 A great deal of past work has optimized MapReduce-Hadoop. 21 The major MapReduce scheduling algorithms, such as FIFO, 22 Matchmaking and Delay, 23 and Multithreading Locality (MTL), 24 improve the efficiency of MapReduce processing on virtualized infrastructure. 20 A parallel MapReduce version of the serial C4.5 decision tree learning algorithm (MR-C4.5) demonstrates high speed-up and scalability.…”
mentioning
confidence: 99%