Fuzzy decision trees are one of the most important extensions of decision trees for symbolic knowledge acquisition by fuzzy representation. Many fuzzy decision trees employ fuzzy information gain as the measure for constructing tree node splitting criteria. These criteria play a critical role in the construction of decision trees. However, many of the criteria work well only on small-scale or medium-scale data sets and cannot directly handle large-scale data sets on account of limiting factors such as memory capacity, execution time, and data complexity. Parallel computing is one way to overcome these problems; in particular, MapReduce is a mainstream parallel computing solution. In this paper, we design a parallel tree node splitting criterion (MR-NSC) based on fuzzy information gain via MapReduce, which is completely equivalent to the traditional non-parallel splitting rule. Experimental studies on 22 UCI benchmark data sets verify the equivalence between the proposed MR-NSC algorithm and the traditional non-parallel approach. Furthermore, the feasibility and parallelism of MR-NSC are studied on two large-scale data sets.
KEYWORDS
fuzzy decision trees, fuzzy information gain, MapReduce, parallel computing
INTRODUCTION
Decision trees are one of the best-known approaches for describing a decision-making process in light of existing knowledge. Each branch of a decision tree can be transformed into a decision rule, and together these decision rules form a decision rule base. The popularity of decision trees arises mainly from the fact that decision rules are more readily comprehensible than other decision-making models such as neural networks. 1,2 Many extensions of decision trees exist 3; most of them extend or improve the well-known ID3 algorithm 4 and CART algorithm. 5 Fuzzy decision trees are one of the most popular extensions. They combine symbolic decision trees with the approximate reasoning offered by fuzzy representation. 6 The intent is to exploit the complementary advantages of the comprehensibility of decision trees and the ability of fuzzy representation to handle uncertain information. 7,8
Based on a splitting mechanism, fuzzy decision trees recursively partition the training data, in a top-down way, into several subsets with similar or identical outputs. In particular, as one of these splitting mechanisms, fuzzy information theory has had a widespread influence on the growth of fuzzy decision trees. Weber 9 presented a well-known fuzzy Iterative Dichotomiser 3 (ID3) algorithm by modifying the information gain measure used to split a tree node for fuzzy representation. Umano et al 10 proposed a new algorithm based on the probability of membership values to generate a fuzzy decision tree from numerical data, in which the fuzzy sets for each attribute are predefined by users. Ichihashi et al 11 realized fuzzy partitions by extracting fuzzy reasoning rules. An algebraic method to facilitate incremental learning is also employed. As knowledge inferences must be newly defined in fuz...