Summary
In supervised classification, large training datasets are common and decision trees are widely used. However, owing to bottlenecks such as memory restrictions, time complexity, and data complexity, many supervised classifiers, including the classical C4.5 tree, cannot directly handle big data. One solution to this problem is to design a highly parallelized learning algorithm. Motivated by this, we propose a parallelized C4.5 decision tree algorithm based on MapReduce (MR-C4.5-Tree) with two parallelized methods for building the tree nodes. First, an information entropy-based parallelized attribute selection method (MR-A-S), operating on several subsets, is proposed to determine the best splitting attribute and the cut points for MR-C4.5-Tree. Then, a parallel data splitting method (MR-D-S) is presented to partition the training data into subsets. Finally, we introduce the MR-C4.5-Tree learning algorithm, which grows the tree in a top-down recursive way. In addition, the depth of the constructed decision tree, the number of samples, and the maximal class probability in each tree node are used as termination conditions to avoid over-partitioning. Experimental studies show the feasibility and good performance of the proposed parallelized MR-C4.5-Tree algorithm.
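To make the attribute selection step concrete, the following is a minimal sequential sketch of the C4.5 gain-ratio computation that MR-A-S would evaluate in parallel over data subsets. All function names here (`entropy`, `gain_ratio`, `best_attribute`) are illustrative, not the paper's actual API; the grouping loop plays the role of a "map" phase and the final `max` the role of the "reduce".

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr_index):
    """C4.5 gain ratio for one categorical attribute.

    In an MR-A-S-style setting, these per-attribute statistics would be
    computed on data subsets by mappers and combined by reducers.
    """
    n = len(labels)
    base = entropy(labels)
    # Group the labels by the attribute's value (the "map"-like step).
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([row[attr_index] for row in rows])
    gain = base - cond
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attr_indices):
    """The 'reduce'-like step: pick the attribute with the highest gain ratio."""
    return max(attr_indices, key=lambda a: gain_ratio(rows, labels, a))
```

On a toy dataset where attribute 0 perfectly separates the classes and attribute 1 is noise, `best_attribute` selects attribute 0, which is the splitting decision the tree-growing loop would then act on.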
Imbalanced ensemble classification is one of the most essential and practical strategies for improving decision performance in data analysis. A growing body of literature on ensemble techniques for imbalance learning has appeared in recent years, and various extensions of imbalanced classification methods have been established from different points of view. The present study reviews the state-of-the-art ensemble classification algorithms for dealing with imbalanced datasets, offering a comprehensive analysis of incorporating dynamic selection of base classifiers into classification. By running 14 existing ensemble algorithms incorporating dynamic selection on 56 datasets, the experiments reveal that classical algorithms equipped with a dynamic selection strategy deliver a practical way to improve classification performance on both binary and multi-class imbalanced datasets. In addition, by combining patch learning with dynamic selection ensemble classification, a patch-ensemble classification method is designed, which uses misclassified samples to train patch classifiers and thereby increases the diversity of the base classifiers. The experimental results indicate that the designed method shows promise for improving the performance of multi-class imbalanced classification.
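The core idea behind dynamic selection is that, for each query sample, the ensemble delegates to whichever base classifier is most competent in that sample's local neighborhood of a validation set. The sketch below implements one standard variant, Overall Local Accuracy (OLA), on 1-D data; it is a simplified illustration, not the specific method evaluated in the study, and the names `local_accuracy` and `ola_predict` are hypothetical.

```python
def local_accuracy(clf, region):
    """Accuracy of one base classifier on the local region
    (a list of (x, y) validation pairs)."""
    return sum(clf(x) == y for x, y in region) / len(region)

def ola_predict(classifiers, validation, x, k=3):
    """Overall Local Accuracy (OLA) dynamic selection:
    rank the base classifiers by their accuracy on the k validation
    samples nearest to x, then let the locally best one predict."""
    region = sorted(validation, key=lambda p: abs(p[0] - x))[:k]
    best = max(classifiers, key=lambda c: local_accuracy(c, region))
    return best(x)
```

For imbalanced data, the appeal of this scheme is that a base classifier that is weak globally (e.g., biased toward the majority class) can still be selected in regions where it handles the minority class well, which is the effect the reviewed experiments measure.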
Summary
To address the time-consuming confirmation of splitting attributes and splitting points in classic rank mutual information based decision trees, this paper establishes a fast rank mutual information based decision tree (FRMIDT) for classification problems. First, the proposed FRMIDT algorithm accelerates node construction by applying a max-relevance and min-redundancy criterion to remove redundant attributes when each tree node is built. Then, the fuzzy c-means algorithm is employed to determine the splitting points for further acceleration. Meanwhile, a parallel implementation is developed in the framework of Map-Reduce (MR-FRMIDT) for medium- or large-scale data classification. Several comparative studies are conducted on UCI benchmark data sets. Compared with the classic rank mutual information based decision tree on 12 data sets, the proposed FRMIDT model effectively reduces computational time while maintaining testing accuracy. Furthermore, FRMIDT is competitive with other traditional decision tree classifiers, including BFT, C4.5, LAD, NBT, and SC. Meanwhile, a comparison with seven monotonic decision trees based on different popular splitting measures on several data sets illustrates the effectiveness of FRMIDT in monotonic classification. Finally, experimental analysis on six further data sets shows that the proposed MR-FRMIDT is feasible and offers good parallel performance, reducing execution time and avoiding memory restrictions.
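To illustrate how fuzzy c-means can supply splitting points for a continuous attribute, the sketch below runs a tiny 1-D FCM and takes the midpoints between adjacent cluster centers as candidate cut points. This is a minimal sketch under stated assumptions (uniform center initialization, fixed iteration count), not the paper's implementation; `fuzzy_c_means_1d` and `candidate_cut_points` are illustrative names.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100):
    """Tiny 1-D fuzzy c-means over one continuous attribute.

    Returns the sorted cluster centers after a fixed number of the
    standard FCM membership/center updates (fuzzifier m > 1).
    """
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the attribute's range.
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    n = len(values)
    for _ in range(iters):
        u = []  # u[k][i] = membership of sample k in cluster i
        for x in values:
            d = [abs(x - ck) or 1e-12 for ck in centers]  # avoid div by zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # Center update: membership-weighted mean of the samples.
        centers = [sum(u[k][i] ** m * values[k] for k in range(n)) /
                   sum(u[k][i] ** m for k in range(n))
                   for i in range(c)]
    return sorted(centers)

def candidate_cut_points(values, c=2):
    """Midpoints between adjacent FCM centers serve as candidate
    splitting points for the decision tree node."""
    centers = fuzzy_c_means_1d(values, c)
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]
```

The speed-up intuition is that the tree only has to evaluate these few FCM-derived cut points instead of scanning every adjacent pair of sorted attribute values.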