Text mining refers to the process of deriving high-quality information from text. It is used in search engines, digital libraries, fraud detection, and other applications that handle text data. Text mining tasks include text classification, text clustering, entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling. Classification of objects into pre-defined categories based on their features is a widely studied problem. It uses a labeled training data set to build a classification model from the other attributes, so that the model can then assign class labels to new data. Decision tree-based classification is one of the most practical and effective methods that use inductive learning. It can be implemented serially or in parallel, depending on the size of the data set. Some classifiers, such as SLIQ, SPRINT and RainForest, can be implemented either serially or in parallel. ID3, CART and C4.5 are serial classifiers. In this paper, we review various decision tree algorithms and their limitations, and conduct a comparative study to evaluate their performance in terms of accuracy, learning time and tree size, using four sample datasets. We found that the Random Forest classifier was the most accurate among the classifiers compared. However, learning time and tree size grow as the size of the dataset and the number of its attributes increase, and vice versa.
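As an illustration of the inductive step shared by the serial classifiers named above, the following is a minimal sketch of the information-gain criterion that ID3 uses to choose a split attribute. It is not the implementation evaluated in the study: the function names and the four-row toy dataset (`has_link`, `length`, and the spam/ham labels) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning the rows on `attr`."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the partitions created by the split.
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Attribute with the highest information gain (ID3-style split choice)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: two categorical attributes, two classes.
rows = [
    {"has_link": "yes", "length": "short"},
    {"has_link": "yes", "length": "long"},
    {"has_link": "no", "length": "short"},
    {"has_link": "no", "length": "long"},
]
labels = ["spam", "spam", "ham", "ham"]

split = best_attribute(rows, labels, ["has_link", "length"])
```

On this toy data `has_link` separates the two classes completely while `length` carries no information, so the split falls on `has_link`. A full tree builder would apply the same choice recursively to each partition; C4.5 and CART use different criteria (gain ratio and Gini impurity, respectively).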