Many decision tree (DT) induction algorithms, including the popular C4.5 family, are based on the Conditional Entropy (CE) measure family. A natural question concerns the relative performance of other entropy measure families, such as Class-Attribute Mutual Information (CAMI). We therefore conducted a theoretical analysis of the CAMI family that enabled us to expose relationships with CE and to correct a previously published CAMI result. Our computational study showed only a small variation in the performance of the two families. Since feature selection is important in DT induction, we also conducted a theoretical analysis of a recently published blurring-based feature selection algorithm and developed a new feature selection algorithm. We tested this algorithm on a wider set of test problems than the comparable study in order to identify the benefits and limitations of blurring-based feature selection. These results provide theoretical and computational insight into entropy-based induction measures and feature selection algorithms.
Decision tree (DT) induction is among the more popular data mining techniques. An important component of DT induction algorithms is the splitting method, the most commonly used of which is based on the Conditional Entropy (CE) family. However, it is well known that no single splitting method gives the best performance for all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and another family based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting method, others are very sensitive to it. For example, some of the CAMI family methods may be more appropriate than the popular Gain Ratio (GR) method for datasets with nominal predictor attributes, and are competitive with the GR method for datasets where all predictor attributes are numeric. Given that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family should be included in data mining toolsets.
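To make the CE-based splitting criterion mentioned above concrete, the following is a minimal sketch of Quinlan's Gain Ratio for a nominal attribute: information gain (class entropy minus conditional entropy given the attribute) normalized by the attribute's own entropy. The function names and the toy data are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a sequence of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain Ratio for a nominal split attribute (Quinlan, C4.5).

    GR = (H(Y) - H(Y|X)) / H(X), where H(Y|X) is the conditional
    entropy of the class Y given attribute X, and H(X) is the split
    information (entropy of the attribute's own value distribution).
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    cond_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(values)
    if split_info == 0:  # attribute has a single value; no useful split
        return 0.0
    return (entropy(labels) - cond_entropy) / split_info

# Toy example: a balanced binary attribute that perfectly predicts the class.
x = ["a", "a", "b", "b"]
y = ["yes", "yes", "no", "no"]
print(gain_ratio(x, y))  # 1.0: full information gain, unit split information
```

A CAMI-family measure would replace the criterion computed here while leaving the surrounding induction loop unchanged, which is why the two families are directly comparable in the studies described above.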
Decision tree (DT) induction is among the more popular data mining techniques. An important component of DT induction algorithms is the splitting method, the most commonly used of which is based on the Conditional Entropy family. However, it is well known that no single splitting method gives the best performance for all problem instances. In this paper, we develop and explore hybrid splitting methods drawn from two entropy-based families: the Conditional Entropy family and another family based on the Class-Attribute Mutual Information (CAMI) measure. We compare conventional splitting methods based on single measures with hybrid splitting methods based on multiple measures. The results suggest that the hybrid methods could be competitive in terms of classification accuracy and are thus worthy of future research.