2017
DOI: 10.3390/a10040124
Improvement of ID3 Algorithm Based on Simplified Information Entropy and Coordination Degree

Abstract: The decision tree algorithm is a core technology in data classification mining, and ID3 (Iterative Dichotomiser 3) is a famous one that has achieved good results in the field of classification mining. Nevertheless, ID3 has some disadvantages, such as a bias toward multi-valued attributes, high complexity, and large tree scales. In this paper, an improved ID3 algorithm is proposed that combines simplified information entropy based on different weights with the coordination degree in rough s…

Cited by 23 publications (10 citation statements)
References 32 publications
“…For a dataset with one class label, the class proportion p_i will be 1 and log2(p_i) is 0. Hence the entropy of a homogeneous dataset is zero [8]. The higher the entropy, the higher the uncertainty/impurity/mixing [9].…”
Section: A. Entropy (mentioning)
confidence: 99%
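The entropy rule quoted above can be sketched in a few lines; the function name is my own, not from the cited papers:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(L) = -sum_i p_i * log2(p_i) over the class proportions.

    A homogeneous label set (one class, p = 1, log2(1) = 0) yields entropy 0;
    the more evenly mixed the classes, the higher the entropy.
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

For example, `entropy(["yes", "yes", "yes"])` is 0, while a 50/50 split such as `entropy(["yes", "no"])` reaches the maximum of 1 bit for two classes.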
“…The feature with the highest information gain is the best feature to select for a split. Assuming there are V different values for a feature f, and |L_v| denotes the subset of L with f = v, the information gain after splitting L on feature f is measured as in [8].…”
Section: B. Information Gain (mentioning)
confidence: 99%
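The split criterion described above, IG(L, f) = H(L) - sum_v (|L_v| / |L|) * H(L_v), can be sketched as follows (function names are illustrative, not from the cited papers):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """IG(L, f) = H(L) - sum over values v of (|L_v| / |L|) * H(L_v).

    `feature_values[i]` is the value of feature f for sample i; each distinct
    value v induces the subset L_v of labels used in the weighted remainder.
    """
    n = len(labels)
    subsets = {}
    for label, v in zip(labels, feature_values):
        subsets.setdefault(v, []).append(label)
    remainder = sum((len(s) / n) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```

A feature that separates the classes perfectly (e.g. labels `["y", "y", "n", "n"]` split by values `["a", "a", "b", "b"]`) attains the maximum gain H(L); a feature whose subsets are as mixed as L itself has gain 0, which is why ID3 greedily picks the highest-gain feature at each node.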
“…Classification modeling is the process of extracting existing land-suitability data [18], which in the SDT algorithm uses entropy to grow the decision tree. Accordingly, a variable in a dataset with high heterogeneity has a high entropy value; conversely, if the data in a variable are homogeneous, its entropy will be low or even 0 [28]. A high entropy value gives a variable a small chance of being selected as a root/internal node, as is the case for the drainage and cation exchange capacity variables.…”
Section: Evaluasi Hasil Klasifikasi [Evaluation of Classification Results] (unclassified)
“…A hyperparameter is an internal parameter of a classifier method, such as the box constraint of a DT or a support vector machine, or the learning rate of a robust classification ensemble. These parameters can strongly affect a classifier's performance; BO uses a fit function to tune them [9]. Model selection depends on the hyperparameters and features: a classifier such as a decision tree is trained on a dataset, and the best feature set and hyperparameters are chosen by applying K-fold cross-validation to the DT.…”
Section: DT Algorithms With Hyperparameter Optimization (mentioning)
confidence: 99%
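The K-fold selection loop mentioned above can be sketched as below. This is a minimal illustration, not the cited paper's method: `train_and_score` is a hypothetical callback (a real pipeline would train a decision tree and return its test accuracy), and the folds are contiguous rather than shuffled or stratified.

```python
def kfold_indices(n, k):
    """Split sample indices 0..n-1 into k contiguous (train, test) folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

def select_hyperparams(n_samples, k, candidates, train_and_score):
    """Return the candidate hyperparameter dict with the best mean K-fold score.

    `train_and_score(params, train_idx, test_idx)` is assumed to train a
    classifier (e.g. a decision tree) on the training fold and return a score
    on the test fold; higher is better.
    """
    def mean_score(params):
        scores = [train_and_score(params, tr, te)
                  for tr, te in kfold_indices(n_samples, k)]
        return sum(scores) / len(scores)

    return max(candidates, key=mean_score)
```

Bayesian optimization, as the statement notes, replaces this exhaustive loop over `candidates` with a model-guided search over the same fit (objective) function.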