2021
DOI: 10.1515/jisys-2020-0061

Instance Reduction for Avoiding Overfitting in Decision Trees

Abstract: Decision tree learning is one of the most practical classification methods in machine learning, used for approximating discrete-valued target functions. However, decision trees may overfit the training data, which limits their ability to generalize to unseen instances. In this study, we investigated the use of instance reduction techniques to smooth the decision boundaries before training the decision trees. Noise filters such as ENN, RENN, and ALLKNN remove noisy instances while DROP3 and DROP5 may remove gen…
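The noise-filtering step the abstract describes can be illustrated with Wilson's Edited Nearest Neighbour (ENN) rule, one of the filters the paper names: an instance is dropped when its label disagrees with the majority label of its k nearest neighbours. The sketch below is a minimal 1-D illustration, not the authors' implementation; the data, k = 3, and the function name are invented for the example.

```python
from collections import Counter

def edited_nearest_neighbours(X, y, k=3):
    """Wilson's ENN: drop each instance whose label disagrees with the
    majority label of its k nearest neighbours (1-D Euclidean here)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # distances from instance i to every other training instance
        dists = sorted(
            (abs(xi - xj), yj)
            for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )
        neighbour_labels = [label for _, label in dists[:k]]
        majority = Counter(neighbour_labels).most_common(1)[0][0]
        if majority == yi:          # agrees with its neighbourhood: keep it
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# One mislabelled point (x=2.1 tagged as class 1) sits deep inside class 0
X = [1.0, 1.5, 2.0, 2.1, 6.0, 6.5, 7.0]
y = [0,   0,   0,   1,   1,   1,   1]
X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
# only the mislabelled point at x=2.1 is removed
```

RENN repeats this editing pass until no further instance is removed, and ALLKNN applies it for several neighbourhood sizes, so both can be built from the same primitive.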

Cited by 24 publications (16 citation statements)
References 19 publications
“…Given the model’s architecture, if the model is allowed to be trained to its full power, the model is practically guaranteed to overfit the training data. Fortunately, overfitting in machine learning algorithms may be avoided and prevented using a number of different methods [35, 36, 37, 38, 39, 40, 41, 42, 43, 44]. Some methods that are frequently employed to prevent overfitting in decision trees are as follows: Pre-pruning.…”
Section: Results
confidence: 99%
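The pre-pruning this statement refers to can be sketched with a toy 1-D threshold tree: instead of growing until every leaf is pure, growth stops once a depth or sample-count limit is hit. Everything below — the splitting criterion, the data, and the names `grow_tree`, `count_leaves`, `predict` — is a hypothetical illustration, not code from any of the cited works.

```python
from collections import Counter

def grow_tree(X, y, depth=0, max_depth=2, min_samples=2):
    """Recursively split 1-D data. Pre-pruning stops growth early via
    max_depth and min_samples instead of growing the full tree."""
    labels = Counter(y)
    majority = labels.most_common(1)[0][0]
    # pre-pruning tests: stop before a split is even attempted
    if depth >= max_depth or len(y) < min_samples or len(labels) == 1:
        return {"leaf": majority}
    xs = sorted(set(X))
    if len(xs) == 1:                       # no possible split point
        return {"leaf": majority}
    # pick the midpoint threshold with the fewest misclassifications
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [(xi, yi) for xi, yi in zip(X, y) if xi <= t]
        right = [(xi, yi) for xi, yi in zip(X, y) if xi > t]
        err = sum(
            yi != Counter(l for _, l in side).most_common(1)[0][0]
            for side in (left, right) for _, yi in side
        )
        if best is None or err < best[0]:
            best = (err, t, left, right)
    _, t, left, right = best
    return {"threshold": t,
            "left": grow_tree([x for x, _ in left], [l for _, l in left],
                              depth + 1, max_depth, min_samples),
            "right": grow_tree([x for x, _ in right], [l for _, l in right],
                               depth + 1, max_depth, min_samples)}

def count_leaves(node):
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def predict(node, x):
    if "leaf" in node:
        return node["leaf"]
    return predict(node["left" if x <= node["threshold"] else "right"], x)

# One noisy label at x=2.5: an unpruned tree carves out a leaf just for it
X = [1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.0]
y = [0,   0,   1,   0,   1,   1,   1]
full   = grow_tree(X, y, max_depth=10)   # grows until leaves are pure
pruned = grow_tree(X, y, max_depth=1)    # pre-pruned: a single split
```

With `max_depth=10` the tree reproduces every training label, noisy point included (four leaves); with `max_depth=1` it stops at one split (two leaves) and simply absorbs the noise — the trade the quoted statement describes.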
“…Though a commonly used tool in data mining for deriving a strategy to reach a particular goal, it is also widely used in machine learning (Dietterich and Kong, 1995; Navada et al., 2011; Somvanshi et al., 2016). A disadvantage of a decision tree model is that the likelihood of overfitting to the data tends to increase as the size and complexity of the tree grows (Al-Akhras et al., 2021). The decision tree model, however, is advantageous in the sense that it can be used for both classification and regression problems.…”
Section: Methods
confidence: 99%
“…The two decision tree regression models (with and without lagged variables) give the same prediction of housing prices for several different actual values, which compromises the accuracy of this method. As the likelihood of overfitting to the data positively correlates with an increase in the size and complexity of the decision tree (Al-Akhras et al., 2021), decision trees can be prone to overfitting, which can undermine model validity. As such, it is found that random forest regression is a better choice in housing pricing modelling.…”
Section: Housing Price Modelling
confidence: 99%
“…Such an algorithm relies on the training data quality and its accuracy decreases around decision boundaries. Noisy instances, a small number of training examples, and over-learning are some of the main reasons that could lead to poor performance [39] and overfitting.…”
Section: Decision Trees
confidence: 99%