2013
DOI: 10.5120/12023-8063

Handling Missing Value in Decision Tree Algorithm

Abstract: Today, decision making and large-scale data analysis are largely carried out with computer applications. Such applications use data mining techniques to analyze the data. Research domains such as management, engineering, medicine, and education frequently use these techniques. Educational data mining is an emerging discipline that focuses on applying data mining tools and techniques to educational data. Educational data mining is used to study the data available in the educational fiel…

Citations: cited by 16 publications (9 citation statements)
References: 12 publications
“…XGBoost is a library developed from the GBDT algorithm to combine multiple weak learners through the boosting method [36]. The basic algorithm is based on the Classification And Regression Trees (CART), which has high performance in both interpretability and transparency [37]. XGBoost has seven main hyper-parameters that can be adjusted to improve the algorithm's progress and robustness and also to reduce overfitting.…”
Section: XGBoost (mentioning)
confidence: 99%
“…Decision tree required less data cleaning compared to some other methods as it is not affected by missing values and outliers. Data removal is preferable for small number of missing data values whereas data replacement is more appropriate for large number of missing data values [28]. Moreover, the best surrogate predictor can be used when the value of the optimal split predictor for an observation is missing.…”
Section: B) Impute (mentioning)
confidence: 99%
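
The contrast between data removal and data replacement in the statement above can be illustrated with a minimal sketch. It assumes missing values are encoded as `None` and uses column-mean imputation as the replacement strategy; the quoted statement does not name a specific replacement technique, so the mean is an illustrative choice, and the helper names `drop_missing` and `impute_mean` are hypothetical.

```python
from statistics import mean


def drop_missing(rows):
    """Data removal: keep only rows that contain no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_mean(rows):
    """Data replacement: fill each missing value with the mean of its column.

    Assumes every column has at least one observed value; column-mean
    imputation is an illustrative choice, not the cited paper's method.
    """
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Toy dataset (made up for illustration): two numeric features, two gaps.
rows = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [3.0, 20.0],
]

removed = drop_missing(rows)   # discards the two incomplete rows
replaced = impute_mean(rows)   # keeps all four rows, fills the gaps
```

On this toy data, removal keeps two of the four rows, while replacement keeps all four. This is consistent with the statement's point that removal suits a small number of missing values and replacement suits a large number.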
“…As well, a hybrid method developed to clean data using enhanced versions of two basic techniques namely PNRS and Transitive Closure explained in [7]. On the other hand, an educational data mining field a special system had developed and examined by the Decision Tree to solve the educational problems of datasets [8]. The missing value is one of the most problems found, so researchers use some common algorithms to solve this problem such that ID3, CART, and C4.5.…”
Section: Related Work (mentioning)
confidence: 99%