Handling Missing Value in Decision Tree Algorithm

Patidar, Preeti; Tiwari, Anshu

doi:10.5120/12023-8063

Cited by 16 publications

(9 citation statements)

References 12 publications


“…XGBoost is a library developed from the GBDT algorithm to combine multiple weak learners through the boosting method [36]. The basic algorithm is based on the Classification And Regression Trees (CART), which has high performance in both interpretability and transparency [37]. XGBoost has seven main hyper-parameters that can be adjusted to improve the algorithm's progress and robustness and also to reduce overfitting.…”

Section: Xgboost (mentioning)

confidence: 99%

Explanation of Machine-Learning Solutions in Air-Traffic Management

et al. 2021


Advances in the trusted autonomy of air-traffic management (ATM) systems are currently being pursued to cope with the predicted growth in air-traffic densities in all classes of airspace. Highly automated ATM systems relying on artificial intelligence (AI) algorithms for anomaly detection, pattern identification, accurate inference, and optimal conflict resolution are technically feasible and demonstrably able to take on a wide variety of tasks currently accomplished by humans. However, the opaqueness and inexplicability of most intelligent algorithms restrict the usability of such technology. Consequently, AI-based ATM decision-support systems (DSS) are foreseen to integrate eXplainable AI (XAI) in order to increase interpretability and transparency of the system reasoning and, consequently, build the human operators’ trust in these systems. This research presents a viable solution to implement XAI in ATM DSS, providing explanations that can be appraised and analysed by the human air-traffic control operator (ATCO). The maturity of XAI approaches and their application in ATM operational risk prediction is investigated in this paper, which can support both existing ATM advisory services in uncontrolled airspace (Classes E and F) and also drive the inflation of avoidance volumes in emerging performance-driven autonomy concepts. In particular, aviation occurrences and meteorological databases are exploited to train a machine learning (ML)-based risk-prediction tool capable of real-time situation analysis and operational risk monitoring. The proposed approach is based on the XGBoost library, which is a gradient-boost decision tree algorithm for which post-hoc explanations are produced by SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). 
Results are presented and discussed, and considerations are made on the most promising strategies for evolving the human–machine interactions (HMI) to strengthen the mutual trust between ATCO and systems. The presented approach is not limited only to conventional applications but also suitable for UAS-traffic management (UTM) and other emerging applications.



“…Decision tree required less data cleaning compared to some other methods as it is not affected by missing values and outliers. Data removal is preferable for small number of missing data values whereas data replacement is more appropriate for large number of missing data values [28]. Moreover, the best surrogate predictor can be used when the value of the optimal split predictor for an observation is missing.…”

Section: B) Impute (mentioning)

confidence: 99%

Application of Decision Tree in Classifying Secondary School Students’ Tendencies to Choose TVET in Malaysia

et al. 2021

TURCOMAT


The wave of Industry Revolution (IR 4.0) highlights the importance of technology in our life. The demand for technologist and skilled workers in Technical and Vocational Education and Training (TVET) are increasing day by day due to their expertise. TVET provides a platform for formal and non-formal learning to equip the youngsters in contributing to the development of a prosperous and inclusive nation. Moreover, TVET promises bright job prospects especially in fulfilling the manpower demand of IR 4.0. However, students in Malaysia currently are not fully aware of the existence of TVET, since the number of students who join TVET are still below expectation. Therefore, the main objective in this study is to develop the best TVET model to classify the students’ tendency in choosing TVET after completing secondary school. From the literature, five main factors that hinder students’ interest in joining TVET are recognized, namely students’ interest, parents, society, TVET instructors and employers. In this study, 428 secondary school students from Kedah (Malaysia) are involved as respondents. Different types of decision tree models are developed based on the algorithms and the splitting criteria. Altogether, there are 15 variables derived from 5 main affecting factors mentioned above to determine the tendency of joining TVET. Consequently, the best TVET classifier with the misclassification rate of 0.2919 is selected, to predict the tendency of students who will be joining TVET in future. Our findings revealed that the variable of “Stream” plays as the primary and trifling roles. This classifier is beneficial in assisting the government to achieve the aim of upholding TVET in Malaysia.


“…As well, a hybrid method developed to clean data using enhanced versions of two basic techniques namely PNRS and Transitive Closure explained in [7]. On the other hand, an educational data mining field a special system had developed and examined by the Decision Tree to solve the educational problems of datasets [8]. The missing value is one of the most problems found, so researchers use some common algorithms to solve this problem such that ID3, CART, and C4.5.…”

Section: Related Work (mentioning)

confidence: 99%

A hybrid Technique for Cleaning Missing and Misspelling Arabic Data in Data Warehouse

Al-Hagery, Alreshoodi, Almutairi, et al. 2019

IJITCS


Real-World datasets accumulated over a number of years tend to be incomplete, inconsistent and contain noisy data, this, in turn, will cause an inconsistency of data warehouses. Data owners are having hundred-millions to billions of records written in different languages, hence continuously increases the need for comprehensive, efficient techniques to maintain data consistency and increase its quality. It is known that the data cleaning is a very complex and difficult task, especially for the data written in Arabic as a complex language, where various types of unclean data can occur to the contents. For example, missing values, dummy values, redundant, inconsistent values, misspelling, and noisy data. The ultimate goal of this paper is to improve the data quality by cleaning the contents of Arabic datasets from various types of errors, to produce data for better analysis and highly accurate results. This, in turn, leads to discover correct patterns of knowledge and get an accurate Decision-Making. This approach established based on the merging of different algorithms. It ensures that reliable methods are used for data cleansing. This approach cleans the Arabic datasets based on the multi-level cleaning using Arabic Misspelling Detection, Correction Model (AMDCM), and Decision Tree Induction (DTI). This approach can solve the problems of Arabic language misspelling, cryptic values, dummy values, and unification of naming styles. A sample of data before and after cleaning errors presented.

