2010
DOI: 10.20965/jaciii.2010.p0297

Data Cleaning for Classification Using Misclassification Analysis

Abstract: In many classification problems, data cleaning is used as a preprocessing technique to achieve better results. The purpose of data cleaning is to remove noise, inconsistent data and errors in the training data. This should enable the use of a better and more representative data set to develop a reliable classification model. In most classification models, unclean data can affect the classification accuracy of the model. In this paper, we investigate the use of misclassification analys…

Citations: cited by 74 publications (37 citation statements)
References: 13 publications
“…The main aim of the The ANN model cleans the input data automatically. The purpose of data cleaning is to remove noise, inconsistent data and errors in the training data [53]. The number of the input Processing Elements (PEs) was set to 75 (equal to the match attributes) and the output PEs was one (representing either win or lose).…”
Section: The ANN Training and Testing Procedures
type: mentioning
confidence: 99%
“…For that, some of the data cleaning steps are applied. These steps are very important to have high-quality datasets because unclean data can decrease the classification or regression model accuracies [42]. Fig.…”
Section: B Data Preprocessing
type: mentioning
confidence: 99%
“…In order to apply CMTNN to perform under-sampling [16], Truth NN and Falsity NN are employed to detect and remove misclassification patterns from a training set in the following steps:…”
Section: Target Outputs
type: mentioning
confidence: 99%
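
The general idea mentioned in the abstract and in the last citation statement, detecting misclassified patterns and removing them from a training set, can be illustrated with a minimal sketch. This is not the paper's method and not the elided CMTNN steps: a simple nearest-centroid classifier stands in for the neural networks, and the toy data and all function names (centroids, predict, clean_training_set) are assumptions made for illustration only.

```python
from collections import defaultdict


def centroids(X, y):
    """Compute the mean feature vector of each class (hypothetical helper)."""
    sums = {}
    counts = defaultdict(int)
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        for j, v in enumerate(xi):
            sums[yi][j] += v
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(cents, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))


def clean_training_set(X, y):
    """Fit on the full training set, then drop the patterns the model misclassifies.

    Returns the cleaned features, cleaned labels, and the removed indices.
    """
    cents = centroids(X, y)
    removed = [i for i, (xi, yi) in enumerate(zip(X, y)) if predict(cents, xi) != yi]
    dropped = set(removed)
    X_clean = [xi for i, xi in enumerate(X) if i not in dropped]
    y_clean = [yi for i, yi in enumerate(y) if i not in dropped]
    return X_clean, y_clean, removed


# Toy data (assumed): two well-separated clusters plus one pattern near
# the class-0 cluster that carries the class-1 label.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

X_clean, y_clean, removed = clean_training_set(X, y)
print("removed indices:", removed)
```

A filter of this kind removes every pattern the model gets wrong, so it can also discard correctly labeled but hard examples; whether that trade-off is acceptable depends on the data set and the classifier used.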