2007 International Conference on Computational Intelligence and Security (CIS 2007) 2007
DOI: 10.1109/cis.2007.7

Mining with Noise Knowledge: Error Aware Data Mining

Abstract: Real-world data mining deals with noisy information sources where data collection inaccuracy, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague data. Two common practices are to adopt either data cleansing approaches to enhance the data consistency or simply take noisy data as quality sources and feed them into the data mining algorithms. Either way may substantially sacrifice the mining performance. In this paper, we con…

Cited by 22 publications (22 citation statements)
References 27 publications
“…The issue of data quality or veracity has been considered by a number of researchers [39], including data complexity [9], missing values [19], noise [58], imbalance [13], and dataset shift [39]. The latter, dataset shift, is most profound in the case of big data as the unseen data may present a distribution that is not seen in the training data.…”
Section: Data Mining/science With Big Data (mentioning)
confidence: 99%
“…Generally, noisy data in the classification problems could be organized in three groups [10][11][12][13][14]. i) Data that their corresponding labels include noise (paradoxical labeling error for a data point or misclassifications errors .…”
Section: Introduction (mentioning)
confidence: 99%
“…It is expected that the whole process starts with raw data and finishes with the extracted knowledge. Because of its data-driven nature, previous research efforts have concluded that data mining results crucially rely on the quality of the underlying data, and for most of the data mining applications, the process of data collection, data preparation, and data enhancement cost the majority of the project budget and also the developing time circle [18].…”
Section: Study Of the Certainty In The Training Samples (mentioning)
confidence: 99%