2019
DOI: 10.1016/j.ins.2018.12.002

Enabling Smart Data: Noise filtering in Big Data classification

Abstract: In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances, and is known to be a very disruptive feature of data. However, in this Big Data era, the …

Cited by 128 publications (61 citation statements)
References: 42 publications
“…As we have indicated in Section 4.2, noise filtering is a popular option in these cases, which becomes even more helpful in Big Data environments as noise filters reduce the size of the datasets. However, designing Big Data noise filters is a challenge and only some prior designs and methods can be found in the literature Zerhari (); García‐Gil, Luengo, García, and Herrera (). On the other hand, k‐NN has been the seminal method to remove redundant and noisy instances in learning problems.…”
Section: The K‐nn Algorithm As a Tool To Transform Big Data Into Smar… (mentioning)
confidence: 99%
“…Other relevant examples of this family of methods are: All‐kNN Tomek (), NCN‐Edit Sánchez et al () or RNG Sánchez, Pla, and Ferri (). A distributed version of the ENN algorithm based on Apache Spark is proposed in García‐Gil et al () for very large datasets. This distributed version of ENN performs a global filtering of the instances, considering the whole dataset at once.…”
Section: The K‐nn Algorithm As a Tool To Transform Big Data Into Smar… (mentioning)
confidence: 99%
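
The ENN (Edited Nearest Neighbor) rule mentioned in the statement above can be illustrated with a minimal single-machine sketch: an instance is kept only when its label agrees with the majority label of its k nearest neighbors. This is not the distributed Apache Spark version attributed to García‐Gil et al.; the function name `enn_filter`, the Euclidean distance, and the toy data are assumptions made for illustration. Like the global filtering described in the quote, each instance is compared against the whole dataset.

```python
from collections import Counter
from math import dist


def enn_filter(points, labels, k=3):
    """Return the indices of instances kept by one ENN pass.

    Hypothetical helper for illustration: an instance is kept when its
    label matches the majority label of its k nearest neighbors
    (Euclidean distance, the instance itself excluded).
    """
    kept = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Distances from instance i to every other instance in the dataset.
        neighbors = sorted(
            (dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        # Majority vote over the labels of the k closest instances.
        votes = Counter(labels[j] for _, j in neighbors)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label == y:
            kept.append(i)
    return kept


# Toy data (assumed): two well-separated clusters, with the point at
# index 3 carrying the wrong label for its cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]

kept = enn_filter(points, labels, k=3)
print(kept)
```

The pairwise distance computation is quadratic in the number of instances, which is one reason a distributed design is needed for very large datasets.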
“…As a classifier, we use Spark's MLlib implementation of a decision tree. We compare our method with the most recent and best‐performing proposal in the literature for noise‐cleaning in Big Data, homogeneous ensemble for Big Data (HME‐BD). We show that, for some problems, the classifier benefits from the noise filtering even when no noise is added.…”
Section: Introduction (mentioning)
confidence: 97%
“…In other words, Smart Data refers to the challenge of transforming information into knowledge. Smart Data aims to separate the raw (or Big) part of the data (volume/velocity), from the Smart part of it (veracity/value). Therefore, Smart Data is focused on extracting valuable knowledge from data, in the form of a subset, that contains enough quality for a successful data mining process…”
Section: Introduction (mentioning)
confidence: 99%