2012
DOI: 10.1080/18756891.2012.685292

Equalizing imbalanced imprecise datasets for genetic fuzzy classifiers

Abstract: Determining whether an imprecise dataset is imbalanced is not immediate. The vagueness in the data means that the prior probabilities of the classes are not precisely known, and therefore the degree of imbalance can also be uncertain. In this paper we propose suitable extensions of different resampling algorithms that can be applied to interval-valued, multi-labelled data. By means of these extended preprocessing algorithms, certain classification systems designed for minimizing the fraction of misclassificat…
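
Since the abstract notes that the degree of imbalance itself can be uncertain under imprecise labelling, a minimal sketch may help fix the idea: when each instance carries a set of candidate labels, every class count, and hence the imbalance ratio, is only known up to an interval. The code below is illustrative only; the multi-label representation and all names are assumptions, not the paper's implementation.

```python
# Sketch: with imprecise (multi-labelled) instances, each class count is only
# known up to an interval, so the imbalance ratio is an interval too.
# All names here are illustrative, not taken from the paper.

def class_count_bounds(labels, cls):
    """Lower/upper bounds on the count of `cls` when each instance
    carries a *set* of candidate labels."""
    lower = sum(1 for ls in labels if ls == {cls})   # surely this class
    upper = sum(1 for ls in labels if cls in ls)     # possibly this class
    return lower, upper

def imbalance_ratio_bounds(labels, majority, minority):
    """Interval [lo, hi] for the ratio count(majority) / count(minority)."""
    maj_lo, maj_hi = class_count_bounds(labels, majority)
    min_lo, min_hi = class_count_bounds(labels, minority)
    return maj_lo / max(min_hi, 1), maj_hi / max(min_lo, 1)

# Example: the second instance could belong to either class.
data = [{"A"}, {"A", "B"}, {"B"}, {"B"}, {"B"}]
print(imbalance_ratio_bounds(data, "B", "A"))  # (3/2, 4/1) = (1.5, 4.0)
```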

Cited by 9 publications (15 citation statements). References 40 publications.
“…Therefore, the uncertainty that needs to be managed refers not only to the difficult identification of the class of each sample but also to the values associated with the inputs of the samples. In [18], several preprocessing techniques are adapted to the low-quality data scenario to obtain a more or less balanced distribution that can be managed more easily. Specifically, low-quality data versions of the ENN [87], NCL [88], CNN [89], SMOTE [72] and SMOTE+ENN [71] algorithms are designed to classify low-quality imbalanced data using a genetic cooperative-competitive learning algorithm.…”
Section: EFS and Data-Level Approaches
Citation type: mentioning (confidence: 99%)
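
The quoted passage lists low-quality-data versions of ENN, NCL, CNN, SMOTE and SMOTE+ENN. The paper's exact extensions are not reproduced here, but a plausible sketch of the core SMOTE interpolation step lifted to interval-valued features (applied endpoint-wise; all names hypothetical) might look like this:

```python
import random

# Sketch of one interval-valued SMOTE interpolation step. This is NOT the
# paper's algorithm; it only illustrates the idea of interpolating
# endpoint-wise between interval-valued features.

Interval = tuple  # (low, high)

def interpolate_interval(a: Interval, b: Interval, r: float) -> Interval:
    """Linear interpolation applied to both interval endpoints."""
    return (a[0] + r * (b[0] - a[0]), a[1] + r * (b[1] - a[1]))

def smote_synthetic(sample, neighbor):
    """Create one synthetic minority sample between `sample` and `neighbor`,
    both given as lists of interval-valued features."""
    r = random.random()  # one r shared by all features, as in classic SMOTE
    return [interpolate_interval(a, b, r) for a, b in zip(sample, neighbor)]

x  = [(1.0, 2.0), (0.2, 0.4)]
nn = [(3.0, 3.5), (0.0, 0.1)]
print(smote_synthetic(x, nn))
```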
“…Specifically, linguistic fuzzy sets allow the smoothing of the borderline areas in the inference process, which is also desirable in the presence of class overlapping, a condition known to severely degrade performance in this context [15,16]. Accordingly, and with the aim of improving the behaviour and performance of these systems, a large number of approaches have been proposed in the field of EFS for addressing classification with imbalanced datasets [15,17,18].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…In previous works, 18 three different categories of preprocessing algorithms for imbalanced problems and low-quality data were proposed, and their effect on Genetic Cooperative-Competitive Learning was compared. In the next section, similar experiments will be carried out to determine whether the algorithms discussed in the following serve the same purpose:…”
Section: Preprocessing Imbalanced Low-Quality Datasets
Citation type: mentioning (confidence: 99%)
“…Some datasets with imprecise class labels are possibly imbalanced. 18 For instance, imagine a problem with three classes A, B and C where the ranges of the relative frequencies of the classes are f_A ∈ [0.05, 0.25], f_B ∈ [0.05, 0.35] and f_C ∈ [0.3, 0.9]: the majority class can be either B or C, and the actual frequencies might be 0.25, 0.35 and 0.4, but it is also possible that they are 0.05, 0.05 and 0.9.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
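
A small sketch of the reasoning in the quoted example: treating the interval bounds independently (ignoring any coupling between the frequencies), a class can possibly be the majority only if its upper bound is not dominated by some other class's lower bound. The function below is illustrative, not taken from the cited paper.

```python
# Interval bounds on the relative frequencies, as in the quoted example.
bounds = {"A": (0.05, 0.25), "B": (0.05, 0.35), "C": (0.30, 0.90)}

def possible_majorities(bounds):
    """Classes whose upper frequency bound is not dominated by another
    class's lower bound (a simple possibility test on the intervals)."""
    return [k for k, (_, hi) in bounds.items()
            if all(hi >= lo for j, (lo, _) in bounds.items() if j != k)]

print(possible_majorities(bounds))  # ['B', 'C']: A is ruled out by f_C >= 0.3
```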
“…Oversampling, undersampling or combinations of both are used to rebalance false positives and negatives. 3,5,44 Other authors 37 suggest that for every performance criterion, for example the area under the ROC curve 10,23 or the arithmetic or geometric mean of the confusion matrix diagonal, 31 a cost matrix can be found for which the optimal classifier coincides with the minimum-risk Bayes rule. However, the method for computing this cost matrix is still undefined.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
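
For reference, the minimum-risk Bayes rule mentioned in the quote decides by minimizing the expected cost under a cost matrix. A minimal illustration (the cost values are made up) is:

```python
import numpy as np

# Minimum-risk Bayes rule: given posterior probabilities p(c|x) and a cost
# matrix C[true, predicted], pick the decision with lowest expected cost.

def bayes_min_risk(posteriors, cost):
    """posteriors: shape (n_classes,); cost: shape (n_classes, n_classes)
    with cost[i, j] = cost of predicting j when the true class is i."""
    expected_cost = posteriors @ cost  # expected cost of each decision j
    return int(np.argmin(expected_cost))

# Penalizing minority misclassification more heavily than majority
# misclassification shifts the decision toward the minority class.
cost = np.array([[0.0, 1.0],    # true class 0 (majority)
                 [5.0, 0.0]])   # true class 1 (minority), weighted 5x
print(bayes_min_risk(np.array([0.8, 0.2]), cost))  # -> 1, despite p = 0.2
```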