EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification

Le, Hoang Lam; Landa-Silva, Dario; Galar, Mikel; García, Salvador; Triguero, Isaac

doi:10.1016/j.asoc.2020.107033

Cited by 25 publications

(7 citation statements)

References 63 publications


“…The data imbalance problem reduces the effectiveness of deep learning fault diagnosis [12, 13]. To reduce the impact of imbalanced data on model performance of deep learning, data-level and algorithm-level approaches are proposed [14, 15, 16, 17, 18]. Among the data-level methods, oversampling and undersampling techniques are used to construct a balanced dataset [14, 15, 16].…”

Section: Introduction (mentioning)

confidence: 99%

“…To reduce the impact of imbalanced data on model performance of deep learning, data-level and algorithm-level approaches are proposed [14, 15, 16, 17, 18]. Among the data-level methods, oversampling and undersampling techniques are used to construct a balanced dataset [14, 15, 16]. However, oversampling can produce duplicate information, while undersampling can lead to a loss of information.…”

Section: Introduction (mentioning)

confidence: 99%
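The statement above says that oversampling can produce duplicate information. The sketch below makes that concrete with plain random oversampling. It balances a binary dataset by re-drawing minority samples with replacement, so every added sample is an exact copy of an existing one.

The helper name `random_oversample` and the list-of-vectors data layout are illustrative assumptions. This is not the method of EUSC or of the citing paper.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Hypothetical helper: balance a binary dataset by duplicating minority samples.

    X is a list of feature vectors and y the matching list of class labels.
    """
    rng = random.Random(seed)
    (_, n_maj), (minority, n_min) = Counter(y).most_common(2)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # Draw with replacement: each new sample is a copy of an existing minority sample.
    extra = [rng.choice(minority_idx) for _ in range(n_maj - n_min)]
    X_out = list(X) + [X[i] for i in extra]
    y_out = list(y) + [y[i] for i in extra]
    return X_out, y_out
```

With a single minority sample, every synthetic addition is identical to it. The classifier sees more minority points but no new information.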


Federated Learning Based Fault Diagnosis Driven by Intra-Client Imbalance Degree

Zhou, Yang, Wang, et al. 2023, Entropy


Federated learning is an effective means of combining model information from different clients to achieve joint optimization when the model of a single client is insufficient. When data are imbalanced across clients, it is important to design an aggregation strategy for imbalanced federations so that each client can benefit from the global federated model. However, the existing method fails to achieve an efficient federation strategy when there is an imbalance-mode mismatch between clients. This paper designs a federated learning method guided by the intra-client imbalance degree to ensure that each client receives the maximum benefit from the federated model. The intra-client imbalance degree guides the design of the federation strategy. It is measured as the gain of a class-by-class model update on the federated model, using a small balanced dataset. Experimental validation on a rolling-bearing benchmark dataset shows that fault diagnosis accuracy improves by 23.33% when the imbalance-mode mismatch between clients is pronounced.



“…The advantage of RUS is that it can quickly train the classification model, but it may eliminate useful data 15. Another widely used undersampling method is based on clustering 16,17. Cluster‐based methods preserve the data distribution characteristics while retaining the useful samples 16,18.…”

Section: Introduction (mentioning)

confidence: 99%
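The trade-off of random undersampling (RUS) described above, fast training at the cost of possibly discarding useful data, shows up clearly in a minimal sketch. All minority samples are kept, and an equally sized random subset of the majority class is drawn without replacement; everything else is dropped.

The function name `random_undersample` is hypothetical, and the sketch assumes a binary problem with lists as the data layout.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Hypothetical RUS helper: keep every minority sample plus a random,
    equally sized subset of the majority class (binary case)."""
    rng = random.Random(seed)
    (majority, _), (_, n_min) = Counter(y).most_common(2)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    other_idx = [i for i, label in enumerate(y) if label != majority]
    # Sample without replacement; the discarded majority samples are simply lost.
    kept = sorted(other_idx + rng.sample(majority_idx, n_min))
    return [X[i] for i in kept], [y[i] for i in kept]
```

The subset is chosen without looking at the data, which is why RUS is cheap and also why it may throw away informative majority samples.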

“…15 Another widely used undersampling method is based on clustering. 16,17 Cluster-based methods preserve the data distribution characteristics while retaining the useful samples. 16,18 The cluster-based method divides the majority class into multiple groups, and then selects representative samples from every group.…”

Section: Introduction (mentioning)

confidence: 99%
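The cluster-based scheme quoted above divides the majority class into groups and keeps a representative from each. The sketch below is one common variant, under stated assumptions:

- The majority class is clustered with plain Lloyd's k-means, with k equal to the minority-class size.
- The representative of each cluster is the nearest not-yet-chosen majority sample to its centroid.

The names `kmeans` and `cluster_undersample` are hypothetical. This is not the EUSC surrogate model or the exact procedure of the citing paper.

```python
import random
from collections import Counter

def _sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def cluster_undersample(X, y, seed=0):
    """Hypothetical helper: keep all minority samples plus one majority
    representative per k-means cluster (binary case)."""
    (majority, _), (_, n_min) = Counter(y).most_common(2)
    maj = [x for x, label in zip(X, y) if label == majority]
    X_out = [x for x, label in zip(X, y) if label != majority]
    y_out = [label for label in y if label != majority]
    taken = set()
    for c in kmeans(maj, n_min, seed=seed):
        # Representative: the closest majority sample not already chosen.
        best = min((i for i in range(len(maj)) if i not in taken),
                   key=lambda i: _sq_dist(maj[i], c))
        taken.add(best)
        X_out.append(maj[best])
        y_out.append(majority)
    return X_out, y_out
```

Because each representative comes from a different region of the majority class, the reduced set tends to follow the original majority distribution more closely than a purely random subset.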

Undersampling of approaching the classification boundary for imbalance problem

Jiang, Yuan, Liao, et al. 2022, Concurrency and Computation


Using imbalanced data in classification harms accuracy: if a classifier is trained directly on imbalanced data, the results will have large deviations. A common approach to dealing with imbalanced data is to restructure the raw dataset with an undersampling method. Undersampling methods usually use random or clustering approaches to trim the majority class, since some majority-class data do not contribute to the classification model. This paper proposes a revised undersampling approach. First, we perform space compression in the direction perpendicular to the separating hyperplane. Then, a weighted random sampling hybrid ensemble learning method is applied so that the sampled objects spread more widely near the separating hyperplane. Experiments with 7 undersampling methods on 21 imbalanced datasets show that our method achieves good results.
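The abstract mentions weighted random sampling that concentrates the retained majority samples near the separating hyperplane. The sketch below shows only that sampling idea, under these assumptions:

- A linear hyperplane w·x + b = 0 is already given, for example from a previously trained linear classifier.
- Each majority sample gets the weight 1 / (1 + |distance|).
- Sampling without replacement uses the Efraimidis-Spirakis key method.

The space-compression step and the ensemble are omitted. The function name `hyperplane_weighted_undersample` is hypothetical, and this is not the authors' algorithm.

```python
import math
import random

def hyperplane_weighted_undersample(X_maj, w, b, n, seed=0):
    """Hypothetical sketch: pick n majority samples without replacement,
    favoring those close to the hyperplane w.x + b = 0."""
    rng = random.Random(seed)
    norm = math.sqrt(sum(wi * wi for wi in w))
    keys = []
    for i, x in enumerate(X_maj):
        dist = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        weight = 1.0 / (1.0 + dist)  # assumed weighting: closer means heavier
        # Efraimidis-Spirakis: the n largest keys u ** (1 / weight) form a
        # weighted random sample without replacement.
        keys.append((rng.random() ** (1.0 / weight), i))
    keys.sort(reverse=True)
    return [X_maj[i] for _, i in keys[:n]]
```

Distant samples can still be drawn, only less often. This keeps some spread in the retained majority class instead of trimming it to a thin band around the boundary.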


“…Recently, the interest in using EAs to address machine learning problems has been growing rapidly [19]-[29]. For imbalanced learning, EAs have been used for data sampling [30], [31] and cost-sensitive learning [32]. Although recent studies address the problem of determining the optimal misclassification costs [32], [33], they have paid little attention to considering the hyper-parameters of the learning algorithm, along with exploiting the hierarchical nature of parameter and hyper-parameter learning to guide search.…”

Section: Introduction (mentioning)

confidence: 99%

Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel Optimization

Rosales-Pérez, García, Herrera 2022, Preprint


Support vector machines are popular learning algorithms for binary classification problems. They traditionally assume equal misclassification costs for each class; however, real-world problems may have an uneven class distribution. This paper introduces EBCS-SVM: Evolutionary Bilevel Cost-sensitive Support Vector Machines. EBCS-SVM handles imbalanced classification problems by simultaneously learning the support vectors and optimizing the SVM hyper-parameters, which comprise the kernel parameter and the misclassification costs. The resulting optimization problem is a bilevel problem, in which the lower level determines the support vectors and the upper level the hyper-parameters. It is solved using an evolutionary algorithm at the upper level and Sequential Minimal Optimization at the lower level. The two methods work in a nested fashion: the optimal support vectors help guide the search for the hyper-parameters, and the lower level is initialized from previous successful solutions. The proposed method is assessed on 70 imbalanced classification datasets and compared with several state-of-the-art methods. The experimental results, supported by a Bayesian test, provide evidence of the effectiveness of EBCS-SVM on highly imbalanced datasets.


