A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data

Xu, Zhaozhao; Shen, Derong; Nie, Tiezheng; Kou, Yue; Yin, Na; Han, Xiaoxu

doi:10.1016/j.ins.2021.02.056

Cited by 113 publications

(31 citation statements)

References 33 publications

“…The SMOTEC algorithm [11] first uses the modified SMOTE (synthetic minority oversampling technique) method to oversample a small number of class instances in the training dataset to increase the number of minority class samples. Then it uses the SVM feature to design a clustering algorithm to clean the data set after oversampling.…”

Section: Proposed Methods (mentioning)

confidence: 99%

Analysis of Risk Factors of Neurobiological Pipeline Care and Investigation of Preventive Measures

Sun

Wang

2021

Journal of Healthcare Engineering

During clinical care, most neurosurgical patients are critically ill. They have sudden onset of illness that should be treated on time with proper care. The patients require continuous hospitalization for proper treatment. The recovery of patients may be relatively slow and takes some time. Patients and Methods. To explore where the risks of pipeline care lie and the preventive measures. (1) In this paper, 100 neurosurgical patients were treated in our hospital from September 2018 to March 2020. They were firstly selected and divided into two groups. Group A was implemented with routine pipeline care and group B was implemented with the intervention developed by the pipeline team. (2) The design and SMOTE assume that, during the generation of a new synthetic sample of minority classes, the immediate neighbors of the minority class instances were also all minority classes, regardless of their true distribution characteristics, to analyze risk factors during care and summarize preventive measures. Results. The experimental results showed that the total efficiency of nursing care was higher in group B as compared to group A, P < 0.05; also, the number of pipeline accidents was lower in group B. Conclusion. It is important to be meticulous and thoughtful in pipeline care and to comprehensively analyze the possible risk events and then propose preventive measures so that risk events can be reduced.
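
The SMOTE step that the citation statement and abstract above refer to can be sketched as follows. This is a minimal illustration of plain SMOTE interpolation only, assuming Euclidean distance and a fixed random seed; it is not the cluster-based algorithm of Xu et al. nor the SMOTEC variant, and the function name `smote_sketch`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Neighbors are searched among minority samples only. Majority
        # samples are never consulted, which is the assumption the
        # abstract above attributes to SMOTE.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class (hypothetical values, two features per sample).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=5)
```

Each synthetic point lies on a segment between two existing minority samples, so it stays inside the region the minority class already occupies. A cluster-based variant would add a further step, such as grouping samples with k-means before or after this interpolation; that step is not shown here.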

“…The weights were adaptively determined according to the number of majority data around minority ones. Xu et al 32 introduced K‐means clustering to SMOTE, with the purpose of distinguishing noise, overlapping, and boundary samples. Krawczyk et al 33 proposed multiclass radial‐based oversampling to learn the data distribution of minority samples by radial basis function (RBF).…”

Section: Related Work (mentioning)

confidence: 99%

A transfer weighted extreme learning machine for imbalanced classification

Guo

Jiao

Tan

et al. 2022

Int J of Intelligent Sys

Previous class imbalance learning methods are mostly grounded on the assumption that all training data have been labeled, however, is impractical in many real-world applications. The limited amount of labeled instances may produce a classifier with poor generalization. To address the issue, a transfer weighted extreme learning machine (TWELM) classifier is proposed, with the purpose of extracting knowledge from other domains to improve the classification performance of a classifier in a limited labeled target domain. To be specific, a well-tuned weighted extreme learning machine classifier is first learned from source data that has been completely labeled. Subsequently, another extreme learning machine classifier is obtained from the limited labeled target domain data to preserve the target domain structural knowledge and the decision boundary information. Finally, the target classifier

“…Traditional AI classifiers are vulnerable in learning highly skewed data as they are designed to expect different classes to contribute equally to minimization of the classifiers' loss functions [34]. One of the well-known oversampling techniques called Synthetic Minority Over-sampling Technique (SMOTE) was proposed in [35] which formed the basis of many other oversampling techniques for classification such as hybrid K-means SMOTE [36] and SMOTE combined with self-organizing maps [34]. The main reason which hinders the application of oversampling for regression problems, is the determination of the target output of the oversampled datapoints.…”

Section: Refinement of the ANN Architecture (mentioning)

confidence: 99%

Harmonic Current Estimation of Unmonitored Harmonic Sources With a Novel Oversampling Technique for Limited Datasets

et al. 2022

In modern power systems, harmonics are amongst the significant issues attributed to renewable energy sources and nonlinear loads. Direct harmonic monitoring of the entire power system may be too costly or impractical, and measured data could be limited. In this paper, a new methodology is proposed to estimate harmonic current rms values of unmonitored harmonic sources, based on harmonic voltage rms magnitudes only, measured at a limited number of monitored buses. A new technique of output curve-normalization is employed in pre-processing. Subsequently, a method is proposed to refine the architecture of the Artificial Neural Networks (ANNs), after which ANN-based harmonic current estimators are developed for each harmonic order and each harmonic source. Furthermore, a novel Neural Oversampling Consensus Algorithm for Regression (NOCAR) is proposed to improve estimation accuracy. K-Nearest Neighbor (KNN) and ANN are combined in developing NOCAR. A comparison is made with state-of-the-art techniques by using synthetic data, which demonstrates both the proposed method's robustness and its capability to perform when minimal information is available. The implementation for real data demonstrates the efficiency of the ANN-based harmonic current estimators with oversampling. The influence of the number of harmonic meters is investigated, revealing the ability of this data-driven technique to reduce the number of harmonic meters, and hence monitoring costs. Moreover, the correlation between different harmonic orders is studied, with results suggesting that, unlike the widely accepted notion, this correlation should not be ignored in harmonic analysis. This study highlights the advantages of integrating intelligent techniques into harmonic monitoring systems. Index Terms: Artificial neural networks, harmonic current estimation, harmonic voltage monitors, oversampling.
