A Framework of Rebalancing Imbalanced Healthcare Data for Rare Events’ Classification: A Case of Look-Alike Sound-Alike Mix-Up Incident Detection

Zhao, Yang; Wong, Zoie Shui-Yee; Tsui, Kwok Leung

doi:10.1155/2018/6275435

Cited by 46 publications (28 citation statements); references 34 publications (48 reference statements)


“…This assumption leads to a tendency to favor the majority class when applied to an imbalanced dataset, which can decrease the accuracy in classifying minority occurrences. 37 Because this is a frequently encountered scenario, prior groups have studied various rebalancing strategies. 37,38 Previous analysis has shown an improvement in prediction accuracy when applying rebalancing methods, as long as the dataset is sufficient in size.…”

Section: Discussion (mentioning)

confidence: 99%

Prediction of Drug-Induced Long QT Syndrome Using Machine Learning Applied to Harmonized Electronic Health Record Data

Simon, Mandair, Tiwari, et al., 2021

J Cardiovasc Pharmacol Ther


Background: Drug-induced QT prolongation is a potentially preventable cause of morbidity and mortality; however, no clinical tools for predicting which individuals are at greatest risk are in widespread use. Machine learning (ML) algorithms may provide a method for identifying these individuals and could be automated to alert providers directly in real time. Objective: This study applies ML techniques to electronic health record (EHR) data to identify an integrated risk-prediction model that can be deployed to predict the risk of drug-induced QT prolongation. Methods: We examined harmonized data from the UCHealth EHR and identified inpatients who had received a medication known to prolong the QT interval. We used a binary outcome: development of a QTc interval >500 ms within 24 hours of medication initiation, versus no ECG with a QTc interval >500 ms. We compared multiple machine learning methods by classification accuracy and performed calibration and rescaling of the final model. Results: We identified 35,639 inpatients who received a known QT-prolonging medication and had an ECG performed within 24 hours of administration. Of those, 4,558 patients developed a QTc > 500 ms and 31,081 patients did not. A deep neural network with random oversampling of controls provided superior classification accuracy (F1 score 0.404; AUC 0.71) for the development of a long QT interval compared with other methods. The optimal cutpoint for prediction was determined and was reasonably accurate (sensitivity 71%; specificity 73%). Conclusions: Deep neural networks applied to EHR data provide reasonable prediction of which individuals are most susceptible to drug-induced QT prolongation. Future studies are needed to validate this model in novel EHRs and within the physician order entry system, to assess its ability to improve patient safety.
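The abstract reports F1, sensitivity and specificity, but it does not say how the cutpoint was chosen. The sketch below is plain Python, and every function name in it is hypothetical. It derives these metrics from a confusion matrix and shows why raw accuracy is uninformative at this cohort's imbalance. It also implements one common cutpoint rule, maximizing Youden's J (sensitivity + specificity − 1). That rule is an assumption here, not necessarily the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def metrics(c: Confusion) -> dict:
    """Accuracy, sensitivity (recall), specificity, precision and F1 from raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    sens = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    spec = c.tn / (c.tn + c.fp) if c.tn + c.fp else 0.0
    prec = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": (c.tp + c.tn) / total, "sensitivity": sens,
            "specificity": spec, "precision": prec, "f1": f1}


def confusion_at(scores, labels, threshold) -> Confusion:
    """Call score >= threshold positive and tally the confusion matrix (labels are 0/1)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def youden_cutpoint(scores, labels) -> float:
    """Hypothetical cutpoint rule: the threshold that maximizes Youden's J."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        m = metrics(confusion_at(scores, labels, t))
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Cohort sizes from the abstract: a "model" that predicts no QTc > 500 ms for everyone.
always_negative = Confusion(tp=0, fp=0, tn=31_081, fn=4_558)
print(metrics(always_negative))  # accuracy ≈ 0.872, sensitivity 0.0, f1 0.0
```

At these cohort sizes, predicting "no long QT" for every patient already scores about 87% accuracy while detecting no cases. This is why minority-focused metrics such as F1 and sensitivity are the ones worth comparing.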



“…Many disease-detection studies prioritize recall as the evaluation measure [14]. In this study, recall is important because high recall means that the CNN model rarely fails to detect a person affected by pneumonia or tuberculosis.…”

Section: Model Evaluation of Undersampling and Oversampling Dataset (mentioning)

confidence: 99%

Classification of Tuberculosis and Pneumonia in Human Lung Based on Chest X-Ray Image using Convolutional Neural Network

Liebenlito, Irene, Hamid, 2020

InPrime: Indonesian Journal of Pure and Applied Mathematics


In this paper, we use chest X-ray images of tuberculosis and pneumonia to diagnose patients with a convolutional neural network model. We use 4,273 images of pneumonia, 1,989 normal images, and 394 images of tuberculosis. The data are divided into a training set (80%) and a testing set (20%). All images are preprocessed by resizing, converting RGB to grayscale, and applying Gaussian normalization. On the training set, undersampling and oversampling are used to balance the classes. The best model was chosen based on the area under the receiver operating characteristic curve (AUC). The best model was obtained by training on the oversampled training set, with an AUC of 0.99 for tuberculosis and 0.98 for pneumonia. This model correctly identifies 86% of tuberculosis cases and 96% of pneumonia cases. Keywords: chest X-ray images; tuberculosis; pneumonia; convolutional neural network.
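The abstract describes an 80/20 split, with undersampling or oversampling then applied to the training set only. The sketch below shows that order of operations. It assumes plain random duplication (oversampling) and random removal (undersampling) to the largest or smallest class size, since the paper's exact procedure is not stated here. Image IDs stand in for the X-ray images, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_train_test(items, labels, test_frac=0.2, seed=0):
    """Shuffle, then split into train/test BEFORE any resampling, so test data stay untouched."""
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    n_test = int(round(len(idx) * test_frac))

    def pick(ids):
        return [items[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[n_test:]), pick(idx[:n_test])


def _by_class(items, labels):
    groups = defaultdict(list)
    for x, y in zip(items, labels):
        groups[y].append(x)
    return groups


def random_oversample(items, labels, seed=0):
    """Duplicate random samples of each smaller class up to the largest class size."""
    rng = random.Random(seed)
    groups = _by_class(items, labels)
    target = max(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, g in groups.items():
        out_x += g + [rng.choice(g) for _ in range(target - len(g))]
        out_y += [y] * target
    return out_x, out_y


def random_undersample(items, labels, seed=0):
    """Keep a random subset of each larger class, down to the smallest class size."""
    rng = random.Random(seed)
    groups = _by_class(items, labels)
    target = min(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, g in groups.items():
        out_x += rng.sample(g, target)
        out_y += [y] * target
    return out_x, out_y


# Class sizes quoted in the abstract; IDs are placeholders for images.
counts = {"pneumonia": 4273, "normal": 1989, "tuberculosis": 394}
ids = [f"{c}_{i}" for c, n in counts.items() for i in range(n)]
labels = [c for c, n in counts.items() for _ in range(n)]

(train_x, train_y), (test_x, test_y) = split_train_test(ids, labels)
over_x, over_y = random_oversample(train_x, train_y)
under_x, under_y = random_undersample(train_x, train_y)
print(Counter(train_y), Counter(over_y), Counter(under_y), Counter(test_y))
```

Splitting first matters: if duplicated images could land in both sets, the test AUC would be optimistically biased.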


“…In this case, classification accuracy (A) can be misleading when selecting the best-performing model. Techniques for selecting the best model for data with class imbalance include choosing performance metrics that focus on the minority class, oversampling the minority class using SMOTE to rebalance the classes, undersampling the majority class to rebalance the classes, and selecting classification algorithms that penalize misclassification errors differently [Zhao et al. (2018)]. Classification algorithms such as LR, SVM, MLP and K-NN are used to create the classification model.…”

Section: Fig 2 Class Distribution (mentioning)

confidence: 99%

Untitled

2020

INDJCSE


The objective of this paper is to build a CKD prediction model using machine learning techniques that can predict the risk of chronic kidney disease (CKD) in patients with cardiovascular disease (CVD) or at high risk of CVD. CVD is associated with worsening renal function, yet patients with CVD often remain underdiagnosed and undertreated for CKD, because in the earlier stages clinical diagnosis and treatment are mostly single-organ centered. Machine learning algorithms have been widely used to predict and classify diseases in healthcare, and healthcare data are often imbalanced. In this analysis, the CKD prediction model is built using CVD data with an imbalanced distribution of positive and negative cases. The analysis involves three stages. Stage I selects the best model based on performance metrics suited to imbalanced class distributions, without applying any resampling technique. Stage II oversamples the minority class in the training data using the Synthetic Minority Oversampling Technique (SMOTE). Stage III randomly undersamples the majority class in the training data to address the class imbalance. The experimental results show that the MLP (multilayer perceptron)-SMOTE model predicts CKD better than the other models, with a better F-score, recall, precision, G-mean, balanced accuracy and ROC-AUC.
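Stage II relies on SMOTE, which creates synthetic minority rows rather than duplicating existing ones. For each synthetic row, it picks a minority sample and one of its k nearest minority neighbours, then places the new point at a random position on the segment between them. The NumPy sketch below illustrates that idea on toy 2-D data, not the CVD dataset. The default k=5, the toy data and the function names are assumptions. In practice, imbalanced-learn's `SMOTE` class (`fit_resample`) is the usual implementation. The G-mean and balanced accuracy helpers use the standard binary definitions.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: interpolate between minority rows and their nearest minority neighbours."""
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)           # random minority anchors
    nn = neighbours[base, rng.integers(0, k, size=n_new)]    # one random neighbour each
    gap = rng.random((n_new, 1))                             # uniform in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (binary case)."""
    return float(np.sqrt(sensitivity * specificity))


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (binary case)."""
    return (sensitivity + specificity) / 2


rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(200, 2))  # toy stand-in for majority training rows
X_minor = rng.normal(3.0, 0.5, size=(20, 2))   # toy stand-in for minority training rows
X_synth = smote(X_minor, n_new=len(X_major) - len(X_minor))
X_train = np.vstack([X_major, X_minor, X_synth])
y_train = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_synth)))
print(X_train.shape, np.bincount(y_train))  # (400, 2) [200 200]
```

As in the abstract's Stage II, only the training rows are resampled. Evaluation data should keep their original imbalance so that F-score, G-mean and balanced accuracy reflect real-world performance.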

