2020
DOI: 10.3390/s20102809

Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods

Abstract: Globally, cervical cancer remains among the most prevalent cancers in females. Hence, it is necessary to identify the important risk factors of cervical cancer in order to classify potential patients. The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of cervical cancer using risk factors as inputs. The CCPM first removes outliers by using outlier detection methods such as density-based spatial clustering of applications with noise (DBSCAN) and isolation forest (iForest)…
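A minimal sketch of the outlier-removal step described in the abstract, assuming a numeric risk-factor matrix X; the synthetic data and hyperparameters (eps, min_samples, contamination) are placeholders, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Placeholder for the risk-factor features (n_samples x n_features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_std = StandardScaler().fit_transform(X)

# DBSCAN labels points that belong to no dense cluster as -1 (noise).
dbscan_inliers = DBSCAN(eps=1.5, min_samples=5).fit_predict(X_std) != -1

# Isolation forest returns -1 for anomalies and +1 for normal points.
iforest_inliers = IsolationForest(contamination=0.05,
                                  random_state=0).fit_predict(X_std) == 1

# Keep only samples that neither detector flags as an outlier.
X_clean = X[dbscan_inliers & iforest_inliers]
print(f"kept {len(X_clean)} of {len(X)} samples")
```

Either detector can also be used on its own; combining them here simply makes the cleaning step more conservative.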

Cited by 198 publications (136 citation statements)
References 70 publications

Citation statements
“…In Table 5, we compare the proposed approach with some recent scholarly works that used the cervical cancer dataset, including principal component analysis (PCA)-based SVM [33], a research work in which the dataset was preprocessed and classified using numerous algorithms, with LR and SVM giving the best accuracy [34], and a C5.0 decision tree [35]. The other methods include a multistage classification process that combined isolation forest (iForest), the synthetic minority over-sampling technique (SMOTE), and RF [36], a sparse autoencoder feature learning method combined with an ANN classifier [12], and a feature selection method combined with C5.0 and RF [37]. [35] C5.0: 96; Ijaz et al. [36] iForest+SMOTE+RF: 98.925; Mienye et al. [12] SAE+ANN: 98…”
Section: Methods, Accuracy (%)
Citation type: mentioning (confidence: 99%)
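Reading [36] (the paper summarized on this page) as the three-stage pipeline named in the quote, a minimal sketch with scikit-learn and imbalanced-learn might look as follows; the synthetic data, split, and hyperparameters are assumptions rather than the published configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE

# Imbalanced stand-in for the cervical cancer risk-factor dataset.
X, y = make_classification(n_samples=800, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stage 1: iForest outlier removal, applied to the training split only.
keep = IsolationForest(contamination=0.05, random_state=0).fit_predict(X_tr) == 1
X_tr, y_tr = X_tr[keep], y_tr[keep]

# Stage 2: SMOTE oversampling of the minority (positive) class.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Stage 3: random forest classification.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Applying outlier removal and SMOTE to the training split only keeps the held-out evaluation free of filtered or synthetic points.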
“…In [ 14 ], the authors provide a survey of unsupervised machine learning algorithms that have been proposed for outlier detection. In [ 15 ], the authors propose a cervical cancer prediction model (CCPM) for early prediction of cervical cancer using risk factors as inputs. The authors utilize several machine learning approaches and outlier detection for different preprocessing tasks…”
Section: Background and Related Work
Citation type: mentioning (confidence: 99%)
“…It deals with the noisy instances in the majority class via a noise filter based on the DBSCAN clustering algorithm [30], combined with a minimum spanning tree (MST) algorithm to reduce the size of the negative class. The reason for combining DBSCAN clustering with the MST approach is that DBSCAN has proven to be a powerful tool for identifying and removing noisy instances and for cleaning the overlap between classes [31], but it does not produce a well-balanced class distribution. By viewing the data set as a weighted complete graph, the MST algorithm allows the core of the majority class to be discovered, which is then used to remove as many redundant negative instances as needed to balance the two classes…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
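One plausible reading of the DBSCAN-plus-MST idea in the quote above, sketched with scipy and scikit-learn; the core-selection rule (keeping the majority points with the smallest total incident MST edge weight) is an assumption for illustration, not the cited paper's exact procedure:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree
from sklearn.cluster import DBSCAN

def undersample_majority(X_maj, n_keep, eps=1.5, min_samples=5):
    # Step 1: noise filter -- drop majority points DBSCAN labels as noise (-1).
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_maj)
    X_maj = X_maj[labels != -1]

    # Step 2: MST over the complete weighted graph of pairwise distances.
    dist = squareform(pdist(X_maj))
    mst = minimum_spanning_tree(dist).toarray()
    mst = mst + mst.T  # symmetrise so each node sees all its incident edges

    # Step 3: rank points by total incident MST edge weight; small sums sit in
    # the dense "core", large sums are peripheral/redundant and removed first.
    core_order = np.argsort(mst.sum(axis=1))
    return X_maj[core_order[:min(n_keep, len(X_maj))]]

# Example: shrink a 300-instance negative (majority) class to 60 instances.
rng = np.random.default_rng(0)
X_majority = rng.normal(size=(300, 4))
print(undersample_majority(X_majority, n_keep=60).shape)
```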