Comparative Analysis of KNN Algorithm using Various Normalization Techniques

Pandey, Amit Kumar; Jain, Achin

doi:10.5815/ijcnis.2017.11.04

Cited by 104 publications

(57 citation statements)

References 12 publications


“…The test results in this study inform that the z-score normalization method has a stable accuracy between 95% to 97%. The accuracy value of the z-score method found in this study is higher than the results of research conducted by Pandey and Jain (2017) [5] on the IRIS data set, and Nasution et al (2019) [6] regarding the wine data set.…”

Section: Results (contrasting)

confidence: 87%

Comparison of Min-Max normalization and Z-Score Normalization in the K-nearest neighbor (kNN) Algorithm to Test the Accuracy of Types of Breast Cancer

Henderi

2021

IJIIS : Int. J. Inform. Inform. Systems.

126


The purpose of this study was to examine breast cancer prediction results, with cases classified into two types, malignant and benign. The method used is the k-NN algorithm with min-max and Z-score normalization, implemented in the R language. With min-max normalization, the highest accuracy, 98%, is reached at k = 5 and k = 21. With Z-score normalization, the highest accuracy, 97%, is reached at k = 5 and k = 15. The min-max normalization method is therefore considered better than Z-score normalization in this study. The novelty of this research lies in comparing min-max normalization and Z-score normalization in the k-NN algorithm.
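The comparison described in this abstract can be illustrated with a minimal sketch. The study itself used R and a breast cancer data set; the Python code, the toy data, and every function name below are assumptions made for illustration only, and they do not reproduce the paper's results. The sketch shows why scaling matters for k-NN: without it, the feature with the larger numeric range dominates the Euclidean distance.

```python
import math
from collections import Counter

def fit_min_max(rows):
    """Return a function that rescales a row to [0, 1] per column,
    using (x - min) / (max - min) computed on the training rows."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant columns
    return lambda r: [(x - lo) / sp for x, lo, sp in zip(r, lows, spans)]

def fit_z_score(rows):
    """Return a function that standardizes a row per column,
    using (x - mean) / std computed on the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return lambda r: [(x - m) / s for x, m, s in zip(r, means, stds)]

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training rows closest to the query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: the first feature separates the classes,
# the second feature is noise on a much larger scale.
train_x = [[1.0, 1000.0], [1.2, 5000.0], [0.9, 9000.0],
           [3.0, 1200.0], [3.2, 5200.0], [2.9, 8800.0]]
train_y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]
query = [3.0, 1000.0]

raw_label = knn_predict(train_x, train_y, query, k=3)

scale = fit_min_max(train_x)
min_max_label = knn_predict([scale(r) for r in train_x], train_y,
                            scale(query), k=3)

scale = fit_z_score(train_x)
z_score_label = knn_predict([scale(r) for r in train_x], train_y,
                            scale(query), k=3)

print(raw_label, min_max_label, z_score_label)
```

On this toy data the unscaled vote follows the large-range noise feature, while both normalizations let the informative feature decide. Which of the two normalizations gives higher accuracy on real data depends on the data set and on k, as the study's k-by-k comparison suggests.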


“…The classification accuracy obtained by the researchers using 5-fold cross validation was 98.35%. Subsequently, in 2017 Amit Pandey [7]…”

Section: Introduction (unclassified)

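Peningkatan Hasil Klasifikasi pada Algoritma Random Forest untuk Deteksi Pasien Penderita Diabetes Menggunakan Metode Normalisasi (Improving Random Forest Classification Results for Detecting Diabetes Patients Using Normalization Methods)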

Suryanegara

Adiwijaya

Purbolaksono

2021

RESTI


Diabetes is a disease caused by blood sugar levels that are high or beyond normal limits. The number of diabetics in Indonesia has increased significantly: Basic Health Research states that the figure rose from 6.9% to 8.5% between 2013 and 2018, with an estimated number of sufferers of more than 16 million people. A technology that can detect diabetes with good performance and accurate analysis is therefore needed, so that diabetes can be treated early to reduce the number of sufferers, disabilities, and deaths. The different scale values of each attribute in Gula Karya Medika’s data can complicate the classification process. For this reason the researchers compare two data normalization methods, min-max normalization and z-score normalization, against a baseline without data normalization, with Random Forest (RF) as the classification method. RF has been tested as a classifier in several previous studies and is able to produce good performance with high accuracy. Based on the research results, the best accuracy is achieved by model 1 (min-max normalization with RF) at 95.45%, followed by model 2 (z-score normalization with RF) at 95%, and model 3 (RF without data normalization) at 92%. From these results, it can be concluded that model 1 is better than the other two models and raises the accuracy of Random Forest classification to 95.45%.


“…The dataset was split into train and test sets using a 10-Fold approach. Biomarker features were imputed on the test set using the mean value of the K most similar patients from the real biomarker data of the train set using the KNN algorithm [24]. The value of K is determined by the amount of available data.…”

Section: Datasets and Pre-process (mentioning)

confidence: 99%

Prognosis Patients with COVID-19 using Deep Learning

Alvare

Hussain

Flores

et al.

2021

Preprint


Background: Prognostics is the prediction of an event before it happens, enabling more efficient critical decision making. Prognostics are very useful for front-line physicians, who can predict how a disease may affect a patient and react accordingly to save the patient's life. The coronavirus (COVID-19) is novel, and not enough is known about the virus's behaviour and the key performance indicators (KPIs) needed to assess mortality risk. Moreover, using many complex and expensive medical biomarkers could be impossible for many low-budget hospitals. This motivates the development of a prediction model that not only maximizes performance but does so using the fewest biomarkers possible. Methods: For mortality risk prediction, this work proposes a COVID-19 mortality risk calculator based on a Deep Learning (DL) model and on a data set provided by the HM Hospitals from Madrid, Spain. A pre-processing strategy for unbalanced classes and feature selection is proposed. Results: The DL model is tested, and the results achieved include area under the curve (AUC) 0.93, F2 score 0.93, recall 1.00, accuracy 0.95, precision 0.91, specificity 0.9279, and maximum probability of correct decision (MPCD) 0.93. Conclusion: The MPCD score shows that the proposed DL model outperforms on the everyday set, even when evaluated with an over-sampling technique. The benefits of imputing unavailable biomarker data are also evaluated. The results are compared against a random forest (RF) algorithm and the newly proposed methods. The results show that the proposed method is significantly better for the risk prediction of patients with COVID-19.
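The citation statement for this paper describes imputing missing biomarker features with the mean value of the K most similar patients in the training set. A minimal sketch of that idea follows. The function name `knn_impute`, the toy rows, and the choice of Euclidean distance over the observed features are assumptions for illustration; the paper's actual procedure and its rule for choosing K are not given here.

```python
import math

def knn_impute(train_rows, row, k):
    """Fill None entries in `row` with the mean of that feature over the
    k training rows closest to `row`, where closeness is Euclidean distance
    on the features that `row` does have. Assumes complete training rows."""
    observed = [i for i, v in enumerate(row) if v is not None]

    def distance(other):
        return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

    neighbours = sorted(train_rows, key=distance)[:k]
    return [
        v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
        for i, v in enumerate(row)
    ]

# Hypothetical training patients with three biomarker values each.
train = [[1.0, 2.0, 10.0], [1.1, 2.1, 12.0],
         [5.0, 6.0, 50.0], [5.2, 6.1, 54.0]]

# A test patient whose third biomarker is unavailable.
imputed = knn_impute(train, [1.05, 2.05, None], k=2)
print(imputed)
```

Drawing neighbours only from the training set, as the citation statement describes, keeps information from the test fold out of the imputed values.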
