Abstract: Classification is the technique of identifying individual items and assigning them to a group or a set. In pattern recognition, the k-nearest neighbors (kNN) algorithm is a non-parametric method for classification and regression. The kNN technique has been widely used in data mining and machine learning because it is simple yet very useful, with strong performance. In classification, a model is trained on sample data and then used to predict the labels of test data points. Over the past few decades, r…
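A minimal sketch of the kNN classification idea described in this abstract, assuming Euclidean distance and a plain majority vote among the k nearest training points. The function names, the toy points and the "benign"/"malignant" labels are hypothetical illustrations, not taken from any of the cited papers.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tied vote, most_common keeps first-encountered order, so the closer neighbour's label wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
train_X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
train_y = ["benign", "benign", "malignant", "malignant"]
print(knn_predict(train_X, train_y, [1.1, 0.9], k=3))  # benign
```

No model is fitted in advance: all the work happens at prediction time, which is why kNN is called a lazy, non-parametric method.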
“…The test results in this study indicate that the z-score normalization method has a stable accuracy of between 95% and 97%. The accuracy of the z-score method found in this study is higher than the results reported by Pandey and Jain (2017) [5] on the IRIS data set and by Nasution et al. (2019) [6] on the wine data set.…”
The purpose of this study was to examine the results of predicting breast cancer, classified into two types: malignant and benign. The method used in this research is the k-NN algorithm with min-max and Z-score normalization, implemented in the R programming language. With min-max normalization, the highest accuracy, 98%, is obtained at k = 5 and k = 21. With Z-score normalization, the highest accuracy, 97%, is obtained at k = 5 and k = 15. The min-max normalization method is therefore considered better than Z-score normalization in this study. The novelty of this research lies in the comparison between min-max normalization and Z-score normalization in the k-NN algorithm.
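The two normalizations compared in this study can be sketched as follows: min-max rescales each feature to [0, 1] via (x - min) / (max - min), and Z-score centres it via (x - mean) / standard deviation. This Python sketch is not the authors' R code. It assumes the scaling parameters are fitted on the training rows only and then reused for the test rows (the abstract does not say how this was handled), and it uses the population standard deviation (`statistics.stdev` would give the sample version). The helper names are hypothetical.

```python
import statistics


def fit_min_max(column):
    """Return a function that rescales values to [0, 1] using the column's min and max."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # Guard against a constant column, where max - min would be zero.
    return lambda v: (v - lo) / span if span else 0.0


def fit_z_score(column):
    """Return a function that centres values on the column mean and divides by its std deviation."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)  # population standard deviation (an assumption)
    return lambda v: (v - mu) / sigma if sigma else 0.0


def fit_transform_columns(train_X, test_X, fitter):
    """Fit one scaler per feature on the training rows only, then apply it to train and test rows."""
    scalers = [fitter([row[j] for row in train_X]) for j in range(len(train_X[0]))]

    def transform(rows):
        return [[scale(v) for scale, v in zip(scalers, row)] for row in rows]

    return transform(train_X), transform(test_X)


# Hypothetical features on very different scales.
train_X = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
test_X = [[25.0, 300.0]]
mm_train, mm_test = fit_transform_columns(train_X, test_X, fit_min_max)
z_train, z_test = fit_transform_columns(train_X, test_X, fit_z_score)
print(mm_test)  # [[0.75, 0.25]]
```

Either normalized matrix can then be passed to a kNN classifier, so that no single feature dominates the distance just because of its units.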
“…The classification accuracy obtained by the researchers using 5-fold cross-validation was 98.35%. Subsequently, in 2017, Amit Pandey [7]…”
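As a rough illustration of what a 5-fold cross-validation accuracy means: the data are split into five folds, each fold serves once as the test set while the other four are used for training, and the five accuracies are averaged. The sketch below assumes a shuffled split and uses a 1-nearest-neighbour rule purely as a placeholder classifier; the snippet does not say which classifier or split procedure produced the 98.35% figure, and all names here are hypothetical.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nn_predict(train_X, train_y, query):
    """1-nearest-neighbour prediction, used here only as a placeholder classifier."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_y[best]


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds: each fold is the test set once, the rest is training data."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(nn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated clusters.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.1, 5.0], [5.0, 5.1]]
y = [0] * 5 + [1] * 5
print(cross_val_accuracy(X, y, k=5))  # 1.0
```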
Diabetes is a disease caused by blood sugar levels in the body that are high, beyond normal limits. The number of diabetics in Indonesia has increased significantly: Basic Health Research reports that diabetes prevalence in Indonesia rose from 6.9% in 2013 to 8.5% in 2018, with an estimated number of sufferers exceeding 16 million people. A technology that can detect diabetes with good performance and accurate analysis is therefore needed, so that diabetes can be treated early to reduce the number of sufferers, disabilities, and deaths. Because the attributes in Gula Karya Medika's data have different value scales, which can complicate the classification process, the researcher compares two data normalization methods, min-max normalization and z-score normalization, with a baseline without data normalization, using Random Forest (RF) as the classification method. RF has been tested as a classification method in several previous studies and is able to produce good performance with high accuracy. Based on the research results, the best accuracy is achieved by model 1 (min-max normalization + RF) at 95.45%, followed by model 2 (z-score normalization + RF) at 95% and model 3 (no data normalization + RF) at 92%. It can be concluded that model 1 (min-max normalization + RF) performs better than the other two models, raising the Random Forest classification accuracy to 95.45%.
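The three-model comparison described in this abstract can be sketched with scikit-learn. This is a hedged illustration, not the study's code: the Gula Karya Medika data are not available here, so synthetic data from `make_classification` stand in for them, and the fold count, forest settings and random seeds are all assumptions. The accuracies it prints depend entirely on the synthetic data and say nothing about the study's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data (an assumption): the study's dataset is not reproduced here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
# Put features on very different scales, mimicking attributes measured in different units.
X = X * np.array([1, 10, 100, 1000, 1, 10, 100, 1000])

models = {
    "Model 1 (min-max + RF)": MinMaxScaler(),
    "Model 2 (z-score + RF)": StandardScaler(),
    "Model 3 (no normalization + RF)": "passthrough",  # skip the scaling step
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, scaler in models.items():
    # The scaler sits inside the pipeline, so it is re-fitted on each training fold only.
    pipe = Pipeline([("scale", scaler), ("rf", RandomForestClassifier(random_state=0))])
    results[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    print(f"{name}: {results[name]:.4f}")
```

Wrapping each scaler in a `Pipeline` is a design choice that keeps the test fold out of the scaler's fit; scaling the whole dataset before splitting would leak test-set statistics into training.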
“…The dataset was split into train and test sets using a 10-Fold approach. Biomarker features were imputed on the test set using the mean value of the K most similar patients from the real biomarker data of the train set using the KNN algorithm [24]. The value of K is determined by the amount of available data.…”
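The imputation step quoted here (filling a test patient's missing biomarker with the mean value of the K most similar training patients) might look roughly like the sketch below. It assumes a single missing column per patient, Euclidean distance over the observed columns, and a fixed K; the cited paper states only that K depends on the amount of available data, so the function name, the toy rows and K = 3 are hypothetical.

```python
import math


def knn_impute(train_rows, query, missing_col, k=3):
    """Fill query[missing_col] with the mean of that column over the k most similar training rows.

    Similarity is Euclidean distance over the remaining (observed) columns only.
    """
    observed = [j for j in range(len(query)) if j != missing_col]

    def dist(row):
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in observed))

    neighbours = sorted(train_rows, key=dist)[:k]
    filled = list(query)
    filled[missing_col] = sum(r[missing_col] for r in neighbours) / len(neighbours)
    return filled


# Hypothetical training patients: two observed features plus one biomarker (last column).
train = [[1.0, 2.0, 10.0], [1.1, 2.1, 12.0], [0.9, 1.9, 14.0], [8.0, 9.0, 100.0]]
patient = [1.0, 2.0, None]  # biomarker not measured for this test patient
print(knn_impute(train, patient, missing_col=2, k=3))  # [1.0, 2.0, 12.0]
```

Only training rows are used as neighbours, which matches the quoted setup of imputing the test set from the real biomarker data of the train set.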
Background: Prognostics is the study of predicting an event before it happens, so that critical decisions can be made more efficiently. Prognostics are very useful for front-line physicians, who can use them to predict how a disease may affect a patient and react accordingly to save the patient's life. COVID-19 is a novel disease, and there is not yet enough knowledge about the virus's behaviour or about key performance indicators (KPIs) for assessing mortality risk. In addition, relying on many complex and expensive medical biomarkers could be impossible for many low-budget hospitals. This motivates the development of a prediction model that not only maximizes performance but does so using as few biomarkers as possible. Methods: For mortality risk prediction, this work proposes a COVID-19 mortality risk calculator based on a Deep Learning (DL) model and on a data set provided by HM Hospitals in Madrid, Spain. A pre-processing strategy for unbalanced classes and feature selection is proposed. Results: The DL model achieves an area under the curve (AUC) of 0.93, F2 score 0.93, recall 1.00, accuracy 0.95, precision 0.91, specificity 0.9279, and maximum probability of correct decision (MPCD) 0.93. Conclusion: The MPCD score shows that the proposed DL model outperforms on the everyday set, even when evaluated with an over-sampling technique. The benefits of imputing unavailable biomarker data are also evaluated. The results are compared against a random forest (RF) algorithm and the newly proposed methods, and show that the proposed method performs significantly better for predicting the mortality risk of patients with COVID-19.
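Most of the threshold-based metrics reported in this abstract can be computed from confusion-matrix counts. The sketch below assumes the positive class is the mortality outcome, and uses the standard F-beta formula (1 + β²)·P·R / (β²·P + R) with β = 2 for the F2 score. AUC and MPCD are not computed here, since they need predicted scores or a definition not given in the abstract. The function name and the counts in the usage line are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn, beta=2.0):
    """Compute threshold-based metrics from confusion-matrix counts (positive = mortality, assumed)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    b2 = beta ** 2
    # F-beta treats recall as beta times as important as precision; beta = 2 gives F2.
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f2": f_beta,
    }


# Hypothetical counts, not taken from the study.
m = classification_metrics(tp=9, fp=1, tn=8, fn=2)
print(round(m["f2"], 4))  # 0.8333
```

Weighting recall more heavily (β = 2) suits mortality prediction, where missing a high-risk patient is usually costlier than a false alarm.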