Data mining comprises methods that can classify data into different classes based on features in the data. With data mining, non-performing loan categories can be classified using data on lending from cooperatives to their members. This study uses K-Nearest Neighbor to classify non-performing loan categories with several distance metrics: Chebyshev, Euclidean, Mahalanobis, and Manhattan. Evaluation with 10-fold cross-validation shows that the Euclidean distance achieves the highest accuracy, precision, F1, and sensitivity compared with the other distance metrics. The Chebyshev distance has the lowest accuracy, precision, and sensitivity, while the Mahalanobis distance has the lowest F1 value. The Euclidean and Manhattan distances have the highest reliability values for true-positive and true-negative classifications. The Mahalanobis distance has the lowest reliability value for false-positive classification, while the Chebyshev distance has the lowest value for false-negative classification.
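The distance metrics compared above can be sketched with a minimal pure-Python K-nearest-neighbor classifier. This is an illustrative sketch, not the study's pipeline: the loan features and data points are hypothetical, and the Mahalanobis distance is omitted because it additionally requires the inverse covariance matrix of the training data.

```python
import math
from collections import Counter

# Three of the four distance metrics compared in the study.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, dist=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical, normalized toy features: (loan amount, months in arrears).
X = [(0.1, 0.0), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
y = ["performing", "performing", "non-performing", "non-performing"]

print(knn_predict(X, y, (0.15, 0.05), k=3))                    # "performing"
print(knn_predict(X, y, (0.85, 0.90), k=3, dist=manhattan))    # "non-performing"
```

Swapping the `dist` argument is all that is needed to rerun the same classifier under a different metric, which is essentially the comparison the abstract describes.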
Non-performing loan (NPL) is a risk that credit unions must face; to avoid it, prospective debtors need to be surveyed. Using previous loan data, support vector machine (SVM) and naïve Bayes can be used as classification methods to decide whether a loan to a member of the Mutiara Sejahtera credit union will become non-performing. We use a data set of 61 records, processed with the Orange 3.30 application, to compare SVM with linear (SVM-L), polynomial (SVM-P), RBF (SVM-R), and sigmoid (SVM-S) kernels against naïve Bayes. We use cross-validation with varying numbers of folds to measure classification performance and a confusion matrix to measure performance on the training data. Naïve Bayes scores highest in accuracy, and SVM-R scores highest in F1, precision, and recall. SVM-P scores lowest in accuracy, F1, precision, and recall. Naïve Bayes scores highest in proportion of predicted for the true-negative class and proportion of actual for the true-positive class. SVM-S scores highest in proportion of predicted for the true-positive class and proportion of actual for the true-negative class. SVM-P scores lowest in both proportion of predicted and proportion of actual.
Keywords: classification; naïve Bayes; non-performing loan; support vector machine
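The confusion-matrix measures named above can be sketched as follows. In Orange's confusion matrix, the diagonal of the "proportion of predicted" view corresponds to precision and the diagonal of the "proportion of actual" view to recall; the counts below are made up for illustration and are not the study's results.

```python
# Accuracy, precision, recall and F1 from 2x2 confusion-matrix counts.
def binary_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # diagonal of "proportion of predicted" for the positive class
    recall = tp / (tp + fn)      # diagonal of "proportion of actual" for the positive class
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts from classifying 20 loans.
acc, prec, rec, f1 = binary_metrics(tp=8, fp=2, fn=1, tn=9)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Averaging these measures over the folds of a cross-validation run yields the kind of per-model scores the abstract compares.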
In data mining, clustering is an unsupervised learning technique often used to group data by similarity. Clustering, in particular the K-means algorithm, is a feasible tool for expanding a data set's labels by increasing the number of clusters to match the desired label categories. This research extends the credit-loan labels from two categories (non-performing and performing loans) to four risk levels (high risk, medium risk, low risk, and no risk). Combining three K-nearest neighbor distance metrics (Euclidean, Manhattan, and Chebyshev) with four K values (K = 3, 5, 7, and 9), the best model used the Euclidean distance with K = 9, achieving accuracy, precision, and recall of 90%, 90.53571%, and 90%.