2020
DOI: 10.1007/978-3-030-47436-2_6

A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data

Abstract: In the k Nearest Neighbor (kNN) classifier, a query instance is classified based on the most frequent class among its nearest neighbors in the training instances. On imbalanced datasets, kNN becomes biased towards the majority instances of the training space. To solve this problem, we propose a method called the Proximity weighted Evidential kNN classifier. In this method, each neighbor of a query instance is considered as a piece of evidence from which we calculate the probability of class label given feature values …
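Since the abstract is truncated, the sketch below is only an illustrative approximation of the idea described so far: each of the k nearest neighbors is treated as a piece of evidence whose strength decays with distance from the query, and the weighted evidence is pooled per class. The exponential weight, the parameter gamma, and the function name are assumptions for illustration, not the authors' exact evidential formulation.

```python
# Illustrative proximity-weighted kNN vote (not the paper's exact combination rule;
# the exponential proximity weight is an assumed choice).
import numpy as np

def proximity_weighted_knn_predict(X_train, y_train, x_query, k=5, gamma=1.0):
    # Euclidean distance from the query to every training instance
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nn_idx = np.argsort(dists)[:k]                # indices of the k nearest neighbors
    weights = np.exp(-gamma * dists[nn_idx])      # closer neighbors carry stronger evidence
    classes = np.unique(y_train)
    # Pool the proximity weights per class and normalize to a class support score
    support = np.array([weights[y_train[nn_idx] == c].sum() for c in classes])
    support = support / support.sum()
    return classes[np.argmax(support)], dict(zip(classes, support))
```

On an imbalanced dataset, such proximity weighting lets one very close minority-class neighbor outweigh several more distant majority-class neighbors, which is the intuition the abstract points to.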

Cited by 8 publications (8 citation statements)
References 17 publications (22 reference statements)
“…Selection of the informative features in the MGS step was implemented using MATLAB, and ranking of these informative features in the MGS_f and MGS_rf steps was implemented in Python with the scikit-learn package [ 41 ]. To evaluate the performance of the proposed and existing methods, different classifiers such as SVM, RF, XGBoost [ 42 ], and PE kNN [ 43 ] can be used. In this paper, we use only two simple classifiers, namely SVM (linear kernel) and Random Forest, to compare the different methods.…”
Section: Methodsmentioning
confidence: 99%
“…Many references confirm that the performance of kNN is affected by data imbalance. 21,24,33 Conceptually, the kNN algorithm calculates the (Euclidean) distances between a validation sample (i.e., the one to be labeled) and the training-set observations and assigns to it the majority label among its k nearest neighbors. Therefore, if the observations of each class (in the feature space of the dataset) lie close to one another and a small parameter K is selected, the performance of this algorithm will not be affected by data imbalance.…”
Section: Knn Parameter Selectionmentioning
confidence: 99%
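To make the quoted description concrete, here is a minimal, standard kNN majority-vote sketch (plain Euclidean distance, unweighted vote); the names are illustrative and it is not tied to any specific method cited above.

```python
# Minimal standard kNN: Euclidean distances, unweighted majority vote.
# On imbalanced data this vote tends to favor the majority class.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    dists = np.linalg.norm(X_train - x_query, axis=1)    # distance to every training point
    nn_idx = np.argsort(dists)[:k]                        # indices of the k closest points
    return Counter(y_train[nn_idx]).most_common(1)[0][0]  # most frequent neighbor label
```

With a small k and well-separated classes the vote is driven by genuinely local neighbors, which is why the quoted passage notes that imbalance matters less in that regime.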
“…Although kNN is a simple and accurate algorithm, it has some weaknesses, such as being biased toward majority observations when facing data imbalance. 21 Several lines of research have sought to improve kNN performance under data imbalance by using oversampling, 22 a boosting-by-resample strategy, 23 misclassification costs, 24 and so forth. The algorithm has been used for fault detection in a wide range of domains, such as power systems, 25,26 railway point systems, 27 nuclear power plants, 28 and especially WTs; 20,[29][30][31] however, the focus of previous FDI works is not mainly on the data imbalance challenge.…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation
“…Although the kNN algorithm is a versatile technique for classification tasks, it has some drawbacks, such as the lack of a reliable way of choosing the k parameter, sensitivity to the similarity (distance) function used (Kotsiantis, Zaharakis & Pintelas, 2006), and the large amount of storage required for large datasets (Harrington, 2012). As kNN takes the most frequent class among the nearest neighbors, it is intuitive to conclude that for imbalanced datasets the method will bias the results towards the majority class in the training dataset (Kadir et al, 2020).…”
Section: K-nearest Neighborsmentioning
confidence: 99%