2018
DOI: 10.1109/tnnls.2017.2755595
A Distance-Based Weighted Undersampling Scheme for Support Vector Machines and its Application to Imbalanced Classification

Abstract: The support vector machine (SVM) plays a prominent role in classical machine learning, especially in classification and regression. Through structural risk minimization, it has earned a good reputation for effectively reducing overfitting, avoiding the curse of dimensionality, and not becoming trapped in local minima. Nevertheless, existing SVMs do not perform well on class-imbalanced or large-scale data. Undersampling is a plausible way to mitigate imbalanced problems to some extent, but suffers from soaring com…

Cited by 135 publications (37 citation statements); references 32 publications.
“…The G-mean (denoted as GM) evaluates class-wise sensitivity and indicates the balanced classification performances on the majority and minority classes. The micro average scheme MAUC [15] is defined as the area under the curve metric. As for the task of object detection, we utilize the Average Precision (AP) (IoU=[.50:.05:.95]), AP.50 (IoU=.50), and AP.75 (IoU=.75) as performance evaluation metrics.…”
Section: Evaluation Metrics (citation type: mentioning; confidence: 99%)
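As a concrete illustration of the G-mean described in this quote, the sketch below computes the geometric mean of per-class recalls. The `g_mean` helper and its signature are illustrative assumptions, not code from the cited work:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (class-wise sensitivity).

    Illustrative sketch; assumes integer class labels in NumPy arrays
    and that every class in y_true has at least one sample.
    """
    recalls = []
    for c in np.unique(y_true):
        mask = y_true == c
        # recall for class c: fraction of its samples predicted correctly
        recalls.append(np.mean(y_pred[mask] == c))
    return float(np.prod(recalls) ** (1.0 / len(recalls)))
```

Because it multiplies per-class recalls, a classifier that ignores the minority class scores zero regardless of majority-class accuracy, which is why the quote calls it a balanced measure.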
“…1) Sampling-Based Methods: Sampling-based methods attempt to handle the class imbalance problem at the data level, i.e., improving the data preprocessing technique. Specifically, these methods aim to balance the distribution of the original training set by over-sampling the minority classes [40]- [43], under-sampling the majority classes [7], [44], [45], or both.…”
Section: A. Class Imbalance (citation type: mentioning; confidence: 99%)
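The under-sampling of majority classes mentioned in this quote can be sketched, in its plainest form, as random undersampling; note this is only an illustrative baseline, not the distance-based weighted scheme proposed by the paper under review:

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Balance classes by randomly discarding majority-class samples
    until every class matches the minority-class count.

    Plain random undersampling for illustration; NOT the paper's
    distance-based weighted undersampling. X and y are parallel lists.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    kept = sorted(i for idx in by_class.values()
                  for i in rng.sample(idx, n_min))
    return [X[i] for i in kept], [y[i] for i in kept]
```

Random undersampling balances the training set cheaply but may discard informative majority samples, which is precisely the weakness that weighted schemes such as the one in this paper try to address.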
“…As for the area under the curve (AUC) metric in the classification problem, we follow the micro average scheme MAUC of the definition as in [7]. Similar to the form of F-measure and G-mean, it integrates the weighted average of all labels:…”
Section: B. Experimental Settings, 1) Training/Testing Set Partition (citation type: mentioning; confidence: 99%)
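One plausible reading of the "weighted average of all labels" description above is a prevalence-weighted average of one-vs-rest AUCs; the exact MAUC definition in [7]/[15] may differ, so treat this as an assumption-laden sketch:

```python
def roc_auc(y_bin, scores):
    """ROC AUC via the Mann-Whitney U statistic; y_bin holds 0/1 labels.

    Assumes both classes are present (non-empty pos and neg sets).
    """
    pos = [s for s, t in zip(scores, y_bin) if t == 1]
    neg = [s for s, t in zip(scores, y_bin) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def m_auc(y_true, score_rows, n_classes):
    """Prevalence-weighted average of one-vs-rest AUCs over all labels.

    score_rows[i][c] is the classifier's score for sample i, class c.
    This is a hypothetical reconstruction, not the cited definition.
    """
    total = len(y_true)
    out = 0.0
    for c in range(n_classes):
        y_bin = [1 if t == c else 0 for t in y_true]
        out += (sum(y_bin) / total) * roc_auc(y_bin, [r[c] for r in score_rows])
    return out
```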
“…The algorithms based on frequency can be realized at a lower computation cost by reducing the dimension of the frequency vectors [56]. SVM is sensitive to noises and outliers [57][58][59][60]. FSVM is based on fuzzy theory to reduce the influence of noises or outliers on the classification hyperplane [61][62][63][64][65][66].…”
Section: Previous Work (citation type: mentioning; confidence: 99%)