Jingjun Bi scite author profile

In predictive tasks, real-world datasets often exhibit varying degrees of class imbalance (i.e., long-tailed or skewed distributions). While the majority classes (the head, or most frequent, classes) have sufficient samples, the minority classes (the tail, or less frequent and rare, classes) may be represented by only a limited number of samples. Data pre-processing has been shown to be very effective in dealing with such problems. On the one hand, data re-sampling is a common approach to tackling class imbalance. On the other hand, dimension reduction, which reduces the feature space, is a conventional technique for reducing noise and inconsistencies in a dataset. However, the possible synergy between feature selection and data re-sampling for high-performance imbalance classification has rarely been investigated. To address this gap, we carry out a comprehensive empirical study of the joint influence of feature selection and re-sampling on two-class imbalance classification. Specifically, we compare the performance of two opposite pipelines for imbalance classification: applying feature selection before data re-sampling, and applying it after. We conduct a large number of experiments, 9225 tests in total, on 52 publicly available datasets, using 9 feature selection methods, 6 re-sampling approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results show that neither pipeline is a consistent winner; thus both should be considered when deriving the best-performing model for imbalance classification. We find that the performance of an imbalance classification model depends not only on the classifier adopted and the ratio between the number of majority and minority samples, but also on the ratio between the number of samples and features. Overall, this study should serve as a new reference for researchers and practitioners in imbalance learning.
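The two pipelines compared in this study differ only in the order of two pre-processing steps. The following is a minimal Python sketch under stated assumptions: random oversampling stands in for the re-sampling step and a two-class Fisher-score filter stands in for feature selection. These are illustrative substitutes, not the specific 9 feature selection methods and 6 re-sampling approaches evaluated in the study, and every function name here is hypothetical. The classifier stage is omitted. The sketch also computes the two ratios the abstract links to performance: the imbalance ratio and the samples-to-features ratio.

```python
import random
from statistics import mean, pvariance

# Hypothetical helpers: random oversampling and the Fisher score are
# illustrative stand-ins, not the methods used in the study.


def imbalance_ratio(y):
    """Majority-class count divided by minority-class count (two-class case)."""
    counts = [y.count(c) for c in set(y)]
    return max(counts) / min(counts)


def samples_per_feature(X):
    """Ratio between the number of samples and the number of features."""
    return len(X) / len(X[0])


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    counts = {c: y.count(c) for c in set(y)}
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    extra = [rng.choice(minority_rows) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit


def fisher_scores(X, y):
    """Two-class Fisher score per feature: squared mean gap over summed class variances."""
    a, b = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col_a = [x[j] for x, label in zip(X, y) if label == a]
        col_b = [x[j] for x, label in zip(X, y) if label == b]
        spread = pvariance(col_a) + pvariance(col_b)
        gap = (mean(col_a) - mean(col_b)) ** 2
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero within-class variance: infinite score if the class means differ, else 0.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features, returned in original column order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns."""
    return [[x[j] for j in keep] for x in X]


def fs_then_rs(X, y, k, seed=0):
    """Pipeline 1: select features on the original data, then re-sample."""
    keep = top_k(fisher_scores(X, y), k)
    Xr, yr = random_oversample(project(X, keep), y, seed)
    return Xr, yr, keep


def rs_then_fs(X, y, k, seed=0):
    """Pipeline 2: re-sample first, then select features on the balanced data."""
    Xr, yr = random_oversample(X, y, seed)
    keep = top_k(fisher_scores(Xr, yr), k)
    return project(Xr, keep), yr, keep
```

Because the second pipeline scores features on the re-sampled data, the duplicated minority rows shift the minority-class means and variances, so the two orders can end up selecting different feature subsets. As is standard practice, both steps would be fit on training data only, so that the test fold is neither re-sampled nor used for selection.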


A parameter-free label propagation algorithm for person identification in stereo videos

Zhang, Liu, et al. (2016). Neurocomputing.


Motivated by the need to reduce expensive and laborious person identity annotation in stereo videos, a number of research efforts have recently been dedicated to label propagation. In this work, we propose two heuristic label propagation algorithms for annotating person identities in stereo videos, based on the observation that the actors in two consecutive facial images in a video are more likely to be identical. In light of this, after dividing adjacent video frames into several groups, our first algorithm (ZBLC4) automatically annotates the unlabeled images in each group, in a parameter-free manner, with the identity having the maximum summed similarity between the unlabeled and labeled images. Moreover, to cope with singleton groups, an additional classifier is introduced into the ZBLC4 algorithm to mitigate unreliable predictions that depend on neighbours. We conduct experiments on three publicly available benchmark stereo videos, demonstrating that our algorithms outperform state-of-the-art methods.
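The group-then-vote idea behind ZBLC4 can be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: it assumes cosine similarity between face feature vectors, treats all frames in a group as sharing one identity, sums similarities against every labeled image of each identity, and uses a 1-nearest-neighbour rule as a hypothetical stand-in for the unspecified additional classifier used on singleton groups. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_neighbour(x, labeled):
    """Hypothetical fallback classifier: identity of the most similar labeled image."""
    return max(labeled, key=lambda item: cosine(x, item[0]))[1]


def propagate_labels(groups, labeled, fallback=nearest_neighbour):
    """Assign one identity per group of adjacent unlabeled frames.

    groups:  list of groups; each group is a list of feature vectors for
             consecutive unlabeled frames (assumed to show the same person).
    labeled: non-empty list of (feature_vector, identity) pairs.
    Returns one predicted identity per group.
    """
    predictions = []
    for group in groups:
        if len(group) == 1:
            # Singleton group: there are no neighbouring frames to pool
            # evidence from, so defer to a separate classifier.
            predictions.append(fallback(group[0], labeled))
            continue
        totals = {}
        for u in group:
            for v, identity in labeled:
                totals[identity] = totals.get(identity, 0.0) + cosine(u, v)
        # No threshold or tuning parameter: the identity with the largest
        # summed similarity wins.
        predictions.append(max(totals, key=totals.get))
    return predictions
```

Summing over every labeled image, rather than averaging, follows the abstract's wording ("maximum summed similarity"); one consequence of this reading is that identities with more labeled examples accumulate larger totals.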


A unified deep semi-supervised graph learning scheme based on nodes re-weighting and manifold regularization

Dornaika, Zhang (2023). Neural Networks.


Correction to: An empirical study on the joint impact of feature selection and data resampling on imbalance classification

Zhang, Soda, et al. (2022). Appl Intell.



Jingjun Bi

An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme

Multi-Imbalance: An open-source software for multi-class imbalance learning

Feature selection and resampling in class imbalance learning: Which comes first? An empirical study in the biological domain

An empirical study on the joint impact of feature selection and data resampling on imbalance classification

An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification

A parameter-free label propagation algorithm for person identification in stereo videos

A unified deep semi-supervised graph learning scheme based on nodes re-weighting and manifold regularization

Correction to: An empirical study on the joint impact of feature selection and data resampling on imbalance classification
