In predictive tasks, real-world datasets often exhibit varying degrees of imbalance (i.e., long-tailed or skewed distributions). While the majority (head, or most frequent) classes have sufficient samples, the minority (tail, or rare) classes can be under-represented by a rather limited number of samples. Data pre-processing has been shown to be very effective in dealing with such problems. On the one hand, data re-sampling is a common approach to tackling class imbalance. On the other hand, dimension reduction, which shrinks the feature space, is a conventional technique for reducing noise and inconsistencies in a dataset. However, the possible synergy between feature selection and data re-sampling for high-performance imbalance classification has rarely been investigated. To address this issue, we carry out a comprehensive empirical study on the joint influence of feature selection and re-sampling on two-class imbalance classification. Specifically, we study the performance of two opposite pipelines for imbalance classification, applying feature selection either before or after data re-sampling. We conduct a large number of experiments, with a total of 9225 tests, on 52 publicly available datasets, using 9 feature selection methods, 6 re-sampling approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results show that there is no consistent winner between the two pipelines; thus both of them should be considered when deriving the best-performing model for imbalance classification. We find that the performance of an imbalance classification model depends not only on the classifier adopted and the ratio between the number of majority and minority samples, but also on the ratio between the number of samples and features. Overall, this study should serve as a useful reference for researchers and practitioners in imbalance learning.
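The two pipeline orderings compared in this study can be illustrated with a minimal sketch. It is not the authors' experimental code: random oversampling stands in for the 6 re-sampling approaches, a simple filter score (standardized gap between class means) stands in for the 9 feature selection methods, and all function names (`make_toy`, `random_oversample`, `select_top_k`, `fs_then_resample`, `resample_then_fs`) are hypothetical.

```python
import random
from statistics import mean, pstdev


def make_toy(n_major, n_minor, n_features, seed=0):
    """Toy two-class data (assumption): only feature 0 is informative."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((0, n_major), (1, n_minor)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 2.0 * label  # shift the minority class on feature 0
            X.append(row)
            y.append(label)
    return X, y


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority samples until both classes match in size."""
    rng = random.Random(seed)
    idx = {c: [i for i, lab in enumerate(y) if lab == c] for c in set(y)}
    minority = min(idx, key=lambda c: len(idx[c]))
    majority = max(idx, key=lambda c: len(idx[c]))
    n_extra = len(idx[majority]) - len(idx[minority])
    extra = [rng.choice(idx[minority]) for _ in range(n_extra)]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def select_top_k(X, y, k):
    """Filter-style selection: rank features by the standardized gap between class means."""
    c0, c1 = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        a = [row[j] for row, lab in zip(X, y) if lab == c0]
        b = [row[j] for row, lab in zip(X, y) if lab == c1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(col) + 1e-12))
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top)


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def fs_then_resample(X, y, k, seed=0):
    """Pipeline A: select features on the imbalanced data, then re-sample."""
    cols = select_top_k(X, y, k)
    Xr, yr = random_oversample(project(X, cols), y, seed)
    return Xr, yr, cols


def resample_then_fs(X, y, k, seed=0):
    """Pipeline B: re-sample first, then select features on the balanced data."""
    Xr, yr = random_oversample(X, y, seed)
    cols = select_top_k(Xr, yr, k)
    return project(Xr, cols), yr, cols


if __name__ == "__main__":
    X, y = make_toy(n_major=40, n_minor=8, n_features=5, seed=1)
    _, ya, cols_a = fs_then_resample(X, y, k=2)
    _, yb, cols_b = resample_then_fs(X, y, k=2)
    print("FS -> re-sampling keeps features", cols_a, "class counts", ya.count(0), ya.count(1))
    print("re-sampling -> FS keeps features", cols_b, "class counts", yb.count(0), yb.count(1))
```

Both orderings return a balanced dataset of the same width, but the selected columns can differ because the filter score is computed on different samples, which is one way the order of the two steps could affect the downstream classifier.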
Motivated by the goal of reducing expensive and laborious person identity annotation in stereo videos, a number of research efforts have recently been dedicated to label propagation. In this work, we propose two heuristic label propagation algorithms for annotating person identities in stereo videos, based on the observation that the actors in two consecutive facial images in a video are more likely to be identical. In light of this, after adjacent video frames are divided into several groups, our first algorithm (i.e., ZBLC4) automatically annotates each unlabeled image with the identity that has the maximum summed similarity between the unlabeled image and the labeled images in its group, in a parameter-free manner. Moreover, to cope with singleton groups, an additional classifier is introduced into the ZBLC4 algorithm to mitigate unreliable predictions that depend on neighbours. We conduct experiments on three publicly available benchmark stereo videos, demonstrating that our algorithms outperform the state of the art.
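The maximum-summed-similarity rule described above can be sketched as follows. This is an illustrative reading of the abstract, not the ZBLC4 implementation: cosine similarity on feature vectors is an assumption, the function names `cosine` and `propagate_in_group` are hypothetical, and the additional classifier for singleton groups is not modeled (images in a group without any labeled image are simply left unlabeled).

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate_in_group(features, labels, similarity=cosine):
    """Label each unlabeled image in one group of adjacent frames.

    features: one feature vector per facial image in the group.
    labels:   identity per image, or None when the image is unlabeled.
    Each unlabeled image receives the identity whose labeled images in the
    group have the largest summed similarity to it.
    """
    labeled = [(f, lab) for f, lab in zip(features, labels) if lab is not None]
    out = list(labels)
    if not labeled:
        return out  # nothing to propagate from; left for a separate classifier
    for i, (f, lab) in enumerate(zip(features, labels)):
        if lab is not None:
            continue
        totals = defaultdict(float)
        for g, known in labeled:
            totals[known] += similarity(f, g)
        out[i] = max(totals, key=totals.get)
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
    labs = ["A", "A", "B", None, None]
    print(propagate_in_group(feats, labs))  # prints ['A', 'A', 'B', 'B', 'A']
```

The rule needs no threshold or neighbourhood size, which matches the parameter-free description; note that summing rather than averaging favours identities with more labeled images in the group.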