Object tracking against complex backgrounds with dramatic appearance variations is a challenging problem in computer vision. We tackle this problem with a novel approach that combines a deep learning architecture with an on-line AdaBoost framework. Motivated by its ability to learn features at multiple levels, we use a stacked denoising autoencoder (SDAE) to learn multi-level feature descriptors from a set of auxiliary images. Each layer of the SDAE, representing a different feature space, is subsequently transformed into a discriminative object/background deep neural network (DNN) classifier by adding a classification layer. Using an on-line AdaBoost feature selection framework, the ensemble of DNN classifiers is then updated on-line to robustly distinguish the target from the background. Experiments on an open tracking benchmark show promising results for the proposed tracker compared with several state-of-the-art approaches.
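The on-line boosting step can be illustrated with a minimal sketch in the spirit of on-line AdaBoost feature selection. It is not the tracker described above: the weak classifiers here are fixed, hypothetical threshold functions standing in for the per-layer DNN classifiers (which would themselves be updated on-line), and the class name `OnlineBoostSelector`, the number of selectors, and the initial counts are all assumptions made for illustration.

```python
import math


class OnlineBoostSelector:
    """Toy on-line boosting: each selector picks the weak classifier with the
    lowest running weighted error from a shared pool (hypothetical sketch)."""

    def __init__(self, pool, n_selectors=2):
        self.pool = pool
        self.n_selectors = n_selectors
        # Running importance-weighted counts, initialised to 1 to avoid division by zero.
        self.correct = [[1.0] * len(pool) for _ in range(n_selectors)]
        self.wrong = [[1.0] * len(pool) for _ in range(n_selectors)]
        self.chosen = [0] * n_selectors
        self.alpha = [0.0] * n_selectors

    def update(self, x, y):
        """Process one labelled sample (y is +1 for target, -1 for background)."""
        lam = 1.0  # importance weight of this sample, passed down the selectors
        preds = [h(x) for h in self.pool]
        for n in range(self.n_selectors):
            for m, p in enumerate(preds):
                if p == y:
                    self.correct[n][m] += lam
                else:
                    self.wrong[n][m] += lam
            errors = [w / (c + w) for c, w in zip(self.correct[n], self.wrong[n])]
            best = min(range(len(errors)), key=errors.__getitem__)
            err = min(max(errors[best], 1e-6), 1.0 - 1e-6)
            self.chosen[n] = best
            self.alpha[n] = 0.5 * math.log((1.0 - err) / err)
            # Down-weight samples the chosen classifier got right, up-weight the rest.
            if preds[best] == y:
                lam *= 1.0 / (2.0 * (1.0 - err))
            else:
                lam *= 1.0 / (2.0 * err)

    def predict(self, x):
        score = sum(a * self.pool[m](x) for a, m in zip(self.alpha, self.chosen))
        return 1 if score >= 0 else -1


# Hypothetical weak classifiers over 2-D feature vectors.
pool = [
    lambda x: 1 if x[0] > 0 else -1,
    lambda x: 1 if x[1] > 0 else -1,
    lambda x: 1,
]
booster = OnlineBoostSelector(pool)
samples = [((1, -1), 1), ((-1, 1), -1), ((2, 1), 1), ((-2, -1), -1)] * 5
for x, y in samples:
    booster.update(x, y)
```

In this toy stream the label depends only on the first feature, so the selectors settle on the first weak classifier; in a tracker the pool would instead contain classifiers built on different feature spaces.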
A practical large-scale product recognition system suffers from long-tailed, imbalanced training data in the e-commerce setting at Alibaba. Besides the product images themselves, plenty of image-related side information (e.g. titles, tags) reveals rich semantic information about the images. Prior works mainly address the long-tail problem from a visual perspective only and do not consider leveraging this side information. In this paper, we present a novel side-information-based large-scale visual recognition co-training (SICoT) system that deals with the long-tail problem by leveraging image-related side information. In the proposed co-training system, we first introduce a bilinear word attention module that constructs a semantic embedding over the noisy side information. A visual feature and semantic embedding co-training scheme is then designed to transfer knowledge from classes with abundant training data (head classes) to classes with little training data (tail classes) in an end-to-end fashion. Extensive experiments on four challenging large-scale datasets, whose numbers of classes range from one thousand to one million, demonstrate that the proposed SICoT system scales effectively in alleviating the long-tail problem. On the visual search platform Pailitao at Alibaba, we deploy a practical large-scale product recognition application driven by the proposed SICoT system and achieve a significant gain in unique visitor (UV) conversion rate.
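As a rough illustration of bilinear attention over noisy words, the sketch below scores each word vector against a query vector through a bilinear form, normalises the scores with a softmax, and pools the words into one embedding. The abstract does not specify the module, so using the visual feature as the query, the matrix `W`, and the function name `bilinear_word_attention` are all assumptions rather than the SICoT design.

```python
import math


def bilinear_word_attention(words, query, W):
    """Pool word vectors into one embedding using bilinear attention.

    words: list of word vectors, each of length d_w
    query: vector of length d_q (assumed here to be a visual feature)
    W:     d_w x d_q matrix of the bilinear form (would be learned in practice)
    Returns (embedding, attention_weights).
    """
    # Project the query once: Wq has length d_w.
    Wq = [sum(W[r][c] * query[c] for c in range(len(query))) for r in range(len(W))]
    # Bilinear score for each word: word^T W query.
    scores = [sum(w[r] * Wq[r] for r in range(len(Wq))) for w in words]
    # Numerically stable softmax over the words.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the word vectors.
    dim = len(words[0])
    embedding = [sum(a * w[k] for a, w in zip(weights, words)) for k in range(dim)]
    return embedding, weights


# Toy example: two 2-D word vectors, identity bilinear form.
words = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0]]
embedding, weights = bilinear_word_attention(words, query, W)
```

Words that align with the query under `W` receive larger weights, which is one way irrelevant title or tag words could be suppressed.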
Ball tracking is a key technology for processing and analyzing ball games. Because of the complexity of visual scenes, a large number of objects are usually selected as candidates for the ball, leading to incorrect identification; conversely, the true position of the ball may sometimes be missed. In this paper, we propose a two-layered data association method to improve the robustness of ball tracking. At the local layer, we use a sliding-window-based Token Transfer method to generate a set of sub-trajectory candidates. At the global layer, a single ball trajectory is obtained by applying a dynamic-programming-based splice method to a graph consisting of the sub-trajectory candidates. We evaluated our approach on tennis matches from the Australian Open and the U.S. Open, and the results show that our approach outperforms the state-of-the-art approach by around 30%.
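The global layer can be pictured as a longest-path search over a directed acyclic graph of sub-trajectory candidates. The following is a minimal sketch under assumptions of our own: each candidate carries a hypothetical quality `score`, two candidates can be joined only if the second starts after the first ends, and skipped frames cost a linear `gap_penalty`. The actual edge weights of the splice method are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class SubTrajectory:
    start: int    # first frame index
    end: int      # last frame index (inclusive, end >= start)
    score: float  # hypothetical quality score of the candidate


def splice(cands, gap_penalty=0.1):
    """Return indices of the best-scoring chain of non-overlapping candidates."""
    if not cands:
        return []
    # Sorting by start frame gives a valid topological order, because an edge
    # j -> i requires cands[i].start > cands[j].end >= cands[j].start.
    order = sorted(range(len(cands)), key=lambda i: cands[i].start)
    best = {}  # best[i]: score of the best chain ending at candidate i
    prev = {}  # prev[i]: predecessor of i on that chain, or None
    for pos, i in enumerate(order):
        best[i] = cands[i].score
        prev[i] = None
        for j in order[:pos]:
            gap = cands[i].start - cands[j].end
            if gap <= 0:
                continue  # overlapping in time, cannot be joined
            total = best[j] + cands[i].score - gap_penalty * (gap - 1)
            if total > best[i]:
                best[i] = total
                prev[i] = j
    # Backtrack from the best chain end.
    i = max(best, key=best.get)
    path = []
    while i is not None:
        path.append(i)
        i = prev[i]
    return path[::-1]


candidates = [
    SubTrajectory(0, 10, 5.0),
    SubTrajectory(5, 15, 1.0),
    SubTrajectory(12, 20, 4.0),
    SubTrajectory(11, 30, 2.0),
]
trajectory = splice(candidates)
```

Here the chain of candidates 0 and 2 wins: it skips one frame between them but keeps the two highest-scoring segments.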
Traditional unsupervised approaches to broadcast news story segmentation require the number of segments to be set manually, although this number is often unknown in real-world applications. In this paper, we solve this problem by modeling the generative process of stories as distance-dependent Chinese restaurant process (dd-CRP) mixtures. We cut a news program into fixed-size text blocks and assume that the blocks in the same story are generated from a story-specific topic. Specifically, we add a dd-CRP prior, which carries the essential bias that a block's topic is more likely to be the same as that of nearby blocks. Story boundaries can then be found by detecting topic changes. Experiments show that our approach outperforms both supervised and unsupervised approaches and that the number of segments can be learned automatically from data.
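The locality bias of the dd-CRP prior can be made concrete with a small sketch. In a dd-CRP, each block either links to itself (with weight alpha) or to another block with a weight that decays with distance; blocks connected through links share a topic. The sketch below assumes a sequential variant in which a block may only link to earlier blocks, with an exponential decay; the decay function and the parameter values are illustrative assumptions, not the settings used in the paper.

```python
import math


def ddcrp_link_prior(n_blocks, alpha=1.0, decay=2.0):
    """Return a matrix whose row i holds the prior p(c_i = j).

    Block i links to itself with weight alpha (starting a new group) or to an
    earlier block j < i with weight exp(-(i - j) / decay). Links to later
    blocks get probability zero in this sequential variant.
    """
    prior = []
    for i in range(n_blocks):
        weights = [0.0] * n_blocks
        for j in range(i):
            weights[j] = math.exp(-(i - j) / decay)
        weights[i] = alpha
        total = sum(weights)
        prior.append([w / total for w in weights])
    return prior


prior = ddcrp_link_prior(5)
```

Row 4 of `prior` puts more mass on linking to block 3 than to block 0, which is the bias toward sharing a topic with nearby blocks; a larger `alpha` favours more self-links and therefore more stories.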