Sports analysis has recently attracted increasing research effort in computer vision. Among these tasks, basketball video analysis is particularly challenging due to severe occlusions and fast motions. As a typical tracking-by-detection method, the k-shortest paths (KSP) tracking framework has been widely used for multiple-person tracking. While effective and fast, its neglect of appearance models can easily lead to identity switches, especially when two or more players are intertwined with each other. This paper addresses this problem by taking appearance features into account within the KSP framework. Furthermore, we introduce a similarity measurement method that can fuse multiple appearance features together. In this paper, we select jersey color and jersey number as two example features. Experiments indicate that recognition accuracies of about 70% for jersey color and 50% for jersey number over a whole sequence are sufficient for our proposed method to preserve player identity better than the existing KSP tracking method.
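The abstract's fusion of multiple appearance features into one similarity score can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the paper's actual measurement: the function name, the weights, and the per-feature scores are all hypothetical.

```python
# Hypothetical sketch: fuse per-feature similarity scores (e.g., jersey
# color and jersey number, each in [0, 1]) into one appearance score via
# a normalized weighted sum. Weights and scores are illustrative only.

def fuse_similarities(scores, weights):
    """Return the weighted average of per-feature similarity scores."""
    assert len(scores) == len(weights) and sum(weights) > 0
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Example: color similarity 0.9, number similarity 0.5, color weighted higher.
fused = fuse_similarities([0.9, 0.5], [0.7, 0.3])
print(round(fused, 2))  # 0.78
```

A tracker could then combine such a fused appearance score with the KSP motion cost when linking detections across frames.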
The i-vector model is widely used in state-of-the-art speaker recognition systems. We propose a new Mahalanobis metric scoring learned from weighted pairwise constraints (WPCML), which uses different weights for the empirical errors of similar and dissimilar pairs. In the new i-vector space described by the metric, the distance between i-vectors of the same speaker is small, while that between i-vectors of different speakers is large. To form the training set, we use both the traditional random-sampling approach and a newly developed nearest-distance-based approach. Results on the NIST 2008 telephone data show that our model outperforms classical cosine similarity scoring. When the nearest-distance-based approach is used to form the training set, our model also outperforms the state-of-the-art PLDA. Results on the NIST 2014 i-vector challenge likewise show that our model is better than PLDA.
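The scoring step the abstract describes is a Mahalanobis distance between i-vectors under a learned metric. A minimal sketch follows, assuming a placeholder metric matrix M; the WPCML learning procedure that actually produces M from weighted pairwise constraints is not reproduced here.

```python
import numpy as np

# Illustrative Mahalanobis distance between two i-vectors x and y under a
# metric M: d(x, y) = sqrt((x - y)^T M (x - y)). M must be positive
# semi-definite; here it is a placeholder identity, which reduces the
# distance to Euclidean. WPCML would instead learn M so that same-speaker
# pairs are close and different-speaker pairs are far apart.

def mahalanobis_distance(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

x = np.array([1.0, 2.0])   # toy "i-vectors" (real ones are ~400-600 dims)
y = np.array([2.0, 0.0])
M = np.eye(2)              # placeholder metric
print(mahalanobis_distance(x, y, M))  # sqrt(5) ≈ 2.236
```

A verification trial would then compare this distance against a threshold to accept or reject a claimed speaker identity.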
The Visual Place Recognition problem aims to use an image to recognize a location that has been visited before. When a scene is revisited, its appearance and viewpoint are often drastically different. Most previous works focus on 2-D image-based deep learning methods; however, convolutional features are not robust enough to the challenging scenes mentioned above. In this paper, in order to exploit information that helps the Visual Place Recognition task in these challenging scenes, we propose a new graph construction approach that extracts useful information from an RGB image and a depth image and fuses them into graph data. We then treat the Visual Place Recognition problem as a graph classification problem. We propose a new global pooling method, Global Structure Attention Pooling (GSAP), which improves classification accuracy by improving the expressive ability of the global pooling component. Experiments show that our GSAP method improves graph classification accuracy by approximately 2–5%, that the graph construction method improves graph classification accuracy by approximately 4–6%, and that the whole Visual Place Recognition model is robust to appearance change and view change.
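The general idea behind attention-based global pooling, of which GSAP is a structure-aware variant, can be sketched as follows. This is not the paper's GSAP layer: the shapes, the scoring vector, and the plain softmax attention are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of attention-based global pooling over node features:
# each node gets a scalar attention weight, and the graph representation
# is the attention-weighted sum of node features. GSAP additionally
# incorporates global structure information, which is not modeled here.

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def attention_pool(H, a):
    """Pool node features H (n x d) into a single graph vector (d,)
    using attention weights computed from a scoring vector a (d,)."""
    alpha = softmax(H @ a)   # one weight per node, summing to 1
    return alpha @ H         # weighted sum of node features

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 nodes, 2 features
a = np.array([0.5, 0.5])                            # toy scoring vector
g = attention_pool(H, a)
print(g.shape)  # (2,)
```

The pooled vector g would then be fed to a classifier head that predicts the place label, turning place recognition into graph classification as the abstract describes.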