Latent variable models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ℓ1-ℓ2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of the latent space without any significant loss in accuracy of the learnt model.
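As a rough illustration of the group-sparsity idea in the abstract above, the following sketch computes an ℓ1-ℓ2 penalty (the sum of the Euclidean norms of parameter groups) and applies the standard block soft-thresholding step associated with it. The grouping of one weight vector per latent state, the names `group_l1_l2_norm` and `group_soft_threshold`, the threshold `tau`, and the toy weights are all assumptions made for illustration; the abstract does not specify the paper's optimization procedure, and this is not it.

```python
import math


def group_l1_l2_norm(groups):
    """Sum of the Euclidean (l2) norms of the groups, i.e. the l1-l2 norm."""
    return sum(math.sqrt(sum(w * w for w in g)) for g in groups)


def group_soft_threshold(groups, tau):
    """Block soft-thresholding: shrink each group as a unit.

    A group whose l2 norm is at most tau is set entirely to zero,
    which is how the penalty switches off whole groups at once.
    """
    shrunk = []
    for g in groups:
        norm = math.sqrt(sum(w * w for w in g))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in g])
    return shrunk


# Hypothetical weights: one group per latent state (an assumed grouping).
weights = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.0]]

penalty = group_l1_l2_norm(weights)
pruned = group_soft_threshold(weights, tau=0.5)

# Latent states whose weight group survived the shrinkage step.
active_states = [i for i, g in enumerate(pruned) if any(w != 0.0 for w in g)]
print(penalty, pruned, active_states)
```

In this toy setting only the first group has a norm larger than `tau`, so the other two states are dropped as whole blocks. This is the sense in which a group-sparse penalty can select which hidden states remain in the model.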
Large stores of digital video pose severe computational challenges to existing video analysis algorithms. In applying these algorithms, users must often trade off processing speed for accuracy, as many sophisticated and effective algorithms require large computational resources that make it impractical to apply them throughout long videos. One can save considerable effort by applying these expensive algorithms sparingly, directing their application using the results of more limited processing. We show how to do this for retrospective video analysis by modeling a video using a chain graphical model and performing inference both to analyze the video and to direct processing. To accomplish this, we develop a new algorithm to direct processing that efficiently approximates the optimal solution. We apply our algorithm to problems in background subtraction and face detection and show in experiments that this leads to significant improvements over baseline algorithms.
Index Terms: Video processing, resource allocation, graphical models, optimization, background subtraction, face detection, dynamic programming
I. INTRODUCTION
New technology is giving rise to large stores of digital video. Their size has increased much faster than the computational resources needed to effectively process them. At the same time, as we develop increasingly sophisticated and effective vision algorithms, these also demand greater computational resources. Consequently, it is important to develop strategies for applying vision algorithms with greater efficiency to video data.
The scope of the problems we face is evident. Surveillance systems can contain thousands of cameras and a large amount of video. Real-time processing of huge data sets is extremely challenging; retrospective or forensic analysis creates even greater problems when one must rapidly examine hours or days of video from thousands of cameras.
For example, British police were required to examine 80,000 CCTV tapes from a network of 25,000 cameras [1] to discover the image of a bomber after the terrorist attack in London in 2005 [2]. Automatic processing is needed to speed up this analysis, but one cannot hope to process all available video in such cases; it is essential to direct processing to portions of video most likely to be informative.
In this paper, we develop a new method for controlling processing, so that available resources are directed at the most relevant portions of the video. In our proposed approach, we initially perform some inexpensive processing of a video by applying a cheap but less accurate algorithm combined with sparse application of a more expensive and accurate algorithm. We then use an inference algorithm to determine to which frames we should apply further expensive processing.
Our first contribution is to combine information from cheap and expensive features, using a graphical model for video. This is a second-order Markov model with a node for each frame, and a state variable that indicates whether this frame is relevant to our current ...