Abstract-The exploitation of video data requires methods able to extract high-level information from the images. Video summarization, video retrieval, or video surveillance are examples of applications. In this paper, we tackle the challenging problem of recognizing dynamic video contents from low-level motion features. We adopt a statistical approach involving modeling, (supervised) learning, and classification issues. Because of the diversity of video content (even for a given class of events), we have to design appropriate models of visual motion and learn them from videos. We have defined original parsimonious global probabilistic motion models, both for the dominant image motion (assumed to be due to the camera motion) and the residual image motion (related to scene motion). Motion measurements include affine motion models to capture the camera motion and low-level local motion features to account for scene motion. Motion learning and recognition are solved using maximum likelihood criteria. To validate the interest of the proposed motion modeling and recognition framework, we report dynamic content recognition results on sports videos.
We consider the problem of motion detection by background subtraction. An accurate estimation of the background is only possible if we locate the moving objects; meanwhile, a correct motion detection is achieved if we have a good available background model. This work proposes a new direction in the way such problems are considered. The main idea is to formulate this class of problem as a joint decision-estimation unique step. The goal is to exploit the way two processes interact, even if they are of a dissimilar nature (symbolic-continuous), by means of a recently introduced framework called mixed-state Markov random fields. In this paper, we will describe the theory behind such a novel statistical framework, that subsequently will allows us to formulate the specific joint problem of motion detection and background reconstruction. Experiments on real sequences and comparisons with existing methods will give a significant support to our approach. Further implications for video sequence inpainting will be also discussed.
Abstract. The exploitation of video data requires to extract information at a rather semantic level, and then, methods able to infer "concepts" from low-level video features. We adopt a statistical approach and we focus on motion information. Because of the diversity of dynamic video content (even for a given type of events), we have to design appropriate motion models and learn them from videos. We have defined original and parsimonious probabilistic motion models, both for the dominant image motion (camera motion) and the residual image motion (scene motion). These models are learnt off-line. Motion measurements include affine motion models to capture the camera motion, and local motion features for scene motion. The two-step event detection scheme consists in pre-selecting the video segments of potential interest, and then in recognizing the specified events among the pre-selected segments, the recognition being stated as a classification problem. We report accurate results on several sports videos.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.