ÐThis paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. The fundamental idea is to divide the recognition problem into two levels. The lower level detections are performed using standard independent probabilistic event detectors to propose candidate detections of low-level features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low-level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. To achieve such a system we: 1) provide techniques for generating a discrete symbol stream from continuous low-level detectors; 2) extend stochastic context-free parsing to handle uncertainty in the input symbol stream; 3) augment a run-time parsing algorithm to enforce intersymbol constraints such as requiring temporal consistency between primitives; and 4) extend the consistency filtering to maintain consistent multiobject interactions. We develop a real-time system and demonstrate the approach in several experiments on gesture recognition and in video surveillance. In the surveillance application, we show how the system correctly interprets activities of multiple, interacting objects.
The KidsRoom is a perceptually-based, interactive, narrative playspace for children. Images, music, narration, light, and sound effects are used to transform a normal child's bedroom into a fantasy land where children are guided through a reactive adventure story. The fully automated system was designed with the following goals: (1) to keep the focus of user action and interaction in the physical and not virtual space; (2) to permit multiple, collaborating people to simultaneously engage in an interactive experience combining both real and virtual objects; (3) to use computer-vision algorithms to identify activity in the space without requiring the participants to wear any special clothing or devices; (4) to use narrative to constrain the perceptual recognition, and to use perceptual recognition to allow participants to drive the narrative; and (5) to create a truly immersive and interactive room environment. We believe the KidsRoom is the first multi-person, fully-automated, interactive, narrative environment ever constructed using non-encumbering sensors. This paper describes the KidsRoom, the technology that makes it work, and the issues that were raised during the system's development. 1 1 A demonstration of the project, which complements the material presented here and includes videos, images, and sounds from each part of the story is available at .
This paper describes a new method of fast background subtraction based upon disparity v eri cation that is invariant to run-time changes in illumination. Using two or more cameras, the method requires the o -line construction of disparity elds mapping the primary or key background image to each of the additional reference background images. At runtime, segmentation is performed by c hecking color intensity v alues at corresponding pixels. If more than two cameras are available, more robust segmentation can be achieved and in particular, the occlusion shadows can be generally eliminated as well. Because the method only assumes xed background geometry, the technique allows for illumination variation at runtime. And, because no disparity search is performed at run time, the algorithm is easily implemented in real-time on conventional hardware.
The ability to learn is a potentially compelling and important quality for interactive synthetic characters. To that end, we describe a practical approach to real-time learning for synthetic characters. Our implementation is grounded in the techniques of reinforcement learning and informed by insights from animal training. It simpliÞes the learning task for characters by (a) enabling them to take advantage of predictable regularities in their world, (b) allowing them to make maximal use of any supervisory signals, and (c) making them easy to train by humans.We built an autonomous animated dog that can be trained with a technique used to train real dogs called "clicker training". Capabilities demonstrated include being trained to recognize and use acoustic patterns as cues for actions, as well as to synthesize new actions from novel paths through its motion space.A key contribution of this paper is to demonstrate that by addressing the three problems of state, action, and state-action space discovery at the same time, the solution for each becomes easier. Finally, we articulate heuristics and design principles that make learning practical for synthetic characters.
As an alternative to the tracking-based approaches that heavily depend on accurate detection of moving objects, which often fail for crowded scenarios, we present a pixelwise method that employs dual foregrounds to extract temporally static image regions. Depending on the application, these regions indicate objects that do not constitute the original background but were brought into the scene at a subsequent time, such as abandoned and removed items, illegally parked vehicles. We construct separate long-and short-term backgrounds that are implemented as pixelwise multivariate Gaussian models. Background parameters are adapted online using a Bayesian update mechanism imposed at different learning rates. By comparing each frame with these models, we estimate two foregrounds. We infer an evidence score at each pixel by applying a set of hypotheses on the foreground responses, and then aggregate the evidence in time to provide temporal consistency. Unlike optical flow-based approaches that smear boundaries, our method can accurately segment out objects even if they are fully occluded. It does not require on-site training to compensate for particular imaging conditions. While having a low-computational load, it readily lends itself to parallelization if further speed improvement is necessary.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.