In this paper we describe a system that enables a mobile robot equipped with a color vision system to track humans in indoor environments. We developed a method, based on motion and color cues, for tracking humans while they are within the camera's field of view. However, the robot must also keep track of humans who leave the field of view and re-enter it later. For this global tracking task we developed a dynamic Bayesian network. Experimental results on real data confirm the viability of the developed method. Index Terms: Human-robot interaction, people tracking, vision-based user interfaces.
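The abstract does not say how the color cue is compared across frames. One common choice, assumed here purely for illustration and not necessarily the authors' method, is to keep a color histogram per tracked person and match candidates using the Bhattacharyya coefficient. The function names `bhattacharyya` and `best_match` are hypothetical.

```python
import math


def normalize(hist):
    """Scale a histogram so its bins sum to 1."""
    total = float(sum(hist))
    if total == 0:
        raise ValueError("histogram is empty")
    return [h / total for h in hist]


def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient between two histograms (1.0 = identical shape)."""
    p, q = normalize(h1), normalize(h2)
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def best_match(candidate, models):
    """Index of the stored (hypothetical) color model most similar to the candidate."""
    scores = [bhattacharyya(candidate, m) for m in models]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because both histograms are normalized first, the score depends only on the color distribution and not on how many pixels the person region covers.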
This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is that it exploits the complementary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate scene descriptors such as "scream", "passing train" or "articulation energy". At the higher level, a Dynamic Bayesian Network serves as a fusion mechanism that produces an aggregate aggression indication for the current scene. We validated the prototype system on a set of scenarios performed by professional actors at an actual train station, ensuring realistic audio and video noise conditions.
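To illustrate the fusion idea, here is a minimal sketch of one filtering step in the simplest discrete DBN: a hidden aggression state that evolves over time and two observations (audio and video) assumed conditionally independent given that state. The two-state model, the transition matrix and all likelihood values are made up for illustration; the abstract does not give CASSANDRA's actual network structure or parameters.

```python
def normalize(v):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def forward_step(belief, transition, audio_lik, video_lik):
    """One DBN filtering step: predict with the transition model, then weight by both cues.

    belief[i] = P(state i | past observations), transition[i][j] = P(next = j | now = i),
    audio_lik[j] = P(audio descriptor | j), video_lik[j] = P(video descriptor | j).
    """
    n = len(belief)
    # Prediction: propagate the previous belief through the transition model
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: audio and video are assumed conditionally independent given the state
    updated = [predicted[j] * audio_lik[j] * video_lik[j] for j in range(n)]
    return normalize(updated)


# Hypothetical two-state example: index 0 = "calm", index 1 = "aggressive"
transition = [[0.9, 0.1], [0.2, 0.8]]
belief = [0.5, 0.5]
audio_lik = [0.1, 0.7]  # made-up P("scream" detected | state)
video_lik = [0.3, 0.6]  # made-up P(high articulation energy | state)
belief = forward_step(belief, transition, audio_lik, video_lik)
print(f"P(aggressive) = {belief[1]:.3f}")  # prints P(aggressive) = 0.920
```

Neither cue alone is decisive here, but because both favor the "aggressive" state, their product pushes the posterior well above either individual likelihood ratio would suggest.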
Visual surveillance in wide areas (e.g. airports) relies on sparsely distributed cameras, that is, cameras that observe nonoverlapping scenes. In this setup, multiobject tracking requires reidentification of an object when it leaves one field of view and later appears in another. Although similar association problems are common in multiobject tracking, in the distributed case one has to cope with asynchronous observations and cannot assume smooth object motion. In this paper, we propose a method for indoor human tracking. The method uses a Dynamic Bayes Network (DBN) as a probabilistic model for the observations, in which the edges of the network define the correspondences between observations of the same object. Accordingly, we derive an approximate EM-like method for selecting the most likely DBN structure and learning the model parameters. The algorithm is tested on a collection of real-world observations gathered by a system of cameras in an office building.
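The EM-like loop alternates between choosing correspondences (structure) and re-estimating parameters. The sketch below is a heavily simplified analog, not the paper's algorithm. It assumes each observation is reduced to a single scalar appearance feature, that every person's appearance is Gaussian with a shared variance, and that timing and camera topology are ignored. Under those assumptions, hard ("classification") EM links each observation to its most likely person and then re-estimates each person's mean. The name `hard_em` is hypothetical.

```python
def hard_em(features, init_means, iters=20):
    """Alternate hard assignment of observations to persons and mean re-estimation.

    With equal-variance Gaussians, the most likely person for an observation is
    simply the one whose mean is closest.
    """
    means = list(init_means)
    labels = []
    for _ in range(iters):
        # "Structure" step: link each observation to the most likely person model
        labels = [min(range(len(means)), key=lambda k: (x - means[k]) ** 2) for x in features]
        # "Parameter" step: each person's mean becomes the average of its observations
        new_means = []
        for k, m in enumerate(means):
            members = [x for x, lab in zip(features, labels) if lab == k]
            new_means.append(sum(members) / len(members) if members else m)
        if new_means == means:
            break  # assignments and parameters are mutually consistent
        means = new_means
    return labels, means


labels, means = hard_em([0.1, 0.2, 0.15, 5.0, 5.2, 4.9], init_means=[0.0, 1.0])
```

A full treatment would score candidate links with time and topology terms as well. Only the alternation pattern is meant to carry over.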
Visual surveillance in wide areas (e.g. airports) relies on cameras that observe non-overlapping scenes. Multi-person tracking requires re-identification of a person when he/she leaves one field of view and later appears in another. For this, we use appearance cues. Under the assumption that all observations of a single person are Gaussian distributed, the observation model in our approach is a Mixture of Gaussians (MoG). In this paper we propose a distributed approach for learning this MoG, in which every camera learns from both its own observations and communication with other cameras. For this purpose we present the Multi-Observations Newscast EM algorithm, an adjusted version of the recently developed Newscast EM. The algorithm is tested on artificially generated data and on a collection of real-world observations gathered by a system of cameras in an office building.
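The following minimal sketch shows one iteration of plain Newscast-style distributed EM for a one-dimensional MoG. It does not model the Multi-Observations adjustment. Each camera runs the E-step on its own data to obtain sufficient statistics. The cameras then repeatedly average their statistics with randomly chosen peers (gossip), and each camera runs the M-step on its averaged copy. Because the M-step uses only ratios of statistics, averaging (total divided by the number of cameras) yields the same parameters as centralized EM on the pooled data. The scalar features, the variance floor and all function names are assumptions for illustration.

```python
import math
import random


def gauss(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def local_stats(data, weights, means, variances):
    """E-step on one camera's data: per component [sum r, sum r*x, sum r*x^2]."""
    K = len(means)
    stats = [[0.0, 0.0, 0.0] for _ in range(K)]
    for x in data:
        p = [weights[k] * gauss(x, means[k], variances[k]) for k in range(K)]
        z = sum(p)
        for k in range(K):
            r = p[k] / z  # responsibility of component k for x
            stats[k][0] += r
            stats[k][1] += r * x
            stats[k][2] += r * x * x
    return stats


def gossip_average(node_stats, rounds, rng):
    """Newscast-style averaging: each node repeatedly averages with a random peer."""
    n = len(node_stats)
    for _ in range(rounds):
        for i in range(n):
            j = rng.choice([m for m in range(n) if m != i])
            avg = [[(a + b) / 2 for a, b in zip(si, sj)]
                   for si, sj in zip(node_stats[i], node_stats[j])]
            node_stats[i] = avg
            node_stats[j] = [row[:] for row in avg]
    return node_stats


def m_step(stats, var_floor=1e-6):
    """Turn (possibly averaged) sufficient statistics into MoG parameters."""
    total = sum(s[0] for s in stats)
    weights = [s[0] / total for s in stats]
    means = [s[1] / s[0] for s in stats]
    variances = [max(s[2] / s[0] - m ** 2, var_floor) for s, m in zip(stats, means)]
    return weights, means, variances


def newscast_em_step(camera_data, params, rounds=50, seed=0):
    """One distributed EM iteration; returns every camera's parameter estimate."""
    rng = random.Random(seed)
    node_stats = [local_stats(d, *params) for d in camera_data]
    node_stats = gossip_average(node_stats, rounds, rng)
    return [m_step(s) for s in node_stats]
```

Usage: with `camera_data = [[0.0, 0.2, 5.0], [0.1, 5.1], [4.9, -0.1]]` and initial parameters `([0.5, 0.5], [0.0, 5.0], [1.0, 1.0])`, every camera ends up with (numerically) the same weights, means and variances as one EM step on the pooled data, without any camera sending its raw observations.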