Without doubt, general video and sound, as found in large multimedia archives, carry emotional information. Audio and video retrieval by emotional categories or dimensions could thus play a central role in tomorrow's intelligent systems, enabling search for movies with a particular mood, computer-aided scene and sound design to elicit certain emotions in the audience, and similar applications. Yet, the lion's share of research in affective computing focuses exclusively on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lead to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research by moving its methodology “out of the lab” to real-world, diverse data. In this contribution, we address the problem of finding “disturbing” scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis, including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and of the system's errors reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
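As a point of reference for the MAP figure reported above, the following is a minimal sketch of how mean average precision is conventionally computed for ranked retrieval; the toy rankings are made up for illustration and do not come from the paper.

```python
# Mean average precision (MAP) over ranked retrieval results.

def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a list of 0/1 labels in the
    order the system ranked the candidate segments."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: average AP over several rankings (e.g., one per movie)."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two toy rankings: 1 = violent segment retrieved, 0 = non-violent.
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1]]))  # ~0.708
```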
In recent years, the field of augmented reality (AR) has seen great advances in interaction, tracking and rendering. New input devices and mobile hardware have enabled entirely new interaction concepts for AR content. However, the high complexity of AR applications means that developers often lack adequate usability evaluation practices. In this paper, we present a thorough classification of factors influencing user experience, split into the broad categories of rendering, tracking and interaction. Based on these factors, we propose an architecture for evaluating AR experiences prior to deployment in an adapted virtual reality (VR) environment. We thus enable rapid prototyping and evaluation of AR applications, which is especially suited for challenging industrial AR projects.
The observation likelihood approximation is a central problem in stochastic human pose tracking. In this paper, we present a new approach to quantifying the correspondence between hypothetical and observed human poses in depth images. Our approach is based on segmented point clouds, enabling accurate approximations even under self-occlusion and in the absence of color or texture cues. The segmentation step extracts small regions of high saliency such as hands or arms and ensures that the information contained in these regions is not marginalized by larger, less salient regions such as the chest. The proposed approximation function is evaluated on both synthetic and real camera data. In addition, we compare our approximation function against the corresponding function used by a state-of-the-art pose tracker.
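To make the idea of a segment-weighted likelihood concrete, here is a minimal sketch under the general principle described above: each observed segment is scored against the corresponding segment of the pose hypothesis and all segments contribute equally, so a small hand cannot be drowned out by the chest. The segment labels, the exponential mapping and the bandwidth `sigma` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_likelihood(hypothesis_segments, observed_segments, sigma=0.05):
    """hypothesis_segments / observed_segments: dicts mapping a segment
    label (e.g. 'left_hand') to an (N, 3) array of 3D points in meters."""
    scores = []
    for label, obs_pts in observed_segments.items():
        hyp_pts = hypothesis_segments.get(label)
        if hyp_pts is None or len(hyp_pts) == 0:
            scores.append(0.0)  # hypothesis misses this segment entirely
            continue
        # Mean nearest-neighbor distance from observation to hypothesis.
        d, _ = cKDTree(hyp_pts).query(obs_pts)
        scores.append(np.exp(-np.mean(d) ** 2 / (2 * sigma ** 2)))
    # Equal weight per segment, regardless of how many points it contains.
    return float(np.mean(scores))
```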
While monocular gesture recognition is slowly reaching maturity, the inclusion of 3D gestures remains a challenge. In order to enable robust and versatile depth-enabled gestures, a depth-image-based tracking approach is developed. Using a model-based annealing particle filter approach, the pose of a single subject is retrieved and tracked over longer image and motion sequences. Unlike many previous depth-image-based systems, ours performs full-body tracking. The system is independent of specific camera types and of color or texture cues. Pose space exploration in complex kinematic chains is enhanced by an extended inverse kinematics approach. Exploiting the highly parallel nature of the 3D point based approach, the algorithm is partially implemented on a GPU, leading to near real-time performance.
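For readers unfamiliar with the class of algorithm named here, the following is a generic annealed-particle-filter layer loop; the pose parameterization, the `likelihood` function and the annealing schedule `betas` are placeholder assumptions and not the authors' implementation.

```python
import numpy as np

def apf_step(particles, likelihood, betas=(0.25, 0.5, 1.0), noise=0.05,
             rng=np.random.default_rng(0)):
    """particles: (N, D) array of pose hypotheses for the current frame.
    likelihood: maps an (N, D) array to N observation scores >= 0."""
    n = len(particles)
    for beta in betas:  # annealing layers: weights sharpen as beta grows
        w = likelihood(particles) ** beta
        w = w / w.sum() if w.sum() > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, size=n, p=w)                # resample survivors
        particles = particles[idx] + rng.normal(0.0, noise, particles.shape)
        noise *= 0.5  # shrink diffusion in later, more selective layers
    return particles
```

The layered annealing is what lets the filter explore a high-dimensional kinematic pose space with far fewer particles than a plain particle filter would need.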
Current experiments with HCIs have shown a high demand for more natural interaction paradigms. Gestures are considered the most important cue besides speech. To recognize gestures, it is necessary to extract meaningful motion features from the body. Until now, mostly marker-based tracking systems have been used in virtual reality environments, since these were traditionally more reliable than purely image-based detection methods. However, markers tend to be distracting and cumbersome. Following recent advances in processing power, it has become possible to use a camera system to obtain a depth image of the test subject, match it to a pre-defined body model, and thus track the body parts over time. We present a full-body tracking system based on an annealing particle filter (APF) that utilizes point clouds recorded with a 3D sensor. Further refinement is provided by a specially adapted inverse kinematics system. A GPU-based implementation speeds up processing significantly and allows near real-time performance.
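The inverse-kinematics refinement step mentioned above is not specified in the abstract; as a generic illustration of what such a step can look like, here is a standard damped-least-squares IK update. The `jacobian` and `end_effector` callables and the damping value are hypothetical, not the authors' adaptation.

```python
import numpy as np

def dls_ik_step(theta, jacobian, end_effector, target, damping=0.1):
    """One inverse-kinematics update pulling an end effector (e.g. a hand)
    toward an observed 3D target position.
    theta: (D,) joint angles; jacobian(theta): (3, D) positional Jacobian;
    end_effector(theta): (3,) current effector position."""
    J = jacobian(theta)
    e = target - end_effector(theta)        # positional error in task space
    D = J.shape[1]
    # Damped least squares: dtheta = (J^T J + lambda^2 I)^-1 J^T e
    dtheta = np.linalg.solve(J.T @ J + damping ** 2 * np.eye(D), J.T @ e)
    return theta + dtheta
```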
In this paper, an interaction framework for AR-enhanced video conferencing is presented. The goal is to provide a cheap and portable system based on a combination of commodity Kinect cameras and regular computer screens. These conditions necessitate the use of contact-free interaction methods. The interaction framework presented in this paper is specifically suited for remotely presenting, sharing and annotating visual data such as images, presentation slides and 3D objects. In the proposed system, all data are represented by freely manipulable 3D objects which are augmented into the camera views. These representations are integrated into a differentiated ownership scheme, allowing for operations such as spatially managed data sharing. The suitability of different interaction paradigms with regard to this usage scenario is examined. Furthermore, occlusion and collision management between virtual objects and real obstacles is enabled by integrating basic models of the environment.
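The occlusion management described in the last sentence can be pictured as a per-pixel depth test against a depth map rendered from the environment model; the following sketch assumes that general approach, and the array names are illustrative only.

```python
import numpy as np

def composite_with_occlusion(camera_rgb, virtual_rgb, virtual_depth, env_depth):
    """Overlay a rendered virtual object onto the camera image, hiding the
    pixels where the modeled environment is closer to the camera.
    virtual_depth / env_depth: (H, W) depth maps in meters (inf = empty)."""
    visible = virtual_depth < env_depth     # virtual pixel is in front
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```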