In this work we present a new dataset for the tasks of person detection, tracking, re-identification, and soft-biometric attribute detection in surveillance data. The dataset was recorded over three days and comprises more than 30 individuals moving through a network of seven cameras. Person tracks are labeled with consistent IDs as well as soft-biometric attributes such as clothing description, gender, or height. Persons in the video data alter their appearance by changing clothes or wearing accessories. A second, clothing-specific ID for each track allows re-identification to be evaluated with or without the presence of clothing changes. In addition to video and camera calibration data, we provide evaluation protocols, tools, and baseline results for each of the four tasks.
Detection of moving objects is a fundamental task in video-based surveillance and security applications. Many detection systems use background estimation methods to model the observed environment. In outdoor surveillance, moving backgrounds (waving trees, clutter) and illumination changes (weather, reflections, etc.) are the major challenges for background modelling, and a single model that fulfils all of these requirements is usually not attainable. In this paper we present a background estimation technique for motion detection against non-static backgrounds that overcomes this problem. We introduce an enhanced background estimation architecture with a long-term model and a short-term model. Our experiments show that fusing the detections of these two complementary models improves the quality and reliability of the detection results.
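The abstract above does not specify the model update or fusion rule; a minimal sketch of the long-term/short-term idea, assuming exponential running-average background models and a conjunctive (AND) fusion of the two foreground masks, could look like this:

```python
import numpy as np

def update_model(model, frame, alpha):
    """Exponential running-average background update (assumed update rule)."""
    return (1.0 - alpha) * model + alpha * frame

def detect_foreground(model, frame, threshold=30.0):
    """Pixels deviating strongly from the background model are foreground."""
    return np.abs(frame - model) > threshold

def fused_detection(frames, alpha_short=0.5, alpha_long=0.01, threshold=30.0):
    """Fuse a short-term (fast-adapting) and a long-term (slow-adapting) model.

    A pixel is flagged as motion only if BOTH models agree, which suppresses
    transient background changes such as waving trees or reflections.
    The AND fusion is an assumption; the paper may use a different rule.
    """
    frames = [f.astype(np.float64) for f in frames]
    short_model = frames[0].copy()
    long_model = frames[0].copy()
    masks = []
    for frame in frames[1:]:
        fg_short = detect_foreground(short_model, frame, threshold)
        fg_long = detect_foreground(long_model, frame, threshold)
        masks.append(fg_short & fg_long)  # conjunctive fusion (assumption)
        short_model = update_model(short_model, frame, alpha_short)
        long_model = update_model(long_model, frame, alpha_long)
    return masks
```

The different learning rates make the two models complementary: the short-term model adapts quickly to gradual illumination changes, while the long-term model preserves the scene structure against slow-moving objects being absorbed into the background.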
Face Hallucination (FH) differs from generic single-image super-resolution (SR) algorithms in its specific domain of application: by exploiting the common structure of human faces, lower-resolution images can be magnified. Despite the growing interest in recent years, considerably less attention has been paid to a crucial step in FH: the registration of facial images. In this work, registration techniques employed in the literature are first summarized, and the importance of using well-aligned training and test images is demonstrated. A novel method is then presented that inversely maps the high-resolution (HR) 3D training texture to the low-resolution (LR) 2D test image in arbitrary poses, which prevents information loss in LR images and thus benefits SR. The effectiveness of our 3D approach is evaluated on the Multi-PIE [1] and PUT [2] face databases. Qualitative and quantitative FH results superior to state-of-the-art methods in all tested poses prove the necessity of accurate registration in FH. The merit of 3D FH in generating super-resolved frontal faces is also verified, revealing a 30% improvement in face recognition over the 2D approach under 30° of yaw rotation on the Multi-PIE [1] dataset.
Super-resolution (SR) offers an effective approach for boosting the quality and detail of low-resolution (LR) images to obtain high-resolution (HR) images. Despite the theoretical and technical advances of the past decades, a plausible methodology for evaluating and comparing different SR algorithms is still lacking. The main cause of this problem is the missing ground-truth data for SR. Unlike many other computer vision tasks, where existing image datasets can be utilized directly or with little extra annotation work, evaluating SR requires that the dataset contain both LR and corresponding HR ground-truth images of the same scene, captured at the same time. This work presents a novel prototype camera system that addresses these difficulties of acquiring ground-truth SR data. Two identical camera sensors, equipped with a wide-angle lens and a telephoto lens respectively, share the same optical axis by means of a beam splitter placed in the optical path. The back-end program can then trigger their shutters simultaneously and precisely register the regions of interest (ROIs) of the LR and HR image pairs in an automated manner, free of sub-pixel interpolation. Evaluation results demonstrate the special characteristics of the captured ground-truth HR-LR face images compared to simulated ones. The dataset is made freely available for non-commercial research purposes.
This paper presents a fully automatic system that recovers 3D face models from sequences of facial images. Unlike most 3D Morphable Model (3DMM) fitting algorithms, which simultaneously reconstruct shape and texture from a single input image, our approach builds on a more efficient least-squares method to directly estimate the 3D shape from sparse 2D landmarks localized by face alignment algorithms. The inconsistency between self-occluded 2D and 3D feature positions caused by head pose is addressed. A novel framework is proposed that enhances robustness across multiple frames, selected based on their 2D landmarks, combined with individual self-occlusion handling. Evaluation on ground-truth 3D scans shows superior shape and pose estimation over previous work. The whole system is also evaluated on an 'in the wild' video dataset [12] and delivers personalized and realistic 3D face shape and texture models under less constrained conditions, taking only seconds to process each video clip.
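The core of such a least-squares fit can be illustrated with a linear shape model. The sketch below is a generic 3DMM-style formulation, not the paper's actual algorithm: it assumes a known scaled-orthographic camera matrix `proj` (i.e., pose already estimated), a per-landmark mean shape and shape basis, and solves for the shape coefficients in closed form via linear least squares.

```python
import numpy as np

def estimate_shape_coeffs(landmarks_2d, mean_shape, basis, proj):
    """Least-squares shape fit from sparse 2D landmarks (illustrative sketch).

    Model assumption: landmarks_2d ~= (mean_shape + basis @ coeffs) @ proj.T,
    where proj is a known 2x3 scaled-orthographic camera matrix. Because the
    model is linear in coeffs, the fit is a single linear least-squares solve.

    landmarks_2d : (L, 2)    observed 2D landmark positions
    mean_shape   : (L, 3)    mean 3D landmark positions
    basis        : (L, 3, K) shape-basis directions per landmark
    proj         : (2, 3)    camera projection matrix
    """
    L, _, K = basis.shape
    # Residual after projecting the mean shape.
    b = (landmarks_2d - mean_shape @ proj.T).reshape(-1)      # (2L,)
    # Each basis direction projects linearly into the image plane.
    A = np.einsum('ij,ljk->lik', proj, basis).reshape(-1, K)  # (2L, K)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs
```

In practice, landmarks flagged as self-occluded for the current head pose would be dropped from `A` and `b` before solving, and multiple frames would simply stack additional rows into the same system.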
While several background subtraction approaches have been developed for static cameras in the past, efficient and robust motion detection for non-static pan/tilt cameras is still a challenging task. Known approaches use image-to-image registration methods to generate a panoramic background model of the scene, which spans a joint pixel coordinate system for later background estimation and subtraction. However, real-time panorama-based background subtraction requires a highly efficient image-to-panorama registration. To this end, this paper proposes a key-frame representation of the panorama image and presents a strategy for fast global homography estimation in large panorama images.
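The benefit of a key-frame representation is that each key-frame stores a fixed homography into panorama coordinates, so only the small frame-to-key-frame homography must be estimated online. A minimal sketch of this composition step, assuming the frame-to-key-frame homography comes from feature matching and key-frames are indexed by a recorded pan/tilt pose (both assumptions, not details from the abstract):

```python
import numpy as np

def nearest_keyframe(pan_tilt, keyframes):
    """Pick the key-frame whose stored pan/tilt pose is closest to the
    camera's current pose (hypothetical selection criterion)."""
    pan_tilt = np.asarray(pan_tilt, dtype=float)
    return min(keyframes,
               key=lambda kf: np.linalg.norm(np.asarray(kf['pose']) - pan_tilt))

def compose_to_panorama(H_frame_to_key, H_key_to_pano):
    """Chain the online frame->key-frame homography with the precomputed
    key-frame->panorama homography, mapping the current frame directly
    into panorama coordinates."""
    return H_key_to_pano @ H_frame_to_key

def warp_point(H, x, y):
    """Apply a 3x3 homography to a single pixel coordinate."""
    p = H @ np.array([x, y, 1.0])
    return p[:2] / p[2]
```

Registering against a nearby key-frame instead of the full panorama keeps the overlap large and the search range small, which is what makes the per-frame homography estimation cheap enough for real time.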