People detection in video surveillance environments is a task that has attracted great interest. Many approaches try to solve the problem either in controlled scenarios or in very specific surveillance applications. The main objective of this paper is to give a comprehensive evaluation of the state of the art of people detection regardless of the final surveillance application. To this end, the different processing tasks involved in automatic people detection in video sequences are first defined. The state of the art of people detection is then classified according to the two most critical tasks needed in every detection approach: object detection and the person model. Finally, experiments are performed on an extensive dataset with different approaches that completely cover the proposed classification and support the conclusions drawn from the state of the art.
Introduction: Computer vision has been an evolving field in recent years, with multiple lines of research and different application domains. Video surveillance has been one of the most developed domains over the last 10 years [1,2,3,4]. The need to provide security to people and their property throughout the world explains the huge development and expansion of video surveillance systems. Video surveillance systems try to automatically extract information from the video sequence and to generate a scene description useful for human interaction with the system: alarms, logs, statistics, indexing and retrieval, etc.
Within the computer vision field, particularly in the research area of digital image and video processing, there exists a rich variety of algorithms for segmentation, object detection, event recognition, etc., which are being used in surveillance systems. Automatic people detection in video sequences [5,6,7,8] is one of the most challenging problems in computer vision. The complexity of the people detection problem stems mainly from the difficulty of modeling people, because of their huge variability in physical appearance, articulated body parts, poses, movements, points of view and interactions among different people and objects. This complexity is even higher in typical real-world surveillance scenarios such as airports, malls, etc., which often include multiple people, multiple occlusions and background variability.
There are many people detection surveys in the literature, but some of them cover the state of the art only partially or are clearly focused on a particular video surveillance application. [5] presents a survey of people detection and also the integration of the detectors into full onboard systems.
It decomposes people detection approaches into three processing tasks: generation of initial object hypotheses, or Region of Interest (ROI) selection; verification (classification); and temporal integration (tracking). [6] also presents a survey of people detection but with a clear focus on driver assistance systems and def...