Abstract-People detection and tracking is a key component for robots and autonomous vehicles operating in human environments. While prior work has mainly employed images or 2D range data for this task, in this paper we address the problem using 3D range data. In our approach, a top-down classifier selects hypotheses from a bottom-up detector, both based on sets of boosted features. The bottom-up detector learns a layered person model from a bank of specialized classifiers for different height levels of people, which collectively vote into a continuous space. Modes in this space represent detection candidates, each of which postulates a segmentation hypothesis of the data. In the top-down step, the candidates are classified using features computed in the voxels of a boosted volume tessellation. The volume tessellation is learned, which enables the method to deal stably with sparsely sampled and articulated objects. We then combine the detector with tracking in 3D, for which we take a multi-target multi-hypothesis tracking approach. The method neither needs a ground-plane assumption nor relies on background learning. Results from experiments in populated urban environments demonstrate 3D tracking and highly robust people detection up to 20 m with equal error rates of at least 93%.
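The bottom-up step described above, in which layered classifiers vote into a continuous space and modes of the vote distribution become detection candidates, can be illustrated with a minimal mean-shift-style mode-finding sketch. The function name, the Gaussian kernel, and the bandwidth value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mean_shift_modes(votes, weights, bandwidth=0.5, iters=50, tol=1e-4):
    """Find modes of a weighted kernel density over vote positions.

    votes:   (N, D) array of vote locations in the continuous voting space
    weights: (N,) vote weights (e.g., per-layer classifier confidences)
    """
    points = votes.copy()
    for _ in range(iters):
        shifted = np.empty_like(points)
        for i, p in enumerate(points):
            # Gaussian-weighted average of all votes around the current point
            d2 = np.sum((votes - p) ** 2, axis=1)
            k = weights * np.exp(-d2 / (2 * bandwidth ** 2))
            shifted[i] = (k[:, None] * votes).sum(axis=0) / k.sum()
        converged = np.max(np.linalg.norm(shifted - points, axis=1)) < tol
        points = shifted
        if converged:
            break
    # merge points that climbed to the same mode
    modes = []
    for p in points:
        if not any(np.linalg.norm(p - m) < bandwidth / 2 for m in modes):
            modes.append(p)
    return np.array(modes)

# two clusters of votes -> two detection candidates
rng = np.random.default_rng(0)
votes = np.vstack([rng.normal([0, 0], 0.1, (20, 2)),
                   rng.normal([3, 3], 0.1, (20, 2))])
modes = mean_shift_modes(votes, np.ones(len(votes)))
print(len(modes))  # -> 2
```

Each returned mode would then serve as a detection candidate whose supporting votes induce a segmentation hypothesis of the point cloud.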
I. INTRODUCTION

People detection and tracking is a key skill for mobile robots and intelligent cars in populated environments. While most of the related work in this area has used vision for this task, range sensing is a particularly interesting sensor modality due to its accuracy, large field of view, and robustness to illumination changes and vibrations, the latter points being of particular relevance for mobile observers.

In this paper we address two problems: detecting people in 3D range data and tracking people in 3D space. We extend our previous work on 3D people detection [1] by a tracking stage and an additional top-down procedure in the detection pipeline. This procedure aims at reducing false positives, which typically occur with sparsely sampled individuals at large distances from the sensor. We further combine detection with tracking and present results from a tracker that is able to estimate the motion state of multiple people in 3D. To this end, we employ the multi-hypothesis tracking (MHT) approach by Reid [2] and Cox et al. [3]. In the experiments we compare our approach with related techniques for detection in 3D range data, in particular spin images [4] and template-based classification.

While there is little related work on people detection and tracking in 3D, many researchers have addressed this task using 2D range data. In early works [5], [6], [7], people are detected using ad-hoc classifiers that look for moving local minima in the scan. The first principled learning approach was taken by Arras et al. [8], where a classifier for 2D point clouds was learned by boosting a set of geometric and statistical features. As there is a natural performance limit when using only a single layer of 2D range data, several authors ha...
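The core idea of the multi-hypothesis tracking approach mentioned above can be sketched as hypothesis branching with pruning: each measurement may extend an existing track, start a new track, or be a false alarm, and only the best-scoring hypotheses are kept. This is a heavily simplified 1D sketch under assumed parameters (Gaussian measurement likelihood, fixed new-track and false-alarm priors); it omits motion models, gating, and the k-best assignment machinery of a full MHT, and the names (`Hypothesis`, `children`, `step`) are illustrative, not from the paper.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    tracks: tuple     # last known 1D position of each track
    log_prob: float   # cumulative log-likelihood of this association history

def children(hyp, z, sigma=0.5, p_new=1e-2, p_false=1e-3):
    """Branch a hypothesis on one measurement z: extend any existing
    track, start a new track, or declare a false alarm."""
    out = []
    for i, x in enumerate(hyp.tracks):
        # Gaussian log-likelihood of z given track i's last position
        ll = -0.5 * ((z - x) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        tracks = hyp.tracks[:i] + (z,) + hyp.tracks[i + 1:]
        out.append(Hypothesis(tracks, hyp.log_prob + ll))
    out.append(Hypothesis(hyp.tracks + (z,), hyp.log_prob + math.log(p_new)))
    out.append(Hypothesis(hyp.tracks, hyp.log_prob + math.log(p_false)))
    return out

def step(hyps, z, k_best=10):
    """Expand all hypotheses on measurement z and keep the k best."""
    expanded = [c for h in hyps for c in children(h, z)]
    expanded.sort(key=lambda h: h.log_prob, reverse=True)
    return expanded[:k_best]

hyps = [Hypothesis((), 0.0)]
for z in (0.0, 0.1, 5.0):   # two nearby readings, then a distant one
    hyps = step(hyps, z)
best = hyps[0]
print(best.tracks)  # the distant reading spawns a second track: (0.1, 5.0)
```

Deferring the association decision in this way is what lets the tracker recover from ambiguous detections: a low-probability branch can still win once later measurements confirm it.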