Abstract. We present a novel Dynamic Bayesian Network for pedestrian path prediction in the intelligent vehicle domain. The model incorporates the pedestrian's situational awareness, the situation criticality and the spatial layout of the environment as latent states on top of a Switching Linear Dynamical System (SLDS) to anticipate changes in the pedestrian dynamics. Using computer vision, situational awareness is assessed from the pedestrian's head orientation, situation criticality from the distance between vehicle and pedestrian at the expected point of closest approach, and spatial layout from the distance of the pedestrian to the curbside. Our particular scenario is that of a crossing pedestrian, who might stop or continue walking at the curb. In experiments using stereo vision data obtained from a vehicle, we demonstrate that the proposed approach results in more accurate path prediction than the SLDS alone, at the relevant short time horizon (1 s), and slightly outperforms a computationally more demanding state-of-the-art method.
Anticipating future situations from streaming sensor data is a key perception challenge for mobile robotics and automated vehicles. We address the problem of predicting the path of objects with multiple dynamic modes. The dynamics of such targets can be described by a Switching Linear Dynamical System (SLDS). However, predictions from this probabilistic model cannot anticipate when a change in dynamic mode will occur. We propose to extract various types of cues with computer vision to provide context on the target's behavior, and incorporate these in a Dynamic Bayesian Network (DBN). The DBN extends the SLDS by conditioning the mode transition probabilities on additional context states. We describe efficient online inference in this DBN for probabilistic path prediction, accounting for uncertainty in both measurements and target behavior. Our approach is illustrated on two scenarios in the Intelligent Vehicles domain concerning pedestrians and cyclists, so-called Vulnerable Road Users (VRUs). Here, context cues include the static environment of the VRU, its dynamic environment, and its observed actions. Experiments using stereo vision data from a moving vehicle demonstrate that the proposed approach results in more accurate path prediction than SLDS at the relevant short time horizon (1 s). It slightly outperforms a computationally more demanding state-of-the-art method.
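The core idea above — an SLDS whose mode transition probabilities are conditioned on context states — can be sketched in miniature. The following is an illustrative toy model, not the paper's implementation: a two-mode (walking/standing) 1-D pedestrian, a single binary context cue, and invented parameter values throughout.

```python
import numpy as np

DT = 0.1
A = np.array([[1.0, DT], [0.0, 1.0]])      # constant-velocity dynamics
MODE_VEL = {0: 1.4, 1: 0.0}                # walking vs. standing speed (m/s)

# P(mode_t | mode_{t-1}, context): context=1 (e.g. a critical situation)
# makes a switch to "standing" more likely. Values are invented.
TRANS = {
    0: np.array([[0.99, 0.01], [0.05, 0.95]]),  # context = 0
    1: np.array([[0.80, 0.20], [0.02, 0.98]]),  # context = 1
}

def predict_paths(x0, mode0, context, horizon=10, n_samples=500, rng=None):
    """Monte Carlo roll-out of the context-conditioned SLDS.

    Returns sampled lateral positions of shape (n_samples, horizon),
    covering a 1 s horizon at 10 Hz."""
    rng = np.random.default_rng(rng)
    positions = np.empty((n_samples, horizon))
    for s in range(n_samples):
        x, mode = x0.copy(), mode0
        for t in range(horizon):
            mode = rng.choice(2, p=TRANS[context][mode])   # sample mode switch
            x[1] = MODE_VEL[mode]                          # mode sets velocity
            x = A @ x + rng.normal(0.0, [0.02, 0.05])      # process noise
            positions[s, t] = x[0]
    return positions

# A walking pedestrian in a critical situation: predicted travel shrinks.
paths = predict_paths(np.array([0.0, 1.4]), mode0=0, context=1, rng=0)
print(paths.mean(axis=0))
```

The predictive distribution is represented by samples, so uncertainty in both the mode sequence and the continuous state propagates to the predicted path, mirroring the probabilistic prediction described in the abstract.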
Big data has had a great share in the success of deep learning in computer vision. Recent works suggest that there is significant further potential to increase object detection performance by utilizing even bigger datasets. In this paper, we introduce the EuroCity Persons dataset, which provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238200 person instances manually labeled in over 47300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211200). We optimize four state-of-the-art deep learning approaches (Faster R-CNN, R-FCN, SSD and YOLOv3) to serve as baselines for the new object detection benchmark. In experiments with previous datasets we analyze the generalization capabilities of these detectors when trained with the new dataset. We furthermore study the effect of the training set size, the dataset diversity (day- vs. night-time, geographical region), the dataset detail (i.e. availability of object orientation information) and the annotation quality on the detector performance. Finally, we analyze error sources and discuss the road ahead. Index Terms: object detection, benchmarking. M. Braun, S. Krebs and F. Flohr are with the Environment Perception Group, Daimler AG. M. Braun, S. Krebs, and D. M. Gavrila are with the Intelligent Vehicles Group at TU Delft.
This paper presents an iterative, EM-like framework for accurate pedestrian segmentation, combining generative shape models and multiple data cues. In the E-step, shape priors are introduced in the unary terms of a Conditional Random Field (CRF) formulation, joining other data terms derived from color, texture and disparity cues. In the M-step, the resulting segmentation is used to adapt an Active Shape Model (ASM), after which the EM process alternates. Experiments on the public Penn-Fudan pedestrian dataset suggest that our method outperforms the state-of-the-art. We further provide results on a new Daimler pedestrian dataset, captured from on-board a vehicle, which includes disparity data. This dataset is made public to facilitate benchmarking.
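The E-step/M-step alternation described above can be illustrated, in miniature, with a plain 1-D two-component Gaussian mixture — not the paper's CRF + Active Shape Model, and with invented toy data — where the E-step produces a soft assignment (analogous to labeling pixels given the current shape prior) and the M-step refits the model to that assignment (analogous to adapting the ASM to the segmentation):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 0.8, 200)])

mu = np.array([-1.0, 1.0])    # initial component means
sigma = np.array([1.0, 1.0])  # initial component std. deviations
pi = np.array([0.5, 0.5])     # mixture weights

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: soft assignment of each sample to a component.
    resp = pi * gauss(data[:, None], mu, sigma)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: refit the model parameters to the soft assignment.
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(data)

print(np.round(np.sort(mu), 1))  # means converge close to the true [-2, 3]
```

The paper's framework replaces the trivial E-step with CRF inference over color, texture and disparity cues, and the trivial M-step with an ASM fit, but the alternating structure is the same.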
We present a probabilistic framework for the joint estimation of pedestrian head and body orientation from a mobile stereo vision platform. For both head and body parts, we convert the responses of a set of orientation-specific detectors into a (continuous) probability density function. The parts are localized by means of a pictorial structure approach, which balances part-based detector responses with spatial constraints. Head and body orientations are estimated jointly to account for anatomical constraints. The joint single-frame orientation estimates are integrated over time by particle filtering. The experiments involved data from a vehicle-mounted stereo vision camera in a realistic traffic setting; 65 pedestrian tracks were supplied by a state-of-the-art pedestrian tracker. We show that the proposed joint probabilistic orientation estimation framework reduces the mean absolute head and body orientation error up to 15° compared with simpler methods. This results in a mean absolute head/body orientation error of about 21°/19°, which remains fairly constant up to a distance of 25 m. Our system currently runs in near real time (8-9 Hz).
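The temporal-integration step — a particle filter fusing noisy per-frame orientation estimates on the circle — can be sketched as follows. This is a hedged illustration with an invented measurement model and invented noise levels, not the paper's filter (which fuses head and body jointly):

```python
import numpy as np

def wrap(a):
    """Wrap angles to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def circular_mean(angles):
    return np.arctan2(np.sin(angles).sum(), np.cos(angles).sum())

def orientation_filter(measurements, n=1000, proc_std=0.1, meas_std=0.4, seed=0):
    """Bootstrap particle filter for a single orientation (rad)."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-np.pi, np.pi, n)   # uninformed prior
    estimates = []
    for z in measurements:
        particles = wrap(particles + rng.normal(0, proc_std, n))  # predict
        w = np.exp(-0.5 * (wrap(particles - z) / meas_std) ** 2)  # weight
        w /= w.sum()
        particles = particles[rng.choice(n, n, p=w)]              # resample
        estimates.append(circular_mean(particles))
    return np.array(estimates)

# Noisy single-frame estimates around a constant true orientation of 0.5 rad.
rng = np.random.default_rng(1)
z = wrap(0.5 + rng.normal(0, 0.4, 30))
est = orientation_filter(z)
print(float(est[-1]))  # filtered estimate, substantially de-noised
```

Working on wrapped angles (rather than raw scalars) is the essential detail here; averaging angles near ±180° naively would bias the estimate.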