This paper presents a technique for estimating the threedimensional velocity vector field that describes the motion of each visible scene point (scene flow). The technique presented uses two consecutive image pairs from a stereo sequence. The main contribution is to decouple the position and velocity estimation steps, and to estimate dense velocities using a variational approach. We enforce the scene flow to yield consistent displacement vectors in the left and right images. The decoupling strategy has two main advantages: Firstly, we are independent in choosing a disparity estimation technique, which can yield either sparse or dense correspondences, and secondly, we can achieve frame rates of 5 fps on standard consumer hardware. The approach provides dense velocity estimates with accurate results at distances up to 50 meters.
Building upon recent developments in optical flow and stereo matching estimation, we propose a variational framework for the estimation of stereoscopic scene flow, i.e., the motion of points in the three-dimensional world from stereo image sequences. The proposed algorithm takes into account image pairs from two consecutive times and computes both depth and a 3D motion vector associated with each point in the image. In contrast to previous works, we partially decouple the depth estimation from the motion estimation, which has many practical advantages. The variational formulation is quite flexible and can handle both sparse or dense disparity maps. The proposed method is very efficient; with the depth map being computed on an FPGA, and the scene flow computed on the GPU, the proposed algorithm runs at frame rates of 20 frames per second on QVGA images (320 × 240 pixels). Furthermore, we present solutions to two important problems in scene flow estimation: violations of intensity consistency between input images, and the uncertainty measures for the scene flow result.A. Wedel ( ) · C. Rabe · U. Franke GR/PAA, Daimler Research,
Performance evaluation of stereo or motion analysis techniques is commonly done either on synthetic data where the ground truth can be calculated from ray-tracing principals, or on engineered data where ground truth is easy to estimate. Furthermore, these scenes are usually only shown in a very short sequence of images. This paper shows why synthetic scenes may not be the only testing criteria by giving evidence of conflicting results of disparity and optical flow estimation for real-world and synthetic testing. The data dealt with in this paper are images taken from a moving vehicle. Each real-world sequence contains 250 image pairs or more. Synthetic driver assistance scenes (with ground truth) are 100 or more image pairs. Particular emphasis is paid to the estimation and evaluation of scene flow on the synthetic stereo sequences. All image data used in this paper is made publicly available at http: // www. mi. auckland. ac. nz/ EISATS .
This paper discusses options for testing correspondence algorithms in stereo or motion analysis that are designed or considered for vision-based driver assistance. It introduces a globally available database, with a main focus on testing on video sequences of real-world data. We suggest the classification of recorded video data into situations defined by a cooccurrence of some events in recorded traffic scenes. About 100-400 stereo frames (or 4-16 s of recording) are considered a basic sequence, which will be identified with one particular situation. Future testing is expected to be on data that report on hours of driving, and multiple hours of long video data may be segmented into basic sequences and classified into situations. This paper prepares for this expected development. This paper uses three different evaluation approaches (prediction error, synthesized sequences, and labeled sequences) for demonstrating ideas, difficulties, and possible ways in this future field of extensive performance tests in vision-based driver assistance, particularly for cases where the ground truth is not available. This paper shows that the complexity of real-world data does not support the identification of general rankings of correspondence techniques on sets of basic sequences that show different situations. It is suggested that correspondence techniques should adaptively be chosen in real time using some type of statistical situation classifiers.
This paper discusses the detection of moving objects (being a crucial part of driver assistance systems) using monocular or stereoscopic computer vision. In both cases, object detection is based on motion analysis of individually tracked image points (optical flow), providing a motion metric which corresponds to the likelihood that the tracked point is moving. Based on this metric, points are segmented into objects by employing a globally optimal graph-cut algorithm. Both approaches are comparatively evaluated using real-world vehicle image sequences.
scite is a Brooklyn-based startup that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.