We propose Shift R-CNN, a hybrid model for monocular 3D object detection that combines deep learning with the power of geometry. We adapt a Faster R-CNN network to regress initial 2D and 3D object properties and combine it with a least squares solution for the inverse 2D-to-3D geometric mapping problem, using the camera projection matrix. The closed-form solution of the mathematical system, along with the initial output of the adapted Faster R-CNN, is then passed through a final ShiftNet network that refines the result using our newly proposed Volume Displacement Loss. Our novel, geometrically constrained deep learning approach to monocular 3D object detection obtains top results on the KITTI 3D Object Detection Benchmark [5], being the best among all monocular methods that do not use any pre-trained network for depth estimation.
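The least squares step can be illustrated with a minimal sketch. It assumes the object's dimensions and orientation are already known, so that each 3D box corner is a fixed offset from the unknown translation T, and that each image constraint has already been matched to a corner. The function names, the constraint format and the example camera matrix are hypothetical and are not taken from the paper.

```python
def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x


def project(P, point):
    """Project a 3D camera-frame point to pixel coordinates with a 3x4 matrix P."""
    h = list(point) + [1.0]
    p = [sum(P[r][c] * h[c] for c in range(4)) for r in range(3)]
    return p[0] / p[2], p[1] / p[2]


def translation_least_squares(P, constraints):
    """Estimate the translation T from linear projection constraints.

    constraints: list of (corner_offset, axis, pixel_value), where axis is
    0 for a horizontal (u) and 1 for a vertical (v) pixel coordinate.
    """
    rows, rhs = [], []
    for offset, axis, value in constraints:
        # (P[axis] - value * P[2]) . [offset + T, 1] = 0 is linear in T.
        w = [P[axis][c] - value * P[2][c] for c in range(4)]
        rows.append(w[:3])
        rhs.append(-(sum(w[c] * offset[c] for c in range(3)) + w[3]))
    # Normal equations of the overdetermined system: (A^T A) T = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve_3x3(ata, atb)


# Synthetic check with an assumed camera matrix and a known translation.
P = [[700.0, 0.0, 600.0, 0.0], [0.0, 700.0, 180.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
true_t = [1.0, 1.5, 10.0]
offsets = [(-0.8, -1.5, -2.0), (0.8, 0.0, 2.0)]  # hypothetical corner offsets
constraints = []
for off in offsets:
    u, v = project(P, [o + t for o, t in zip(off, true_t)])
    constraints += [(off, 0, u), (off, 1, v)]
estimate = translation_least_squares(P, constraints)
```

With four consistent constraints and three unknowns, the estimate should match the translation used to generate the pixels up to floating-point error. With noisy network outputs the same system would instead give the best fit in the least squares sense.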
Modern driver assistance systems rely on a wide range of sensors (RADAR, LIDAR, ultrasound and cameras) for scene understanding and prediction. These sensors are typically used for detecting traffic participants and scene elements required for navigation. In this paper we argue that relying on camera-based systems, specifically an Around View Monitoring (AVM) system, has great potential to achieve these goals in both parking and driving modes at decreased cost. The contributions of this paper are as follows: we present a new end-to-end solution for delimiting the safe drivable area in each frame by identifying the closest obstacle in each direction from the driving vehicle; we use this approach to calculate the distance to the nearest obstacles; and we incorporate it into a unified end-to-end architecture capable of joint object detection, curb detection and safe drivable area detection. Furthermore, we describe the family of networks for both a high-accuracy solution and a low-complexity solution. We also introduce a further augmentation of the base architecture with 3D object detection.
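The idea of delimiting the drivable area by the closest obstacle in each direction can be illustrated with a small geometric sketch. It assumes obstacle positions are already available as (x, y) points in the vehicle frame. The paper predicts this with a network, so the function name, the binning scheme and the sample points below are hypothetical.

```python
import math


def nearest_obstacle_per_direction(obstacles, num_bins=8, max_range=float("inf")):
    """Return the distance to the closest obstacle in each angular bin.

    obstacles: iterable of (x, y) points in the vehicle frame (assumed input).
    Bins cover [0, 2*pi) counter-clockwise, starting at the +x axis.
    """
    bin_width = 2 * math.pi / num_bins
    nearest = [max_range] * num_bins
    for x, y in obstacles:
        angle = math.atan2(y, x) % (2 * math.pi)
        # The extra modulo guards against an angle that rounds to 2*pi.
        idx = int(angle / bin_width) % num_bins
        nearest[idx] = min(nearest[idx], math.hypot(x, y))
    return nearest


# Four hypothetical obstacle points, binned into four quadrants.
points = [(2.0, 0.1), (5.0, 0.2), (-3.0, 0.1), (0.1, -4.0)]
distances = nearest_obstacle_per_direction(points, num_bins=4)
```

The polygon through the nearest point in every bin then bounds the free area around the vehicle. Bins without any obstacle keep `max_range`.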
When a picture contains non-translational motion, a picture-level parametric motion model can be more efficient than the block-based translational motion model, because the former has a small number of parameters that can replace the many motion vectors of individual blocks. In addition, the former can represent the deformation of the image better than the latter. Based on this idea, we detected multiple homography transformations between a reference picture and the current picture. Then, we generated warped reference pictures corresponding to the homographies and inserted one of the warped reference pictures into the reference picture lists. We measured the performance of the proposed algorithm using the test conditions recently proposed by MPEG and ITU. Experimental results showed a 3.1% overall bitrate saving under the low delay, high efficiency condition and a 3.5% overall bitrate saving under the random access, high efficiency condition, compared with TMuC 0.7.3.
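Generating a warped reference picture from a homography can be sketched as follows. This is a minimal illustration that assumes a single-channel picture stored as a list of rows and uses nearest-neighbour sampling. A real codec would use sub-pixel interpolation, and the function names here are hypothetical.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (homogeneous divide)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


def warp_picture(reference, H_inv, fill=0):
    """Build a warped picture by inverse mapping.

    For every output pixel, H_inv gives the position to sample in the
    reference picture; samples outside the picture are set to `fill`.
    """
    height, width = len(reference), len(reference[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = apply_homography(H_inv, x, y)
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = reference[iy][ix]
    return out


# A pure one-pixel shift to the right, written as an inverse homography.
reference = [[1, 2, 3], [4, 5, 6]]
H_inv = [[1, 0, -1], [0, 1, 0], [0, 0, 1]]
warped = warp_picture(reference, H_inv)
```

A pure translation is the simplest special case. The eight free parameters of a general homography also cover rotation, zoom and perspective change, which is why one such model can stand in for many block motion vectors.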