Abstract: We describe a system to detect objects in three-dimensional space using video and inertial sensors (accelerometer and gyrometer), which are ubiquitous in modern mobile platforms from phones to drones. Inertial measurements make it possible to impose class-specific scale priors on objects and provide a global orientation reference. A minimal sufficient representation, the posterior of the semantic (identity) and syntactic (pose) attributes of objects in space, can be decomposed into a geometric term, which can be maintained by a loca…
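One schematic way to read the decomposition sketched in this abstract (the notation below is ours, not the paper's) is as a factorization of the joint posterior over object identity c and pose g, where the inertial data informs the geometric factor through gravity alignment and metric scale:

```latex
p(\underbrace{c}_{\text{identity}},\,\underbrace{g}_{\text{pose}} \mid \text{images},\,\text{inertials})
\;\propto\;
\underbrace{p(\text{images} \mid c, g)}_{\text{appearance (semantics)}}\;
\underbrace{p(g \mid \text{inertials})}_{\text{geometry: scale, gravity}}\;
\underbrace{p(c)}_{\text{class prior}}
```

Under this reading, the class-specific scale priors mentioned in the abstract enter through the geometric factor, since the inertials fix metric scale.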
“…And in [10], the authors address the localization task from only object observations in a prior semantic map by computing a matrix permanent. The second is SLAM-aided object detection [11,12] and reconstruction [13,14]: [11] develops a 2D object recognition system that is robust to viewpoint changes with the assistance of camera localization, while [12] performs confidence-growing 3D object detection using visual-inertial measurements. [13] and [14] reconstruct the dense surfaces of 3D objects by fusing point clouds from monocular and RGB-D SLAM, respectively.…”
We propose a stereo vision-based approach for tracking camera ego-motion and 3D semantic objects in dynamic autonomous driving scenarios. Instead of directly regressing the 3D bounding box with end-to-end approaches, we use easy-to-label 2D detections and discrete viewpoint classification together with a lightweight semantic inference method to obtain rough 3D object measurements. Building on object-aware camera pose tracking, which is robust in dynamic environments, and on our novel dynamic object bundle adjustment (BA) approach that fuses temporal sparse feature correspondences with the semantic 3D measurement model, we estimate 3D object pose, velocity, and an anchored dynamic point cloud with instance-level accuracy and temporal consistency. The performance of the proposed method is demonstrated in diverse scenarios, and both the ego-motion estimation and the object localization are compared with state-of-the-art solutions.
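The idea of fusing sparse feature correspondences with a rough semantic 3D measurement can be sketched as a sum of two residual types in a least-squares objective. The following is an illustrative sketch under our own assumptions (function names, the box-size prior, and the weighting are ours, not the paper's):

```python
# Illustrative sketch: two residual types for object-level bundle adjustment.
# One term reprojects object-anchored 3D points; the other pulls the estimated
# box dimensions toward a class-specific prior. All names are hypothetical.
import numpy as np

def reprojection_residual(K, R_co, t_co, p_obj, uv_obs):
    """Pixel error of an object-anchored point p_obj observed at uv_obs.
    R_co, t_co map the object frame into the camera frame."""
    p_cam = R_co @ p_obj + t_co          # object frame -> camera frame
    uv = (K @ p_cam)[:2] / p_cam[2]      # pinhole projection
    return uv - uv_obs

def size_prior_residual(dims, prior_dims, sigma=0.2):
    """Penalize deviation of estimated box dimensions from a class prior
    (e.g. typical car height/length/width), weighted by sigma in meters."""
    return (dims - prior_dims) / sigma

# Toy check: a point 5 m ahead of the camera projects to the principal point,
# so its residual against an observation there is zero.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
r = reprojection_residual(K, np.eye(3), np.array([0.0, 0.0, 5.0]),
                          np.array([0.0, 0.0, 0.0]), np.array([320.0, 240.0]))
s = size_prior_residual(np.array([1.7, 4.1, 1.5]), np.array([1.6, 3.9, 1.5]))
```

In a full system both residual types (over all frames and objects) would be stacked and minimized jointly with a nonlinear least-squares solver.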
“…• Semantic localization and mapping: Although geometric features such as points, lines, and planes [151,165] are primarily used for localization in current VINS, these handcrafted features may not work best for navigation, and it is important to learn the best features for VINS by leveraging recent advances in deep learning [166]. Moreover, a few recent research efforts have attempted to endow VINS with semantic understanding of environments [167,168,169,170], a direction that is only sparsely explored but holds great potential.…”
As inertial and visual sensors become ubiquitous, visual-inertial navigation systems (VINS) have prevailed in a wide range of applications, from mobile augmented reality to aerial navigation to autonomous driving, in part because of the complementary sensing capabilities and the decreasing cost and size of the sensors. In this paper, we thoroughly survey the research efforts in this field and strive to provide a concise but complete review of the related work, which is missing from the literature despite being in great demand by researchers and engineers, in the hope of accelerating VINS research.

In the multiplicative error parametrization, δq describes the small rotation that brings the true and estimated attitudes into coincidence. The advantage of this parametrization is that it permits a minimal representation of the attitude uncertainty: the 3 × 3 covariance matrix E[δθ_I δθ_I^⊤].
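The multiplicative error parametrization referenced here is conventionally written as follows (this is the standard VINS convention; the exact symbols are our reconstruction):

```latex
q = \delta q \otimes \hat{q}, \qquad
\delta q \simeq \begin{bmatrix} \tfrac{1}{2}\,\delta\boldsymbol{\theta}_I \\ 1 \end{bmatrix}, \qquad
\mathbf{P}_{\theta} = \mathbb{E}\!\left[\delta\boldsymbol{\theta}_I\,\delta\boldsymbol{\theta}_I^{\top}\right] \in \mathbb{R}^{3\times 3}
```

Because the small-angle error δθ_I lives in R³ rather than on the 4-dimensional unit-quaternion manifold, the attitude covariance stays full-rank and minimal.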
“…Recently, new techniques have emerged to estimate the 3D spatial layout of objects as well as their occupancy [27,11,2]. These techniques rely on the quality of deep-learning object detectors [27,11] or on additional range data [2]. Similarly, volumetric approaches have been used to construct the layout of objects in rooms, or to reconstruct objects and regress their positions [33].…”
Recent approaches to visual scene understanding attempt to build a scene graph, a computational representation of objects and their pairwise relationships. Such a rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. In contrast, an image sequence conveys additional information through the multi-view geometric relations arising from camera motion. Indeed, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometric location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometric reasoning. This compelling representation is obtained with a new model in which geometric and visual features are merged using an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.
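The merging of geometric and visual features in a recurrent model can be sketched very simply: concatenate a per-object appearance embedding with its estimated 3D location and run a recurrent update across frames. This is a minimal toy sketch under our own assumptions (the dimensions, vanilla-RNN cell, and random weights are ours; the paper's actual architecture may differ):

```python
# Toy sketch: fuse a visual embedding with a 3D location per frame and
# accumulate evidence across a video with a vanilla RNN update.
# Dimensions and weights are illustrative stand-ins for learned parameters.
import numpy as np

rng = np.random.default_rng(0)

D_VIS, D_GEO, D_HID = 8, 3, 16          # visual, geometric, hidden sizes

W_in = rng.standard_normal((D_HID, D_VIS + D_GEO)) * 0.1
W_h = rng.standard_normal((D_HID, D_HID)) * 0.1

def fuse_step(h, visual_feat, location_3d):
    """One recurrent update on the concatenated geometric + visual feature."""
    x = np.concatenate([visual_feat, location_3d])
    return np.tanh(W_in @ x + W_h @ h)

h = np.zeros(D_HID)
for _ in range(5):                       # five frames of a toy sequence
    h = fuse_step(h, rng.standard_normal(D_VIS), rng.standard_normal(D_GEO))
```

The final hidden state h would then feed a relationship classifier for each object pair; in practice the recurrent cell would be a learned GRU or LSTM rather than this random-weight stand-in.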
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.