2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.380

Visual-Inertial-Semantic Scene Representation for 3D Object Detection

Abstract: We describe a system to detect objects in three-dimensional space using video and inertial sensors (accelerometer and gyrometer), ubiquitous in modern mobile platforms from phones to drones. Inertials afford the ability to impose class-specific scale priors for objects, and provide a global orientation reference. A minimal sufficient representation, the posterior of semantic (identity) and syntactic (pose) attributes of objects in space, can be decomposed into a geometric term, which can be maintained by a loca…
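A minimal sketch of the kind of factorization the abstract alludes to, under assumptions not stated in the truncated text: write c for object identity, g for object pose, I_{1:t} for the images, u_{1:t} for the inertial measurements, and x_t for the camera state maintained by the visual-inertial filter (all symbols here are illustrative, not taken from the paper). A generic decomposition of the posterior then marginalizes a geometric term over the camera state:

p(c, g \mid I_{1:t}, u_{1:t}) \;\propto\; \int p(I_t \mid c, g, x_t)\, p(c, g \mid x_t)\, p(x_t \mid I_{1:t-1}, u_{1:t})\, dx_t

Here the last factor is the geometric (localization) posterior and the first is a semantic likelihood from a detector; the exact conditional structure used in the paper may differ.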

Cited by 55 publications (35 citation statements)
References 67 publications
“…And in [10], the authors address the localization task from only object observations in a prior semantic map by computing a matrix permanent. The second is SLAM-aided object detection [11,12] and reconstruction [13,14]: [11] develops a 2D object recognition system that is robust to viewpoint changes with the assistance of camera localization, while [12] performs confidence-growing 3D object detection using visual-inertial measurements. [13,14] reconstruct the dense surfaces of 3D objects by fusing the point clouds from monocular and RGB-D SLAM, respectively.…”
Section: Related Work
confidence: 99%
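The citation statement above mentions localization by computing a matrix permanent [10]. As a self-contained illustration (not code from either paper), here is a minimal sketch of Ryser's inclusion-exclusion formula, the standard exponential-time way to evaluate the permanent of a square matrix exactly:

from itertools import combinations

def permanent(a):
    """Matrix permanent of a square matrix `a` via Ryser's formula, O(2^n * n^2)."""
    n = len(a)
    total = 0.0
    # Inclusion-exclusion over all non-empty subsets of columns.
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            prod = 1.0
            for row in a:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** r * prod
    return (-1) ** n * total

# Example: perm([[1, 2], [3, 4]]) = 1*4 + 2*3 = 10
print(permanent([[1, 2], [3, 4]]))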
“…• Semantic localization and mapping: Although geometric features such as points, lines and planes [151,165] are primarily used in current VINS for localization, these handcrafted features may not work best for navigation, and it is important to be able to learn the best features for VINS by leveraging recent advances in deep learning [166]. Moreover, a few recent research efforts have attempted to endow VINS with semantic understanding of environments [167,168,169,170], which is only sparsely explored but holds great potential.…”
Section: Results
confidence: 99%
“…Recently, new techniques have emerged to estimate the 3D spatial layout of objects as well as their occupancy [27,11,2]. These techniques rely on the quality of deep-learning object detectors [27,11] or the use of additional range data [2]. Similarly, volumetric approaches have been used to construct the layout of objects in rooms, or to construct objects and regress their positioning [33].…”
Section: Related Work
confidence: 99%