Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image

Nie, Yinyu; Han, Xiaoguang; Guo, Shihui; Zheng, Yujian; Chang, Jian; Zhang, Jian Jun

doi:10.1109/cvpr42600.2020.00013

Cited by 163 publications

(215 citation statements)

References 40 publications

Supporting

Mentioning

215

Contrasting


“…Rogers and Christensen (2012) and Lin et al (2013) leveraged objects to perform a joint object-and-place classification. Nie et al (2020), Huang et al (2018a), and Zhao and Zhu (2013b) jointly solved the problem of scene understanding and reconstruction. Pangercic et al (2012) reasoned on the objects’ functionality.…”

Section: Related Work (mentioning)

confidence: 99%

Kimera: From SLAM to spatial perception with 3D dynamic scene graphs

Rosinol, Violette, Abate, et al. 2021

The International Journal of Robotics Research


Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots’ internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, and voxels), or as a collection of objects. This article attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D dynamic scene graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatiotemporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual–inertial data. Kimera includes accurate algorithms for visual–inertial simultaneous localization and mapping (SLAM), metric–semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera in real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves competitive performance in visual–inertial SLAM, estimates an accurate 3D metric–semantic mesh model in real-time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution is to showcase how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera have been released open source.

“…Discriminative methods can exploit large training datasets to learn to classify scene components from input data such as RGB and RGB-D images [4,18,35,51,56]. By introducing clever Deep Learning architectures applied to point clouds or voxel-based representations, these methods can achieve very good results.…”

Section: Complete Scene Reconstruction (mentioning)

confidence: 99%

“…3D scene understanding is a fundamental problem in Computer Vision [41,53]. In the case of indoor scenes, one usually aims at recognizing the objects and their properties such as their 3D pose and geometry [2,3,15], or the room layouts [57,31,62,59,30,36,50,60,62,54,55], or both [4,18,35,45,51,56]. With the development of deep learning approaches, the field has made a remarkable progress.…”

Section: Introduction (mentioning)

confidence: 99%

Monte Carlo Scene Search for 3D Scene Understanding

Hampali, Stekovic, Sarkar, et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


“…Since 2015, many deep-learning-based 3D reconstruction methods are presented, among which the point-based technique is simple but efficient in terms of memory requirements [36]. Similar to volumetric [37,38] and surface-based representations [39,40], point-based techniques follow the encoder-decoder model. In general, grid representations use up-convolutional networks to decode the latent variable [41,42].…”

Section: Deep-learning-based Reconstruction (mentioning)

confidence: 99%

An Improved Algorithm Robust to Illumination Variations for Reconstructing Point Cloud Models from Images

et al. 2021


Reconstructing 3D point cloud models from image sequences is often degraded by illumination variations and textureless regions in the images, resulting in missing parts or an uneven distribution of recovered points. To improve reconstruction completeness, this work proposes an enhanced similarity metric that is robust to illumination variations among images during dense diffusion, extending the reach of the seed-and-expand reconstruction scheme. The metric integrates the zero-mean normalized cross-correlation coefficient of illumination with that of texture information, which respectively weaken the influence of illumination variations and of textureless regions. Combined with disparity gradient and confidence constraints, the candidate image features are diffused to their neighborhoods to recover dense 3D points. We illustrate the two-phase results on multiple datasets and evaluate the robustness of the proposed algorithm to illumination variations. Experiments show that the proposed method recovers 10.0% more points, on average, than the compared methods in scenarios with varying illumination, and achieves better completeness with comparable accuracy.
