2019 IEEE International Conference on Image Processing (ICIP) 2019
DOI: 10.1109/icip.2019.8803174
3D Semantic Scene Completion from a Single Depth Image Using Adversarial Training

Abstract: We address the task of 3D semantic scene completion, i.e., given a single depth image, we predict the semantic labels and occupancy of voxels in a 3D grid representing the scene. In light of the recently introduced generative adversarial networks (GAN), our goal is to explore the potential of this model and the efficiency of various important design choices. Our results show that using conditional GANs outperforms the vanilla GAN setup. We evaluate these architecture designs on several datasets. Based on our e…

Cited by 16 publications (20 citation statements); references 18 publications.
“…Voxel grid encodes scene geometry as a 3D grid whose cells describe the semantic occupancy of the space. As opposed to point clouds, grids conveniently define neighborhoods through adjacent cells, and thus enable easy application of 3D CNNs, which facilitates extending deep learning architectures designed for 2D data to 3D [14,17,19,22,24,28,29,39,49,68,108,118,155,158]. However, the representation suffers from constraining limitations and efficiency drawbacks, since it represents both occupied and free regions of the scene, leading to high memory and computation needs.…”
Section: Scene Representations
confidence: 99%
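The quote's two points (adjacent cells give a trivial neighborhood structure, but a dense grid also stores free space) can be made concrete with a small sketch. All shapes and class IDs here are hypothetical, chosen only for illustration; the 60×36×60 resolution is one commonly used in SSC work, not a claim about this paper.

```python
import numpy as np

# Hypothetical dense semantic voxel grid: 0 = empty, 1..N = semantic class.
grid = np.zeros((60, 36, 60), dtype=np.uint8)
grid[10:20, 0:5, 10:20] = 3          # e.g. a block of "furniture" voxels

def six_neighbors(x, y, z):
    """Neighborhoods are just index offsets, which is what makes 3D CNNs easy."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in offsets]

# The efficiency drawback: the dense grid stores free space too, so most
# of the memory holds empty cells.
occupied_fraction = np.count_nonzero(grid) / grid.size
```

In this toy scene the occupied fraction is well under 1%, yet the grid allocates every cell, which is the memory/computation cost the survey refers to.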
“…For scene completion, the value of the gradient field is estimated at specific locations, typically at the voxel centers for voxel grids [22,24], or at the point locations for point clouds [105]. Implicit surfaces may also be used as input [14,17,22,24,28,29,68,118,141,155,158] to reduce the sparsity of the input data, at the expense of costly computation. For numerical reasons, most works in fact encode a flipped version (cf.…”
Section: Scene Representations
confidence: 99%
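The "flipped version" mentioned here is the flipped TSDF popularized in SSC work: instead of a signal that grows away from the surface, the magnitude is largest near the surface. Exact definitions and truncation distances vary per paper; the sketch below uses one common SSCNet-style form with an assumed truncation `tau`, so treat it as an illustration, not this paper's exact encoding.

```python
import numpy as np

tau = 0.24  # truncation distance in meters (assumed value)

def tsdf(d, tau=tau):
    """Truncated signed distance: clip d/tau to [-1, 1]."""
    return np.clip(d / tau, -1.0, 1.0)

def flipped_tsdf(d, tau=tau):
    """Flipped TSDF: sign kept, magnitude largest *near* the surface.
    f = sign(t) * (1 - |t|), with t the truncated value. (The value exactly
    on the surface depends on the sign convention a given paper adopts.)"""
    t = tsdf(d, tau)
    return np.sign(t) * (1.0 - np.abs(t))

d = np.array([-0.5, -0.12, 0.0, 0.12, 0.5])   # signed distances to surface
f = flipped_tsdf(d)
```

Far from the surface (|d| ≥ tau) the flipped value goes to 0, while cells near the surface get large magnitudes, which gives the network a stronger gradient signal where the geometry actually is.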
“…Existing works all use geometrical inputs like depth [12,25,39-42,45], occupancy grids [13,25,55,69] or point clouds [53,81]. The Truncated Signed Distance Function (TSDF) was also proven informative [6,9,10,12,20,21,41,59,64,77,79]. Among other originalities, some SSC works use adversarial training to guide realism [10,64], exploit multi-task learning [6,38], or use lightweight networks [40,55].…”
Section: Related Work
confidence: 99%
“…The Truncated Signed Distance Function (TSDF) was also proven informative [6,9,10,12,20,21,41,59,64,77,79]. Among other originalities, some SSC works use adversarial training to guide realism [10,64], exploit multi-task learning [6,38], or use lightweight networks [40,55]. Of interest to us, while others have used RGB as input [6,8,9,14,20,25,29,39,40,42,45,81], it is always alongside other geometrical input (e.g.…”
Section: Related Work
confidence: 99%
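The abstract's finding that conditional GANs beat the vanilla setup comes down to what the discriminator sees. A minimal shape-level sketch (all tensor shapes hypothetical, numpy stand-ins for network features): a vanilla discriminator scores the completed volume alone, while a conditional one scores it paired with the input observation, so it can penalize completions that are realistic but inconsistent with the observed depth.

```python
import numpy as np

depth_vol = np.random.rand(1, 60, 36, 60)      # encoded input observation
pred_vol  = np.random.rand(12, 60, 36, 60)     # per-class scores of the completion

# Vanilla GAN: the discriminator only ever sees the generated volume.
vanilla_input = pred_vol                                        # (12, ...)

# Conditional GAN: concatenate the condition along the channel axis,
# so real/fake pairs are judged jointly with the input they came from.
conditional_input = np.concatenate([depth_vol, pred_vol], axis=0)  # (13, ...)
```

The only structural change is the channel-wise concatenation; the discriminator architecture itself can stay the same apart from its input channel count.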
“…While this boosts performance, it also increases the network complexity and subsequently the inference time. Generative Adversarial Networks (GANs) have also been proposed to enforce realistic outputs [39,7] but are harder to train. To lower memory consumption with the preferred voxelized representations, Spatial Group Convolutions (SGC) [40] divide the input into groups for efficient processing, at the cost of small performance drops.…”
Section: Related Work
confidence: 99%
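The grouping idea behind SGC can be sketched with plain strided indexing: partition the dense grid into interleaved sparse groups, each cheap to process on its own. The 2×2×2-phase partition below is one simple grouping choice for illustration; the actual SGC partition pattern is a design parameter of that method, not reproduced here.

```python
import numpy as np

grid = np.arange(4 * 4 * 4).reshape(4, 4, 4)   # tiny stand-in voxel grid

# 8 interleaved groups, one per 2x2x2 phase of the voxel coordinates.
# Each group is a sparse, regularly strided view of the full grid.
groups = [grid[i::2, j::2, k::2]
          for i in (0, 1) for j in (0, 1) for k in (0, 1)]

total = sum(g.size for g in groups)   # the groups tile the grid exactly
```

Each group holds 1/8 of the voxels, so per-group convolution touches far fewer cells; the reported small performance drop comes from cross-group context being processed separately.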