Semantic Scene Completion from a Single 360-Degree Image and Depth Map

Dourado, Aloisio; Kim, Han Sung; Campos, Teófilo de; Hilton, Adrian

doi:10.5220/0008877700360046

Cited by 9 publications

(16 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Note that datasets providing 3D meshes or point clouds easily be voxelized as detailed in [118]. Additionally, Stanford 2D-3D-S [3] provides 360 • RGB-D images, of interest for completing entire rooms [29]. Due to real datasets small sizes, low scene variability, and annotation ambiguities, synthetic SUNCG [118] (aka SUNCG-D) was proposed, being a large scale dataset with pairs of depth images and complete synthetic scene meshes.…”

Section: Datasetsmentioning

confidence: 99%

“…Voxel grid encodes scene geometry as 3D grid, which cells describe semantic occupancy of the space. Opposed to point clouds, grids conveniently define neighborhood with adjacent cells, and thus enable easy application of 3D CNNs, which facilitates to extend deep learning architectures designed for 2D data into 3D [14,17,19,22,24,28,29,39,49,68,108,118,155,158]. However, the representation suffers from constraining limitations and efficiency drawbacks since it represents both occupied and free regions of the scene, leading to high memory and computation needs.…”

Section: Scene Representationsmentioning

confidence: 99%

“…For scene completion, the value of the gradient field is estimated at specific locations, typically at the voxel centers, for voxel grids [22,24], or at the point locations for point clouds [105]. Implicit surface may also be used as input [14,17,22,24,28,29,68,118,141,155,158] to reduce the sparsity of the input data, at the expense of greedy computation. For numerical reason, most works encode in fact a flipped version (cf.…”

Section: Scene Representationsmentioning

confidence: 99%

See 2 more Smart Citations

3D Semantic Scene Completion: a Survey

Roldão¹,

Charette²,

Verroust-Blondet³

2021

Preprint

View full text Add to dashboard Cite

Dataset Year Type Nature Input →Ground truth 3D Sensor # Classes Tasks* #Sequences #Frames SSC DE SPE SS OC SNE NYUv2 [115] a 2012 Real-world † Indoor RGB-D → Mesh/Voxel RGB-D 894 (11) 1449 795/654 SUN3D [146] 2013 Real-world Indoor RGB-D → Points RGB-D -254 -NYUCAD [34] b 2013 Synthetic Indoor RGB-D → Mesh/Voxel RGB-D 894 (11) 1449 795/654 SceneNet [55] 2015 Synthetic Indoor RGB-D → Mesh RGB-D ‡ 11 57 -SceneNN [57] 2016 Real-world Indoor RGB-D → Mesh RGB-D 22 100 -SUNCG [118] 2017 Synthetic Indoor Depth → Mesh/Voxel RGB-D ‡ 84 (11) 45622 139368/470 Matterport3D [10] 2017 Real-world Indoor RGB-D → Mesh 3D Scanner 40 (11) 707 72/18 ScanNet [21] 2017 Real-world Indoor RGB-D → Mesh RGB-D 20 (11) 1513 1201/312 2D-3D-S [3] 2017 Real-world Indoor RGB-D → Mesh 3D Scanner 13 270 -SUNCG-RGBD [76] c 2018 Synthetic Indoor RGB-D → Mesh/Voxel RGB-D ‡ 84 (11) 45622 13011/499 SemanticKITTI [4] 2019 Real-world Outdoor Points/RGB → Points/Voxel Lidar-64 28 (19) 22 23201/20351 SynthCity [44] 2019 Synthetic Outdoor Points → Points Lidar ‡ 9 9 -CompleteScanNet [145] d 2020 Real-world † Indoor RGB-D → Mesh/Voxel RGB-D 11 1513 45448/11238 SemanticPOSS [91] 2020 Real-world Outdoor Points/RGB → Points Lidar-40 14 2988 - ‡ Simulated sensor. † Synthetically augmented. a Mesh annotations from [47]. b Derivates from NYUv2 [115] by rendering depth images from mesh annotation. c Derivates from subset of SUNCG [118] where missing RGB images were rendered. d Derivates from ScanNet [21] by fitting CAD models to dense mesh.

show abstract

Section: Datasetsmentioning

confidence: 99%

Section: Scene Representationsmentioning

confidence: 99%

Section: Scene Representationsmentioning

confidence: 99%

See 1 more Smart Citation

3D Semantic Scene Completion: a Survey

Roldão¹,

Charette²,

Verroust-Blondet³

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Existing works all use geometrical inputs like depth [12,25,[39][40][41][42]45], occupancy grids [13,25,55,69] or point cloud [53,81]. Truncated Signed Distance Function (TSDF) were also proved informative [6,9,10,12,20,21,41,59,64,77,79]. Among others originalities, some SSC works use adversarial training to guide realism [10,64], exploit multi-task [6,38], or use lightweight networks [40,55].…”

Section: Related Workmentioning

confidence: 99%

MonoScene: Monocular 3D Semantic Scene Completion

Cao,

de Charette

2021

Preprint

View full text Add to dashboard Cite

MonoScene proposes a 3D Semantic Scene Completion (SSC) framework, where the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Different from the SSC literature, relying on 2.5 or 3D input, we solve the complex problem of 2D to 3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D features projection inspiring from optics and introduces a 3D context relation prior to enforce spatio-semantic consistency. Along with architectural contributions, we introduce novel global scene and local frustums losses. Experiments show we outperform the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. Our code and trained models are available at https://github.com/cv-rits/MonoScene

show abstract

“…Semantic segmentation tasks aim to assign a semantic class label to every pixel in the input image. Examples of applications in scene understanding include PixelNet [11], which performs semantic segmentation and edge detection; EdgeNet [12], which combines depth information with semantic scene completion, using RGB-D input data. For synthetic data generation, UnrealCV provides a pipeline that generates images from VEs providing semantic segmentations [13], allowing for easy generation of training data.…”

Section: Related Workmentioning

confidence: 99%