2020 International Conference on 3D Vision (3DV)
DOI: 10.1109/3dv50981.2020.00090
SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion

Cited by 22 publications (4 citation statements)
References 43 publications
“…In image-based techniques for semantic scene completion, many methods in the literature employ convolutional neural networks or transformers to process multi-modal data, such as RGB and depth images [1, 8–20]. For instance, Song et al. [1] proposed an end-to-end 3D convolutional network, introducing a dilation-based context module for efficient large-receptive-field context learning.…”
Section: Image-based Methods
confidence: 99%
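The dilation-based context module mentioned in this excerpt exploits the fact that dilated convolutions enlarge the receptive field without adding parameters: a dilated kernel of size k and dilation d covers d·(k−1)+1 voxels. A minimal sketch of that arithmetic (the layer configuration below is hypothetical, not the actual architecture of Song et al.):

```python
def receptive_field(kernel_sizes, dilations):
    # Effective size of a dilated kernel is d*(k-1)+1, so a stride-1
    # stack grows its receptive field by d*(k-1) per layer.
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf

# Three plain 3x3x3 convs vs. the same stack with dilation rates 1, 2, 4:
print(receptive_field([3, 3, 3], [1, 1, 1]))  # 7
print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15
```

With the same parameter count, the dilated stack more than doubles the context each output voxel sees, which is why such modules are favored for scene-level completion.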
“…However, they produce predictions as sparse as the LiDAR point cloud, offering an incomplete understanding of the whole scene. Semantic scene completion [50] aims for dense inference of 3D geometry and semantics of objects and surfaces within a given extent, typically leveraging rich geometry information at the input extracted from depth [16,35], occupancy grids [58,49], point clouds [48], or a mix of modalities, e.g., RGBD [11,17]. In this line, MonoScene [12] is the first camera-based method to produce dense semantic occupancy predictions from a single image by projecting image features into 3D voxels by optical ray intersection.…”
Section: Related Work
confidence: 99%
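The projection step this excerpt attributes to MonoScene amounts to intersecting each voxel's optical ray with the image plane, i.e., projecting the voxel center through a pinhole camera model and sampling the image feature at the resulting pixel. A minimal sketch of that projection (intrinsics and coordinates below are hypothetical):

```python
def project_voxel(center, fx, fy, cx, cy):
    # Pinhole projection of a 3D voxel center given in camera
    # coordinates (z pointing forward) onto the image plane.
    # Returns the pixel (u, v), or None if the voxel is behind
    # the camera and has no valid ray intersection.
    x, y, z = center
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

# A voxel 2 m in front of a camera with 500 px focal length and
# principal point (320, 240):
print(project_voxel((0.4, -0.2, 2.0), 500, 500, 320, 240))  # (420.0, 190.0)
```

Voxels that project inside the image bounds inherit the 2D feature at (u, v); occluded or out-of-frustum voxels must be inferred from context, which is the hard part of monocular semantic scene completion.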
“…• As far as we know, our DOCTR is the first to introduce an object-centric Transformer-based network for the point scene understanding task that allows learning with multiple objects and multiple sub-tasks in a unified manner. Prior methods (Wu et al. 2020; Yan et al. 2021) often use voxelized input and then predict the semantic label of each voxel in both visible and occluded regions. They aim to jointly estimate the complete geometry and semantic labels from partial input.…”
Section: Introduction
confidence: 99%
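The joint geometry-and-semantics estimation described in this last excerpt is commonly realized as a per-voxel classification where one class is reserved for empty space, so a single label map encodes both occupancy and semantics for visible and occluded voxels alike. A toy sketch of that decoding step (class set and scores below are hypothetical):

```python
def complete_scene(voxel_logits):
    # Per-voxel argmax over class scores. Class 0 means "empty",
    # so the predicted labels yield geometry (occupied vs. empty)
    # and semantics in one pass.
    labels = [max(range(len(scores)), key=scores.__getitem__)
              for scores in voxel_logits]
    occupancy = [label != 0 for label in labels]
    return labels, occupancy

# Two voxels, three classes (empty, wall, chair):
labels, occ = complete_scene([[0.1, 2.0, 0.5], [3.0, 0.2, 0.1]])
print(labels, occ)  # [1, 0] [True, False]
```

In a real network the logits come from a 3D decoder over the full grid, including voxels never observed by the sensor; the argmax itself is the same.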