2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00187
Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction

Abstract: Indoor scenes exhibit rich hierarchical structure in 3D object layouts. Many tasks in 3D scene understanding can benefit from reasoning jointly about the hierarchical context of a scene and the identities of objects. We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up encoding for context aggregation and top-down decoding for propagation. We train our VDRAE on large-scale 3D scene dat…
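The abstract describes interleaved bottom-up encoding and top-down decoding over a scene hierarchy. The following is a minimal sketch of that recursive encode/decode pattern, not the paper's actual VDRAE: the weights are random placeholders (the paper learns them), the hierarchy is a nested tuple of per-object feature vectors, and the code dimension `D` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # code dimension (hypothetical choice)

# Placeholder weights; in the paper these would be learned parameters.
W_enc = rng.standard_normal((2 * D, D)) / np.sqrt(2 * D)
W_dec = rng.standard_normal((D, 2 * D)) / np.sqrt(D)

def encode(node):
    """Bottom-up pass: a leaf carries its own feature code; an internal
    node merges its two children's codes into one parent code."""
    if isinstance(node, np.ndarray):
        return node
    left, right = (encode(child) for child in node)
    return np.tanh(np.concatenate([left, right]) @ W_enc)

def decode(code, depth):
    """Top-down pass: split a parent code back into two child codes,
    propagating aggregated context down the hierarchy."""
    if depth == 0:
        return code
    out = np.tanh(code @ W_dec)
    return [decode(out[:D], depth - 1), decode(out[D:], depth - 1)]
```

A full iteration in the paper's spirit would encode a (possibly noisy) hierarchy to a root code and decode it back, refining the layout each pass; this sketch only shows the two traversals.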

Cited by 27 publications (20 citation statements)
References 60 publications
“…The work in [38] achieves reasonable results on instance segmentation of 3D point clouds by analyzing point patch context. In [39], a recursive autoencoder based approach is proposed for 3D object detection by exploring hierarchical context priors in 3D object layouts. Inspired by the self-attention idea in natural language processing [40], recent works connect the self-attention mechanism with contextual information mining to improve scene understanding tasks such as image recognition [41], semantic segmentation [11] and point cloud recognition [42].…”
Section: Contextual Information
confidence: 99%
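The statement above refers to scaled dot-product self-attention as a contextual-information mechanism. A minimal NumPy sketch of that operation (the projection matrices `wq`, `wk`, `wv` are illustrative stand-ins for learned parameters, not any cited model's weights):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over n feature vectors (rows of x).
    Each output row is a context-weighted mixture of all value vectors."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # numerically stable softmax over each row of the score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With identity projections and identical inputs, every row attends uniformly, so each output equals the mean of the value vectors.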
“…Zhao et al (2011) use a stochastic grammar with three production rules: AND, OR and SET to model the scene layout, detected objects, planes and the background. Scene synthesis methods that aim to generate realistic scene models and layouts incorporate knowledge about an object's context and the scene composition either directly or indirectly (Jiang et al 2018; Shi et al 2019). Jiang et al (2018) describe a configurable 3D scene synthesis pipeline based on stochastic grammars, so-called spatial and-or graphs.…”
Section: 3D Object Context and Scene Layout
confidence: 99%
“…GRAINS (Li et al 2018a) combines a recursive VAE with object retrieval to iteratively generate a layout and objects. Shi et al (2019) also suggest an iterative approach based on a novel variational recursive autoencoder. Kulkarni et al (2019), on the other hand, create a 3D scene given a 2D image.…”
Section: 3D Object Context and Scene Layout
confidence: 99%
“…In addition, PointFusion [43] introduced a novel framework in which the image data and the raw point cloud data are independently processed by a CNN (Convolutional Neural Network) and a PointNet architecture, respectively, followed by a fusion network combining their outputs. Instead of utilizing both 2D and 3D information, [35] took 3D point data only and utilized geometric and hierarchical contextual information for 3D object detection. Recently, with only 3D input, VoteNet [28] introduced a deep learning-based Hough voting strategy for 3D object detection from point clouds.…”
Section: Related Work
confidence: 99%
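The last quoted statement mentions a Hough voting strategy for 3D detection. The core idea can be sketched in a few lines, with the caveat that in VoteNet the per-point offsets come from a learned network and the grouping is far more elaborate; here the offsets are given and the grouping is a simple radius mean, so `hough_votes` and `cluster_center` are hypothetical helpers for illustration only.

```python
import numpy as np

def hough_votes(points, offsets):
    """Each seed point casts a vote for the object centre it belongs to.
    In a learned detector the offsets are predicted per point."""
    return points + offsets

def cluster_center(votes, seed_idx, radius):
    """Aggregate votes within `radius` of one seed vote — a crude stand-in
    for the sampling-and-grouping used by real voting-based detectors."""
    dist = np.linalg.norm(votes - votes[seed_idx], axis=1)
    return votes[dist < radius].mean(axis=0)
```

Surface points of one object all vote near its centre, so averaging a tight cluster of votes recovers that centre even though no single input point lies there.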