Learning 3D Semantic Scene Graphs with Instance Embeddings

Wald, Johanna; Navab, Nassir; Tombari, Federico

doi:10.1007/s11263-021-01546-9

“…The model investigated here is combined with transformer for generating the scene graph by inputting video data and exploring the relationship between humans, the environment(places), and objects. As the novel proposed research, a scene graph mentioned the relationship between objects in the 3D environment [142]. It built the 3DSSG on top of 3RScan, aiming to extract the 3D geometry and depth information of 3D space.…”

Section: Graph-like Methodologiesmentioning

confidence: 99%

“…Rather than the 2D scene graph, the involvement of a diagram is more densely connected and informative in the 3D scene graph. As mentioned in the work [142] of scene graph generation, it allows spatialtemporal analysis of moving objects and human behaviors in a stereoscopic environment. D. RNNs: Gated Recurrent Units(GRUs) and LSTM GRU are utilized for predicting the time sequential movements with multiple variants, including GRU-SVM and GRU-D models.…”

Section: Graph-like Methodologiesmentioning

confidence: 99%

Deep Neural Networks in Video Human Action Recognition: A review

Wang¹,

Zheng²,

yang³

et al. 2023

Preprint

2

0

View full text Add to dashboard Cite

<p>Currently, spatial-temporal behavior recognition is one of the most foundational tasks of computer vision. The 2D neural networks of deep learning are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats, with the current increasingly wide usage of surveillance video and more tasks related to human action recognition. There are increasing tasks requiring temporal information for frames dependency analysis. The researchers have widely studied video-based recognition rather than image-based(pixel-based) only to extract more informative elements from geometry tasks. Our current related research addresses multiple novels proposed research works and compares their advantages and disadvantages between the derived deep learning frameworks rather than machine learning frameworks. The comparison happened between existing frameworks and datasets, which are video format data only. Due to the specific properties of human actions and the increasingly wide usage of deep neural networks, we collected all research works within the last three years between 2020 to 2022. In our article, the performance of deep neural networks surpassed most of the techniques in the feature learning and extraction tasks, especially video action recognition.</p>

show abstract

“…At present, it is a hot topic to consider the real dynamic situation in time and space dimensions. Ji et al [27] proposed to capture spatial and temporal information from video and Wald et al [28] presented a new neural network architecture of 3D data to explore the relationship between entities, and returned semantics from a given 3D scene through learning. Fernando et al [29] combined motion dynamics into the image and feed the image into any standard CNN for end-to-end learning.…”

Section: Dynamic Scenario Research In Time and Space Dimensionsmentioning

confidence: 99%

Image Semantic Segmentation Approach for Studying Human Behavior on Image Data

ZHENG,

CHEN,

HUANG

2024

Wuhan Univ. J. Nat. Sci.

0

View full text Add to dashboard Cite

Image semantic segmentation is an essential technique for studying human behavior through image data. This paper proposes an image semantic segmentation method for human behavior research. Firstly, an end-to-end convolutional neural network architecture is proposed, which consists of a depth-separable jump-connected fully convolutional network and a conditional random field network; then jump-connected convolution is used to classify each pixel in the image, and an image semantic segmentation method based on convolutional neural network is proposed; and then a conditional random field network is used to improve the effect of image segmentation of human behavior and a linear modeling and nonlinear modeling method based on the semantic segmentation of conditional random field image is proposed. Finally, using the proposed image segmentation network, the input entrepreneurial image data is semantically segmented to obtain the contour features of the person; and the segmentation of the images in the medical field. The experimental results show that the image semantic segmentation method is effective. It is a new way to use image data to study human behavior and can be extended to other research areas.

show abstract

“…The model investigated here is combined with a transformer for generating the scene graph by inputting video data and exploring the relationship between humans, the environment(places), and objects. As the novel proposed research, a scene graph mentioned the relationship between objects in the 3D environment [130]. It built the 3DSSG on top of 3RScan, aiming to extract the 3D geometry and depth information of 3D space.…”

Section: Multi-modality Behavior Recognitionmentioning

confidence: 99%

“…Rather than the 2D scene graph, the involvement of a diagram is more densely connected and informative in the 3D scene graph. As mentioned in the work [130] of scene graph generation, it allows video analysis of moving objects and human behaviors in a stereoscopic environment.…”

Section: Multi-modality Behavior Recognitionmentioning

confidence: 99%

Deep Neural Networks in Video Human Action Recognition: A review

Wang¹,

Zheng²,

yang³

et al. 2023

Preprint

View full text Add to dashboard Cite

<p>Currently, spatial-temporal behavior recognition is one of the most foundational tasks of computer vision. The 2D neural networks of deep learning are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats, with the current increasingly wide usage of surveillance video and more tasks related to human action recognition. There are increasing tasks requiring temporal information for frames dependency analysis. The researchers have widely studied video-based recognition rather than image-based(pixel-based) only to extract more informative elements from geometry tasks. Our current related research addresses multiple novels proposed research works and compares their advantages and disadvantages between the derived deep learning frameworks rather than machine learning frameworks. The comparison happened between existing frameworks and datasets, which are video format data only. Due to the specific properties of human actions and the increasingly wide usage of deep neural networks, we collected all research works within the last three years between 2020 to 2022. In our article, the performance of deep neural networks surpassed most of the techniques in the feature learning and extraction tasks, especially video action recognition.</p>

show abstract

Learning 3D Semantic Scene Graphs with Instance Embeddings

Cited by 12 publications

References 82 publications

Deep Neural Networks in Video Human Action Recognition: A review

Deep Neural Networks in Video Human Action Recognition: A review

Image Semantic Segmentation Approach for Studying Human Behavior on Image Data

Deep Neural Networks in Video Human Action Recognition: A review

Contact Info

Product

Resources

About