2020
DOI: 10.3390/s20174761
Indoor Scene Change Captioning Based on Multimodality Data

Abstract: This study proposes a framework for describing a scene change using natural language text based on indoor scene observations conducted before and after a scene change. The recognition of scene changes plays an essential role in a variety of real-world applications, such as scene anomaly detection. Most scene understanding research has focused on static scenes. Most existing scene change captioning methods detect scene changes from single-view RGB images, neglecting the underlying three-dimensional structures. …

Cited by 18 publications (5 citation statements)
References 52 publications
“…[19] tackle the change captioning problem in a 3D setting by assuming multi-view images are available both before and after the change. [20] further propose an end-to-end framework for describing scene changes from various input modalities, namely RGB images, depth images, and point cloud data. Recently, [22] proposed a task to explicitly localise changes in the form of 3D bounding boxes from two point clouds and to describe detailed scene changes for a fixed set of object classes.…”
Section: D
Citation type: mentioning
Confidence: 99%
“…Then the images before and after the change, as well as information capturing their differences, are input into the decoder. Qiu et al. [34,35] described changes based on multi-view image information. Hosseinzadeh and Wang [36] formulated a training scheme that uses an auxiliary task to improve the training of the change captioning network.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
“…Ref. [3] presents an end-to-end scene change understanding framework that observes the variation between two time points using different types of input data (i.e., depth images, RGB images, and point clouds). Meanwhile, Ref.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%