2019
DOI: 10.3390/rs11060612

Description Generation for Remote Sensing Images Using Attribute Attention Mechanism

Abstract: Image captioning generates a semantic description of an image. It deals with image understanding and text mining, which has made great progress in recent years. However, it is still a great challenge to bridge the “semantic gap” between low-level features and high-level semantics in remote sensing images, in spite of the improvement of image resolutions. In this paper, we present a new model with an attribute attention mechanism for the description generation of remote sensing images. Therefore, we have explor…
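The abstract is truncated and does not specify the architecture, so the following is only a rough illustration of what an attribute attention mechanism can look like: additive (Bahdanau-style) attention over predicted attribute embeddings, conditioned on the decoder state. The function name, projection matrices, and all shapes are assumptions for illustration, not the authors' published design.

    import torch
    import torch.nn.functional as F

    def attribute_attention(attr_embeds, hidden, W_a, W_h, w):
        """Soft attention over predicted image attributes, conditioned on
        the decoder state (illustrative sketch, not the paper's model).

        attr_embeds: (K, D) embeddings of K attributes predicted for the image
        hidden:      (H,)   current LSTM decoder hidden state
        W_a: (M, D), W_h: (M, H), w: (M,) learned parameters (hypothetical)
        """
        scores = torch.tanh(attr_embeds @ W_a.T + hidden @ W_h.T) @ w  # (K,)
        alpha = F.softmax(scores, dim=0)   # attention weights over the K attributes
        context = alpha @ attr_embeds      # (D,) attribute context fed to the decoder
        return context, alpha

    # Tiny usage example with random tensors (shapes chosen only for illustration)
    K, D, H, M = 5, 256, 512, 128
    ctx, alpha = attribute_attention(
        torch.randn(K, D), torch.randn(H),
        torch.randn(M, D), torch.randn(M, H), torch.randn(M),
    )
    print(ctx.shape, alpha.shape)  # torch.Size([256]) torch.Size([5])

At each decoding step the context vector re-weights the attributes, so words being generated can attend to different high-level concepts rather than only to raw spatial features.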

Cited by 91 publications (55 citation statements); references 30 publications.
“…In addition, Wang et al. [8] proposed a novel method that uses latent semantic embedding learned by metric learning for the remote sensing image multi-sentence captioning task. Zhang et al. [9] proposed a new model with an attribute attention mechanism for remote sensing image captioning, and also explored whether the attributes extracted from remote sensing images influence the attention mechanism.…”
Section: Remote Sensing Image Captioning
confidence: 99%
“…In remote sensing image captioning [4][5][6][7][8][9], the above principle shows that accurate recognition of remote sensing images often requires the spatial relationships between objects. In particular, in cases of “different objects with the same spectrum,” different types of remote sensing objects may share the same spectral, texture, and shape features, and can only be interpreted accurately by using adjacent objects and their spatial relations.…”
Section: Introduction
confidence: 99%
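To make the argument in the statement above concrete: two objects with identical spectral, texture, and shape features can still be told apart by their geometry relative to neighboring objects. The following toy sketch (a hypothetical helper, not taken from any of the cited papers) derives a coarse spatial relation from two bounding boxes:

    # Toy illustration: identical-looking objects can be disambiguated by the
    # spatial relation between their bounding boxes. All names are hypothetical.

    def spatial_relation(box_a, box_b):
        """Coarse spatial relation of box_a relative to box_b.

        Boxes are (x_min, y_min, x_max, y_max) in image coordinates
        (origin at top-left, y grows downward).
        """
        ax = (box_a[0] + box_a[2]) / 2  # center of A
        ay = (box_a[1] + box_a[3]) / 2
        bx = (box_b[0] + box_b[2]) / 2  # center of B
        by = (box_b[1] + box_b[3]) / 2
        dx, dy = ax - bx, ay - by
        if abs(dx) >= abs(dy):
            return "right of" if dx > 0 else "left of"
        return "below" if dy > 0 else "above"

    # e.g. a bright rectangular roof next to a road vs. one next to water
    print(spatial_relation((50, 10, 70, 30), (10, 10, 30, 30)))  # "right of"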
“…This detailed information is helpful for generating detailed descriptions of objects or relationships in images. Later, X. Zhang et al. [19] added an attention mechanism to remote sensing image captioning models. In addition, high-level image features were also used as the attributes.…”
Section: Introduction
confidence: 99%
“…The bidirectional grid LSTM took the visual features of an image as input and learned complex spatial patterns based on two-dimensional context. In recent years, the application of reinforcement learning [20][21][22][23] to image captioning has also been a hot topic; it adjusts the generation strategy according to changes in the reward function during caption generation, enabling dynamic vocabulary generation. However, most current studies focus on scene-level semantic description of ordinary digital images [24,25]. To use deep RNNs or LSTMs for the semantic analysis [26][27][28][29][30] of remote sensing objects, the following problems must be solved:

Location ambiguity: at different time steps, the attention mechanism operates on 14 × 14 image features, corresponding to 196 spatial locations in remote sensing images. There are some deviations [31], however, which limit its application to remote sensing object recognition.

Boundary ambiguity: the nouns (object labels) in captions cannot accurately segment the boundaries of remote sensing objects in an image; thus, it is impossible to identify the spatial relationships between the objects.…”
confidence: 99%
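The “location ambiguity” described above stems from soft attention being computed over a 14 × 14 feature grid, i.e. 196 coarse cells rather than object boundaries. Below is a minimal sketch of that kind of spatial attention (in the style of Show, Attend and Tell); the parameter names and shapes are illustrative assumptions, not a specific published implementation.

    import torch
    import torch.nn.functional as F

    def spatial_attention(feature_map, hidden, W_v, W_h, w):
        """Additive soft attention over a 14 x 14 grid of CNN features,
        i.e. 196 spatial locations (illustrative sketch).

        feature_map: (C, 14, 14) convolutional features
        hidden:      (H,)        decoder hidden state
        W_v: (M, C), W_h: (M, H), w: (M,) learned parameters (hypothetical)
        """
        C, Hg, Wg = feature_map.shape
        locs = feature_map.reshape(C, Hg * Wg).T               # (196, C): one vector per cell
        scores = torch.tanh(locs @ W_v.T + hidden @ W_h.T) @ w  # (196,)
        alpha = F.softmax(scores, dim=0)                       # weights over the 196 cells
        context = alpha @ locs                                 # (C,) attended visual context
        return context, alpha.reshape(Hg, Wg)

    # Each weight covers one coarse grid cell (e.g. a 16 x 16 pixel region of a
    # 224 x 224 input), which is the granularity behind the "location ambiguity".
    ctx, attn_map = spatial_attention(
        torch.randn(512, 14, 14), torch.randn(512),
        torch.randn(128, 512), torch.randn(128, 512), torch.randn(128),
    )
    print(ctx.shape, attn_map.shape)  # torch.Size([512]) torch.Size([14, 14])

Because the attention map has only 14 × 14 resolution, its peaks can deviate from true object positions, and no cell boundary coincides with an object boundary, which is exactly the pair of problems the statement raises.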