2019
DOI: 10.1007/978-3-030-11018-5_12

Distinctive-Attribute Extraction for Image Captioning

Abstract: Image captioning, an open research issue, has evolved with the progress of deep neural networks. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are employed to compute image features and to generate natural-language descriptions, respectively. In previous works, a caption involving semantic description can be generated by feeding additional information into the RNNs. Following this approach, we propose a distinctive-attribute extraction (DaE) method which explicitly encourages significant mea…
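For context, the CNN-plus-RNN pipeline the abstract refers to is an encoder-decoder: a CNN produces an image feature that conditions an LSTM, which then emits the caption word by word. The sketch below is a generic illustration of that pipeline, not the paper's DaE model; the backbone (ResNet-18), the layer sizes, and feeding the image feature as the decoder's first input are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    """Minimal CNN-encoder / RNN-decoder captioner, in the spirit of the
    abstract. All architectural choices here are illustrative assumptions."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)                 # image feature extractor
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop final fc
        self.img_proj = nn.Linear(512, embed_dim)           # CNN feature -> embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feat = self.encoder(images).flatten(1)              # (B, 512)
        feat = self.img_proj(feat).unsqueeze(1)             # (B, 1, E)
        words = self.embed(captions)                        # (B, T, E)
        # Prepend the image feature as the first "token" (show-and-tell style,
        # assumed here) so the LSTM is conditioned on the image.
        seq = torch.cat([feat, words], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                             # per-step word logits
```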

Cited by 4 publications (2 citation statements). References 24 publications (71 reference statements).
“…where $f_w$ is a function that calculates the weight value allocated to $w_i$, $x_i$ is the embedding vector of $w_i$, and $s_t$ represents the word context vector at time $t$. Note that $\delta_{tk}$ stays the same during the generation of each word until the last time step. Here, inspired by the previous works [16], [9], we use the TF-IDF method as the function $f_w$, as this method can measure the importance degree of each word in a sentence or document. The word context vector $s_t$ is then fused with the previous hidden state $h_{t-1}$ of the LSTM decoder to combine more compact semantic information to guide the visual attention, calculated as follows:…”
Section: ConceptNet
confidence: 99%
“…where $f_w$ is a function that calculates the weight value allocated to $w_i$, $x_i$ is the embedding vector of $w_i$, and $s_t$ represents the word context vector at time $t$. Note that $\delta_{tk}$ stays the same during the generation of each word until the last time step. Here, inspired by the previous works (Kim et al. 2018; Park et al. 2017), we use the TF-IDF method as the function $f_w$, as this method can measure the importance degree of each word in a sentence or document. The word context vector $s_t$ is then fused with the previous hidden state $h_{t-1}$ of the LSTM decoder to combine more compact semantic information to guide the visual attention, calculated as follows:…”
Section: Implementation of Word Attention
confidence: 99%
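Both citation statements quote the same mechanism: a TF-IDF score $f_w$ weights each word, the weighted word embeddings are combined into the word context vector $s_t$, and $s_t$ is fused with the decoder's previous hidden state $h_{t-1}$. Below is a minimal NumPy sketch of that computation under stated assumptions: the weighted-sum form $s_t = \sum_i f_w(w_i)\,x_i$ and the concatenate-then-project fusion are plausible readings of the quotes, not the cited papers' exact equations, and all helper names are hypothetical.

```python
import numpy as np

def tfidf_weights(words, doc_freq, n_docs):
    """TF-IDF score per word of one sentence (the f_w in the quotes).

    doc_freq: mapping word -> number of training captions containing it.
    """
    tf = {w: words.count(w) / len(words) for w in set(words)}
    return np.array([
        tf[w] * np.log(n_docs / (1.0 + doc_freq.get(w, 0)))
        for w in words
    ])

def word_context_vector(words, embeddings, doc_freq, n_docs):
    """s_t = sum_i f_w(w_i) * x_i : TF-IDF-weighted sum of word embeddings."""
    weights = tfidf_weights(words, doc_freq, n_docs)   # (n_words,)
    x = np.stack([embeddings[w] for w in words])       # (n_words, dim)
    return weights @ x                                 # (dim,)

def fuse_with_hidden(s_t, h_prev, W):
    """Fuse s_t with the previous LSTM hidden state h_{t-1}.

    The quotes only say the two are 'fused'; a linear projection of their
    concatenation is assumed here for illustration.
    """
    return np.tanh(W @ np.concatenate([s_t, h_prev]))

# Toy usage: common words ("a") get low TF-IDF weight, rare content words
# ("bus") dominate the context vector -- the reason the quotes pick TF-IDF.
emb = {w: np.random.randn(8) for w in ("a", "red", "bus")}
s_t = word_context_vector(["a", "red", "bus"], emb,
                          {"a": 900, "red": 40, "bus": 25}, n_docs=1000)
```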