2022
DOI: 10.1007/s11063-022-10759-z

A New Attention-Based LSTM for Image Captioning

Cited by 21 publications (6 citation statements)
References 27 publications
“…Initially, we train our captioning model by minimizing the cross-entropy loss of the output caption, where the output token sequence length is restricted to 75 tokens: $L^{XE}_{Cap}(\theta) = -\sum_{t=1}^{T} \log p_\delta(\bar{y}_t \mid \bar{y}_{1:t-1})$, where $T$ denotes the number of words in a sentence and $\delta$ denotes the model parameters. In the second step, the reinforcement approach [36] is used to maximize the CIDEr score, where we take the CIDEr score as the reward. We used the Adamax optimizer [37] and a learning rate of $5 \times 10^{-4}$ to train the model, minimizing the negative expected reward of randomly sampled captions as the loss $L^{RL}_{Cap}(\theta) = -\mathbb{E}_{y^{s}_{1:T} \sim p_\delta}\big[\gamma(y^{s}_{1:T};\, y^{*}_{1:T})\big]$, where the reward …”
Section: Methods (mentioning; confidence: 99%)
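The cross-entropy objective quoted above sums the negative log-probability of each ground-truth token under the decoder's predictive distribution. A minimal sketch of that computation, assuming raw per-step decoder scores (all names here are illustrative, not the cited paper's API):

```python
import math

def caption_xe_loss(logits, targets):
    """Cross-entropy caption loss: -sum_t log p_delta(y_t | y_1..t-1).

    logits:  list of per-step raw score vectors, one per time step (length T).
    targets: list of T ground-truth token ids.
    """
    loss = 0.0
    for scores, y_t in zip(logits, targets):
        z = sum(math.exp(s) for s in scores)   # softmax normaliser over the vocab
        log_p = scores[y_t] - math.log(z)      # log p_delta(y_t | y_<t)
        loss -= log_p
    return loss
```

With uniform scores over a vocabulary of size $V$, the loss reduces to $T \log V$, a useful sanity check before training.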
“…where $T$ denotes the number of words in a sentence and $\delta$ denotes the model parameters. In the second step, the reinforcement approach [36] is used to maximize the CIDEr score, where we take the CIDEr score as the reward. We used the Adamax optimizer [37] and a learning rate of $5 \times 10^{-4}$ to train the model, minimizing the negative expected reward of randomly sampled captions as the loss…”
Section: Evaluation Methods: Cross-Entropy Loss and CIDEr-D (mentioning; confidence: 99%)
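The reinforcement step above minimizes the negative expected reward, which in practice is approximated with a REINFORCE-style surrogate loss on sampled captions. A minimal sketch, assuming the CIDEr score of the sampled caption is already computed externally (the function name and baseline argument are illustrative assumptions, not the cited paper's exact formulation):

```python
def rl_caption_loss(log_probs, sampled_reward, baseline_reward=0.0):
    """REINFORCE surrogate for L_Cap^RL(theta) = -E[gamma(y^s; y*)].

    log_probs:       per-token log p_delta(y_t^s | y_<t) of the sampled caption.
    sampled_reward:  CIDEr score gamma(y^s; y*) of the sampled caption.
    baseline_reward: optional baseline (e.g. the greedy caption's CIDEr)
                     subtracted to reduce gradient variance.
    """
    advantage = sampled_reward - baseline_reward
    # Scaling the sampled caption's log-likelihood by the negative advantage
    # pushes probability mass toward higher-reward captions when minimized.
    return -advantage * sum(log_probs)
```

Subtracting a greedy-decoding baseline, as in self-critical sequence training, is a common variance-reduction choice for this kind of objective.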
“…The encoder-decoder (ED) image captioning model was inspired by the development of neural network-based machine translation systems. This model uses a DL-based framework in which the decoder generates captions from the features the encoder extracts from the input image [14].…”
Section: Related Work (mentioning; confidence: 99%)
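The encoder-decoder flow described in that excerpt (encode the image once, then decode the caption token by token) can be sketched as follows; every name here is a hypothetical placeholder, not the cited model's API:

```python
def generate_caption(image, encoder, decoder, bos_id, eos_id, max_len=75):
    """Greedy encoder-decoder captioning loop.

    encoder(image)            -> image feature representation.
    decoder(features, prefix) -> score per vocabulary token for the next step.
    """
    features = encoder(image)            # encode the image once
    caption = [bos_id]
    for _ in range(max_len):
        scores = decoder(features, caption)
        next_tok = max(range(len(scores)), key=scores.__getitem__)  # greedy pick
        caption.append(next_tok)
        if next_tok == eos_id:           # stop at end-of-sequence token
            break
    return caption
```

Attention-based variants, such as the surveyed paper's LSTM model, replace the single static feature vector with per-step attention over spatial features, but the outer decode loop has the same shape.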
“…• Additional uses: Deep learning is now applied in nearly every industry. Further deep learning applications include automated text generation [22], game playing [23], and image captioning [24].…”
Section: Deep Learning (mentioning; confidence: 99%)