Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1103

Reinforced Video Captioning with Entailment Rewards

Abstract: Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic metrics and human evaluation on multiple datasets. Next, we propose a novel entailment-enhanced reward (CIDEnt) that co…
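The two ingredients named in the abstract, a mixed cross-entropy/policy-gradient loss and an entailment-corrected CIDEr reward (CIDEnt), can be sketched compactly. The Python below is only an illustration under assumptions of ours: `cider_score` and `entailment_prob` are hypothetical callables, and the thresholded penalty and the mixing weight `gamma` are placeholders, not the paper's exact formulation.

```python
import torch


def cident_reward(candidate, reference, cider_score, entailment_prob,
                  ent_threshold=0.5, penalty=1.0):
    """Entailment-corrected reward (illustrative): start from a phrase-matching
    metric (CIDEr) and penalize captions that the reference does not entail."""
    r = cider_score(candidate, reference)
    if entailment_prob(premise=reference, hypothesis=candidate) < ent_threshold:
        r = r - penalty  # assumed correction rule, not the paper's exact one
    return r


def mixed_loss(xe_log_probs, sampled_log_probs, reward, baseline, gamma=0.99):
    """Mixed-loss RL training: interpolate the word-level cross-entropy loss
    with a REINFORCE-style sentence-level reward loss."""
    xe_loss = -xe_log_probs.sum()                             # teacher-forced tokens
    rl_loss = -(reward - baseline) * sampled_log_probs.sum()  # reward-weighted sample
    return gamma * rl_loss + (1.0 - gamma) * xe_loss


# Toy usage: random tensors stand in for per-token log-probabilities from a decoder.
xe_lp = -torch.rand(12)       # log p(ground-truth word_t | ...)
sample_lp = -torch.rand(12)   # log p(sampled word_t | ...)
loss = mixed_loss(xe_lp, sample_lp, reward=0.8, baseline=0.6)
```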

Cited by 107 publications (69 citation statements). References 32 publications.
“…Reinforcement Learning (RL) Loss: Policy gradient methods can directly optimize discrete target evaluation metrics such as ROUGE that are non-differentiable (Paulus et al., 2018; Jaques et al., 2017; Pasunuru and Bansal, 2017). At each time step, the word generated by the model can be viewed as an action taken by an RL agent.…”
Section: Mixed Objective Learning (mentioning)
confidence: 99%
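The word-as-action view in the statement above corresponds to the REINFORCE estimator: sample a sentence, score it with the non-differentiable metric, and weight the summed token log-probabilities by a baseline-subtracted reward. The sketch below is a generic illustration, not any cited paper's implementation; the `model.sample` / `model.greedy_decode` interface, the self-critical greedy baseline, and `reward_fn` are all assumptions.

```python
def policy_gradient_loss(model, src, reference, reward_fn):
    """REINFORCE for sequence generation: each emitted word is an action, and
    the whole sampled sentence receives one scalar reward from a metric such
    as ROUGE or CIDEr (non-differentiable, so it enters only as a weight)."""
    # Hypothetical model interface: both calls return (token_ids, log_probs).
    sampled_ids, sampled_log_probs = model.sample(src)   # stochastic decode
    greedy_ids, _ = model.greedy_decode(src)             # baseline sequence

    r_sample = reward_fn(sampled_ids, reference)         # e.g. CIDEr score
    r_baseline = reward_fn(greedy_ids, reference)        # variance reduction

    # The reward is a constant w.r.t. the parameters; gradients flow only
    # through the log-probabilities of the sampled actions (words).
    advantage = r_sample - r_baseline
    return -advantage * sampled_log_probs.sum()
```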
“…Reinforcement learning has been applied to a wide array of text generation tasks, including machine translation (Wu et al., 2016; Ranzato et al., 2015), text summarization (Paulus et al., 2018), and image/video captioning (Rennie et al., 2017; Pasunuru and Bansal, 2017). These RL approaches lean on the REINFORCE algorithm (Williams, 1992), or its variants, to train a generative model towards a non-differentiable reward by minimizing the policy gradient loss.…”
Section: Reinforcement Learning for Text Generation (mentioning)
confidence: 99%
“…Due to the instability of adversarial training, we additionally include a cross-entropy (CE) loss that ensures that the generator will explore the output space in a more stable manner and maintain its language model [40]. The final objective of G_θ is a mixed loss function, a weighted combination of the Cross-Entropy Loss (L_CE), optimizing the maximum-likelihood training objective, and the Adversarial Loss (L_GAN), with its gradient function defined in Equation 12.…”
Section: A. Implementation Details (mentioning)
confidence: 99%
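The quoted generator objective is simply a weighted sum of two scalar losses. A minimal sketch, assuming the cross-entropy and adversarial terms are computed elsewhere and mixed with a hypothetical weight `lambda_adv`:

```python
def generator_objective(ce_loss, gan_loss, lambda_adv=0.5):
    """Mixed objective for the generator G_theta: combine the maximum-likelihood
    cross-entropy loss L_CE (keeps training stable and preserves the language
    model) with the adversarial loss L_GAN. lambda_adv is an assumed weight."""
    return (1.0 - lambda_adv) * ce_loss + lambda_adv * gan_loss
```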