Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 1: Long Papers) 2017
DOI: 10.18653/v1/p17-1117

Multi-Task Video Captioning with Video and Entailment Generation

Abstract: Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generation tasks: a temporally-directed unsupervised video prediction task to learn richer context-aware video encoder represe…

Citations: cited by 84 publications (60 citation statements)
References: 30 publications (47 reference statements)
“…Entailment NLI There has been a long history of NLI benchmarks focusing on linguistic entailment (Cooper et al., 1996; Dagan et al., 2006; Marelli et al., 2014; Bowman et al., 2015; Lai et al., 2017; Williams et al., 2018). Recent NLI datasets in particular have supported learning broadly-applicable sentence representations (Conneau et al., 2017); moreover, models trained on these datasets were used as components for performing better video captioning (Pasunuru and Bansal, 2017), summarization (Pasunuru and Bansal, 2018), and generation (Holtzman et al., 2018), confirming the importance of NLI research. The NLI task requires a variety of commonsense knowledge (LoBue and Yates, 2011), which our work complements.…”
Section: Related Work (mentioning)
confidence: 94%
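
Where this passage notes that NLI corpora support learning broadly-applicable sentence representations, the usual recipe (as in InferSent, Conneau et al., 2017) trains a shared sentence encoder through a three-way entailment classifier. A minimal sketch, assuming illustrative layer sizes and a plain LSTM encoder rather than any cited paper's exact architecture:

```python
# Sketch of an InferSent-style NLI classifier; all module names and
# sizes here are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared sentence encoder; after NLI training it can be reused
        # as a general-purpose representation for other tasks.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Classify combined features into entailment/neutral/contradiction.
        self.classifier = nn.Linear(4 * hidden_dim, 3)

    def encode(self, tokens):
        _, (h, _) = self.encoder(self.embed(tokens))
        return h[-1]  # final hidden state as the sentence vector

    def forward(self, premise, hypothesis):
        u, v = self.encode(premise), self.encode(hypothesis)
        # Standard NLI feature combination: [u; v; |u - v|; u * v].
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.classifier(feats)
```

After training, `encode` alone can be kept and reused as a general sentence representation for downstream tasks, which is what makes NLI data useful beyond entailment itself.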
“…It learns a shared feature with adequate expressive power to capture the useful information across the tasks. Multi-task learning has been successfully used in machine vision applications such as image classification [42], image segmentation [12], video captioning [49], and activity recognition [85]. A few works explore self-supervised multi-task learning to learn high-level visual features [15,55].…”
Section: Multi-task Learning (mentioning)
confidence: 99%
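
The "shared feature" this quote describes is typically realized as hard parameter sharing: one backbone feeds several task-specific heads, and training minimizes a weighted sum of task losses. A minimal sketch, with hypothetical layer sizes, task names, and loss weights:

```python
# Sketch of hard parameter sharing for multi-task learning; layer
# sizes and the two tasks are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

shared = nn.Sequential(nn.Linear(128, 256), nn.ReLU())  # shared backbone
heads = nn.ModuleDict({
    "classify": nn.Linear(256, 10),    # e.g., image classification logits
    "caption":  nn.Linear(256, 5000),  # e.g., next-word logits for captioning
})
weights = {"classify": 1.0, "caption": 0.5}  # assumed task loss weights

def multitask_loss(x, targets):
    feats = shared(x)  # every task reads the same learned representation
    return sum(weights[name] * F.cross_entropy(head(feats), targets[name])
               for name, head in heads.items())
```

Because every head backpropagates through the same backbone, each task acts as a regularizer and auxiliary signal for the others.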
“…Luong et al. (2016) showed improvements on translation, captioning, and parsing in a shared multi-task setting. Recently, Pasunuru and Bansal (2017) extended this idea to video captioning with two related tasks: video completion and entailment generation. We demonstrate that abstractive text summarization models can also be improved by sharing parameters with an entailment generation task.…”
Section: Related Work (mentioning)
confidence: 99%
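
In these seq2seq multi-task setups, parameter sharing is combined with an alternating training schedule: each step samples one task according to a mixing ratio and updates the (partly shared) parameters on that task's batch. A sketch of the loop, where the models, ratios, and batch sources are all assumed for illustration:

```python
# Sketch of the alternating-batch schedule used in one-to-many
# multi-task seq2seq training (in the spirit of Luong et al., 2016).
# Mixing ratios and the models dict are hypothetical.
import random

mixing_ratio = {"captioning": 0.8, "entailment_gen": 0.2}

def training_step(models, batches, optimizer):
    # Sample which task to train this step, in proportion to its ratio.
    task = random.choices(list(mixing_ratio),
                          weights=list(mixing_ratio.values()))[0]
    loss = models[task](batches[task])  # shared encoder/decoder inside
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task, loss.item()
```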
“…For the task of entailment generation, we use the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015), where we only use the entailment-labeled pairs and regroup the splits to have a zero-overlap train-test split and a multi-reference test set, as suggested by Pasunuru and Bansal (2017). Out of 190,113 entailment pairs, we use 145,822 unique premise pairs for training, and the rest of them are equally divided into dev and test sets.…”
Section: SNLI Corpus (mentioning)
confidence: 99%
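
The regrouping described here amounts to bucketing entailment pairs by premise and splitting over unique premises, so no test premise is ever seen in training and each test premise keeps all of its hypotheses as references. A rough sketch, assuming SNLI's standard JSONL fields (`sentence1`, `sentence2`, `gold_label`) and a hypothetical merged input file:

```python
# Sketch of the zero-overlap, multi-reference regrouping; the merged
# input filename is a hypothetical placeholder, and the split sizes
# follow the counts quoted above.
import json
import random
from collections import defaultdict

by_premise = defaultdict(list)
with open("snli_1.0_all.jsonl") as f:  # hypothetical merged SNLI file
    for line in f:
        ex = json.loads(line)
        if ex["gold_label"] == "entailment":
            by_premise[ex["sentence1"]].append(ex["sentence2"])

premises = list(by_premise)
random.shuffle(premises)
train = premises[:145822]              # unique premises for training
held_out = premises[145822:]           # rest split equally into dev/test
dev = held_out[:len(held_out) // 2]
test = held_out[len(held_out) // 2:]
# Multi-reference test set: every hypothesis of a premise is a reference.
test_refs = {p: by_premise[p] for p in test}
```

Splitting on unique premises, rather than on pairs, is what guarantees the zero-overlap property: a pair-level split could place two hypotheses of the same premise on opposite sides of the train-test boundary.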