Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 1: Long Papers) 2017
DOI: 10.18653/v1/p17-1117

Multi-Task Video Captioning with Video and Entailment Generation

Abstract: Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generation tasks: a temporally-directed unsupervised video prediction task to learn richer context-aware video encoder represe…

Citations: cited by 84 publications (60 citation statements)
References: 30 publications (47 reference statements)
“…Entailment NLI There has been a long history of NLI benchmarks focusing on linguistic entailment (Cooper et al., 1996; Dagan et al., 2006; Marelli et al., 2014; Bowman et al., 2015; Lai et al., 2017; Williams et al., 2018). Recent NLI datasets in particular have supported learning broadly-applicable sentence representations (Conneau et al., 2017); moreover, models trained on these datasets were used as components for performing better video captioning (Pasunuru and Bansal, 2017), summarization (Pasunuru and Bansal, 2018), and generation (Holtzman et al., 2018), confirming the importance of NLI research. The NLI task requires a variety of commonsense knowledge (LoBue and Yates, 2011), which our work complements.…”
Section: Related Work (mentioning)
confidence: 94%
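
Where this passage notes that NLI corpora support learning broadly-applicable sentence representations, the usual recipe (as in InferSent, Conneau et al., 2017) trains a shared sentence encoder through a three-way entailment classifier. A minimal sketch, assuming illustrative layer sizes and a plain LSTM encoder rather than any cited paper's exact architecture:

```python
# Sketch of an InferSent-style NLI classifier; all module names and
# sizes here are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared sentence encoder; after NLI training it can be reused
        # as a general-purpose representation for other tasks.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Classify combined features into entailment/neutral/contradiction.
        self.classifier = nn.Linear(4 * hidden_dim, 3)

    def encode(self, tokens):
        _, (h, _) = self.encoder(self.embed(tokens))
        return h[-1]  # final hidden state as the sentence vector

    def forward(self, premise, hypothesis):
        u, v = self.encode(premise), self.encode(hypothesis)
        # Standard NLI feature combination: [u; v; |u - v|; u * v].
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.classifier(feats)
```

After training, `encode` alone can be kept and reused as a general sentence representation for downstream tasks, which is what makes NLI data useful beyond entailment itself.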
“…It learns a shared feature with adequate expressive power to capture the useful information across the tasks. Multi-task learning has been successfully used in machine vision applications such as image classification [42], image segmentation [12], video captioning [49], and activity recognition [85]. A few works explore self-supervised multi-task learning to learn high-level visual features [15,55].…”
Section: Multi-task Learning (mentioning)
confidence: 99%
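
The "shared feature" this quote describes is typically realized as hard parameter sharing: one backbone feeds several task-specific heads, and training minimizes a weighted sum of task losses. A minimal sketch, with hypothetical layer sizes, task names, and loss weights:

```python
# Sketch of hard parameter sharing for multi-task learning; layer
# sizes and the two tasks are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

shared = nn.Sequential(nn.Linear(128, 256), nn.ReLU())  # shared backbone
heads = nn.ModuleDict({
    "classify": nn.Linear(256, 10),    # e.g., image classification logits
    "caption":  nn.Linear(256, 5000),  # e.g., next-word logits for captioning
})
weights = {"classify": 1.0, "caption": 0.5}  # assumed task loss weights

def multitask_loss(x, targets):
    feats = shared(x)  # every task reads the same learned representation
    return sum(weights[name] * F.cross_entropy(head(feats), targets[name])
               for name, head in heads.items())
```

Because every head backpropagates through the same backbone, each task acts as a regularizer and auxiliary signal for the others.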
“…Luong et al. (2016) showed improvements on translation, captioning, and parsing in a shared multi-task setting. Recently, Pasunuru and Bansal (2017) extended this idea to video captioning with two related tasks: video completion and entailment generation. We demonstrate that abstractive text summarization models can also be improved by sharing parameters with an entailment generation task.…”
Section: Related Work (mentioning)
confidence: 99%
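
In these seq2seq multi-task setups, parameter sharing is combined with an alternating training schedule: each step samples one task according to a mixing ratio and updates the (partly shared) parameters on that task's batch. A sketch of the loop, where the models, ratios, and batch sources are all assumed for illustration:

```python
# Sketch of the alternating-batch schedule used in one-to-many
# multi-task seq2seq training (in the spirit of Luong et al., 2016).
# Mixing ratios and the models dict are hypothetical.
import random

mixing_ratio = {"captioning": 0.8, "entailment_gen": 0.2}

def training_step(models, batches, optimizer):
    # Sample which task to train this step, in proportion to its ratio.
    task = random.choices(list(mixing_ratio),
                          weights=list(mixing_ratio.values()))[0]
    loss = models[task](batches[task])  # shared encoder/decoder inside
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task, loss.item()
```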
“…For the task of entailment generation, we use the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015), where we only use the entailment-labeled pairs and regroup the splits to have a zero-overlap train-test split and a multi-reference test set, as suggested by Pasunuru and Bansal (2017). Out of 190,113 entailment pairs, we use 145,822 unique premise pairs for training, and the rest of them are equally divided into dev and test sets.…”
Section: SNLI Corpus (mentioning)
confidence: 99%
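
The regrouping described here amounts to bucketing entailment pairs by premise and splitting over unique premises, so no test premise is ever seen in training and each test premise keeps all of its hypotheses as references. A rough sketch, assuming SNLI's standard JSONL fields (`sentence1`, `sentence2`, `gold_label`) and a hypothetical merged input file:

```python
# Sketch of the zero-overlap, multi-reference regrouping; the merged
# input filename is a hypothetical placeholder, and the split sizes
# follow the counts quoted above.
import json
import random
from collections import defaultdict

by_premise = defaultdict(list)
with open("snli_1.0_all.jsonl") as f:  # hypothetical merged SNLI file
    for line in f:
        ex = json.loads(line)
        if ex["gold_label"] == "entailment":
            by_premise[ex["sentence1"]].append(ex["sentence2"])

premises = list(by_premise)
random.shuffle(premises)
train = premises[:145822]              # unique premises for training
held_out = premises[145822:]           # rest split equally into dev/test
dev = held_out[:len(held_out) // 2]
test = held_out[len(held_out) // 2:]
# Multi-reference test set: every hypothesis of a premise is a reference.
test_refs = {p: by_premise[p] for p in test}
```

Splitting on unique premises, rather than on pairs, is what guarantees the zero-overlap property: a pair-level split could place two hypotheses of the same premise on opposite sides of the train-test boundary.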