Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.158
Visually Grounded Continual Learning of Compositional Phrases

Abstract: Humans acquire language continually with much more limited access to data samples at a time, as compared to contemporary NLP systems. To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes. In the task, models are trained on a paired image-caption stream which has shifting object distribution; while being constantly evaluated by a visually-grounded mask…

Cited by 10 publications (10 citation statements). References 28 publications.
“…Is there a dataset that can prevent all shortcuts? Our automatic method for creating contrast sets allows us to ask those questions, while we believe that future work in better training mechanisms, as suggested in and Jin et al (2020), could help in making more robust models.…”
Section: Discussion (mentioning; confidence: 99%)
“…As with different characteristics of different computer vision problems, a simple adaptation of methods proposed for image classification may not lead to satisfactory performance in other computer vision problems. For example, in video grounding, Jin et al [218] pointed out that simple adaptation of ideas from image classification fails with this compositional phrases learning scenario of the language input. In visual question answering (VQA), Perez et al [219] pointed out that their model cannot preserve previously learned knowledge well after trained continuously on objects with different colors.…”
Section: Discussion (mentioning; confidence: 99%)
“…Cao et al (2021) propose a new Continual Learning framework for NMT models, while Ke et al (2021) proposes a novel capsule network based model called B-CL (Bert based Continual Learning) for sentiment classification tasks. Jin et al (2020) show how existing Continual Learning algorithms fail at learning compositional phrases. More recently Sun et al (2019) propose a lifelong learning method LAMOL that is capable of continually learning new tasks by replaying pseudo-samples of previous tasks that require no extra memory or model capacity.…”
Section: Related Work (mentioning; confidence: 98%)