Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/338

Embodied Multimodal Multitask Learning

Abstract: Visually-grounded embodied language learning models have recently been shown to be effective at learning multiple multimodal tasks, such as following navigational instructions and answering questions. In this paper, we address two key limitations of these models: (a) the inability to transfer the grounded knowledge across different tasks, and (b) the inability to transfer to new words and concepts not seen during training using only a few examples. We propose a multitask model which facilitates knowledge tran…

Cited by 16 publications (15 citation statements)
References 21 publications

“…Otherwise, when max opt len=1, agents with memory or attention do generalize well in both Random Split and our Dynamic Test; see detailed results in Appendix G.2. Perhaps the notion of affordance seems a bit abstract in HALMA and can be more intuitive in visual semantic navigation and control (Yang et al, 2019;Chaplot et al, 2020). We hope our work can inspire the future development of benchmarks for these topics.…”
Section: Related Work
confidence: 94%
“…Sharing knowledge between multiple tasks can be achieved in a multi-task learning setup [14,62] where all tasks are learned jointly in a supervised manner, or via meta-RL [27,64] where a meta policy learned from a distribution of tasks is finetuned on the target. Unlike these methods, our policy is learned from one task that does not require manual annotations, and it can be transferred in a zero-shot setup where the policy does not receive any interactive training on the target.…”
Section: Related Work
confidence: 99%
“…In addition, [33,93] add skip connections so that signals from higher-level tasks are amplified. [11] learns the task of semantic goal navigation at a lower level and learns the task of embodied question answering at a higher level.…”
Section: Vanilla
confidence: 99%
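The citation above describes a hierarchical arrangement of the two tasks: semantic goal navigation handled by a lower-level module, embodied question answering built on top of it, with skip connections so the higher-level head still sees the lower-level signal. The snippet below is a minimal sketch of that idea only; the encoder layout, dimensions, and module names are assumptions for illustration, not the architecture of [11] or of [33, 93].

```python
# Hypothetical sketch: a shared lower-level module serves navigation (SGN),
# a higher-level module built on it serves question answering (EQA), and a
# skip connection feeds the lower-level features to the EQA head as well.
import torch
import torch.nn as nn


class HierarchicalMultitaskNet(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=32, feat_dim=64,
                 n_nav_actions=4, n_answers=20):
        super().__init__()
        # Shared perception and language encoders.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (B, feat_dim)
        )
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.text_rnn = nn.GRU(emb_dim, feat_dim, batch_first=True)

        # Lower level: fuses vision and language for navigation (SGN).
        self.lower = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.nav_head = nn.Linear(feat_dim, n_nav_actions)

        # Higher level: consumes the lower-level representation for EQA,
        # with a skip connection from the fused features.
        self.higher = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.answer_head = nn.Linear(2 * feat_dim, n_answers)  # skip concat

    def forward(self, image, tokens):
        v = self.image_encoder(image)                     # (B, feat_dim)
        _, h = self.text_rnn(self.embedding(tokens))      # h: (1, B, feat_dim)
        fused = self.lower(torch.cat([v, h.squeeze(0)], dim=-1))

        nav_logits = self.nav_head(fused)                 # lower-level task
        eqa_in = torch.cat([self.higher(fused), fused], dim=-1)  # skip connection
        answer_logits = self.answer_head(eqa_in)          # higher-level task
        return nav_logits, answer_logits


if __name__ == "__main__":
    net = HierarchicalMultitaskNet()
    img = torch.randn(2, 3, 84, 84)
    txt = torch.randint(0, 100, (2, 10))
    nav, ans = net(img, txt)
    print(nav.shape, ans.shape)  # torch.Size([2, 4]) torch.Size([2, 20])
```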
“…[80] incorporates a human cognitive process, the gaze behavior while reading, into a sentiment classification model by adding a gaze prediction task and obtains improved performance. [11] builds a semantic goal navigation system where agents could respond to natural language navigation commands. In this system, a one-to-one mapping between visual feature maps and text tokens is established through a dualattention mechanism and the visual question answering and object detection tasks are added to enforce such an alignment.…”
Section: Multimodal MTL
confidence: 99%
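The statement above summarizes the alignment idea in [11]: each visual feature channel is tied to one text token, and auxiliary visual question answering and object detection tasks enforce that alignment. The sketch below illustrates only the channel-to-token gating plus spatial pooling; it is a simplification under assumed shapes and names, not the exact Dual-Attention unit of the paper.

```python
# Illustrative simplification: one convolutional channel per vocabulary token
# enforces a 1:1 mapping between visual feature maps and text tokens. A
# bag-of-words gate keeps only the channels of mentioned tokens (gated
# attention), and a spatial attention map pools them into a grounded feature
# that downstream navigation / QA / detection heads could consume.
import torch
import torch.nn as nn


class TokenAlignedAttention(nn.Module):
    def __init__(self, vocab_size=32, in_channels=3):
        super().__init__()
        # One output channel per vocabulary token (the assumed alignment).
        self.conv = nn.Conv2d(in_channels, vocab_size, kernel_size=3, padding=1)
        self.vocab_size = vocab_size

    def forward(self, image, token_ids):
        # image: (B, C, H, W); token_ids: (B, T) integer token indices.
        feat = torch.relu(self.conv(image))               # (B, V, H, W)

        # Bag-of-words gate: 1 for tokens present in the sentence, else 0.
        gate = torch.zeros(image.size(0), self.vocab_size, device=image.device)
        gate.scatter_(1, token_ids, 1.0)                  # (B, V)

        # Gated attention: keep only channels whose token appears in the text.
        gated = feat * gate[:, :, None, None]             # (B, V, H, W)

        # Spatial attention: where do the mentioned concepts appear?
        spatial = gated.sum(dim=1, keepdim=True)          # (B, 1, H, W)
        spatial = torch.softmax(spatial.flatten(2), dim=-1).view_as(spatial)

        # Pool into a grounded vector, one entry per vocabulary token.
        grounded = (gated * spatial).flatten(2).sum(dim=-1)  # (B, V)
        return grounded


if __name__ == "__main__":
    att = TokenAlignedAttention(vocab_size=32)
    img = torch.randn(2, 3, 16, 16)
    toks = torch.randint(0, 32, (2, 5))
    print(att(img, toks).shape)  # torch.Size([2, 32])
```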