Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/338

Embodied Multimodal Multitask Learning

Abstract: Visually-grounded embodied language learning models have recently been shown to be effective at learning multiple multimodal tasks, such as following navigational instructions and answering questions. In this paper, we address two key limitations of these models: (a) the inability to transfer the grounded knowledge across different tasks, and (b) the inability to transfer to new words and concepts not seen during training using only a few examples. We propose a multitask model which facilitates knowledge tran…

Cited by 16 publications (15 citation statements)
References 21 publications

“…Otherwise, when max opt len=1, agents with memory or attention do generalize well in both Random Split and our Dynamic Test; see detailed results in Appendix G.2. Perhaps the notion of affordance seems a bit abstract in HALMA and can be more intuitive in visual semantic navigation and control (Yang et al, 2019;Chaplot et al, 2020). We hope our work can inspire the future development of benchmarks for these topics.…”
Section: Related Work
confidence: 94%
“…Sharing knowledge between multiple tasks can be achieved in a multi-task learning setup [14,62] where all tasks are learned jointly in a supervised manner, or via meta-RL [27,64] where a meta policy learned from a distribution of tasks is finetuned on the target. Unlike these methods, our policy is learned from one task that does not require manual annotations, and it can be transferred in a zero-shot setup where the policy does not receive any interactive training on the target.…”
Section: Related Work
confidence: 99%
“…In addition, [33,93] add skip connections so that signals from higher-level tasks are amplified. [11] learns the task of semantic goal navigation at a lower level and learns the task of embodied question answering at a higher level.…”
Section: Vanilla
confidence: 99%
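The citation above describes a hierarchical arrangement of the two tasks: semantic goal navigation handled by a lower-level module, embodied question answering built on top of it, with skip connections so the higher-level head still sees the lower-level signal. The snippet below is a minimal sketch of that idea only; the encoder layout, dimensions, and module names are assumptions for illustration, not the architecture of [11] or of [33, 93].

```python
# Hypothetical sketch: a shared lower-level module serves navigation (SGN),
# a higher-level module built on it serves question answering (EQA), and a
# skip connection feeds the lower-level features to the EQA head as well.
import torch
import torch.nn as nn


class HierarchicalMultitaskNet(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=32, feat_dim=64,
                 n_nav_actions=4, n_answers=20):
        super().__init__()
        # Shared perception and language encoders.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (B, feat_dim)
        )
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.text_rnn = nn.GRU(emb_dim, feat_dim, batch_first=True)

        # Lower level: fuses vision and language for navigation (SGN).
        self.lower = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.nav_head = nn.Linear(feat_dim, n_nav_actions)

        # Higher level: consumes the lower-level representation for EQA,
        # with a skip connection from the fused features.
        self.higher = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.answer_head = nn.Linear(2 * feat_dim, n_answers)  # skip concat

    def forward(self, image, tokens):
        v = self.image_encoder(image)                     # (B, feat_dim)
        _, h = self.text_rnn(self.embedding(tokens))      # h: (1, B, feat_dim)
        fused = self.lower(torch.cat([v, h.squeeze(0)], dim=-1))

        nav_logits = self.nav_head(fused)                 # lower-level task
        eqa_in = torch.cat([self.higher(fused), fused], dim=-1)  # skip connection
        answer_logits = self.answer_head(eqa_in)          # higher-level task
        return nav_logits, answer_logits


if __name__ == "__main__":
    net = HierarchicalMultitaskNet()
    img = torch.randn(2, 3, 84, 84)
    txt = torch.randint(0, 100, (2, 10))
    nav, ans = net(img, txt)
    print(nav.shape, ans.shape)  # torch.Size([2, 4]) torch.Size([2, 20])
```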
“…[80] incorporates a human cognitive process, the gaze behavior while reading, into a sentiment classification model by adding a gaze prediction task and obtains improved performance. [11] builds a semantic goal navigation system where agents could respond to natural language navigation commands. In this system, a one-to-one mapping between visual feature maps and text tokens is established through a dualattention mechanism and the visual question answering and object detection tasks are added to enforce such an alignment.…”
Section: Multimodal MTL
confidence: 99%
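The statement above summarizes the alignment idea in [11]: each visual feature channel is tied to one text token, and auxiliary visual question answering and object detection tasks enforce that alignment. The sketch below illustrates only the channel-to-token gating plus spatial pooling; it is a simplification under assumed shapes and names, not the exact Dual-Attention unit of the paper.

```python
# Illustrative simplification: one convolutional channel per vocabulary token
# enforces a 1:1 mapping between visual feature maps and text tokens. A
# bag-of-words gate keeps only the channels of mentioned tokens (gated
# attention), and a spatial attention map pools them into a grounded feature
# that downstream navigation / QA / detection heads could consume.
import torch
import torch.nn as nn


class TokenAlignedAttention(nn.Module):
    def __init__(self, vocab_size=32, in_channels=3):
        super().__init__()
        # One output channel per vocabulary token (the assumed alignment).
        self.conv = nn.Conv2d(in_channels, vocab_size, kernel_size=3, padding=1)
        self.vocab_size = vocab_size

    def forward(self, image, token_ids):
        # image: (B, C, H, W); token_ids: (B, T) integer token indices.
        feat = torch.relu(self.conv(image))               # (B, V, H, W)

        # Bag-of-words gate: 1 for tokens present in the sentence, else 0.
        gate = torch.zeros(image.size(0), self.vocab_size, device=image.device)
        gate.scatter_(1, token_ids, 1.0)                  # (B, V)

        # Gated attention: keep only channels whose token appears in the text.
        gated = feat * gate[:, :, None, None]             # (B, V, H, W)

        # Spatial attention: where do the mentioned concepts appear?
        spatial = gated.sum(dim=1, keepdim=True)          # (B, 1, H, W)
        spatial = torch.softmax(spatial.flatten(2), dim=-1).view_as(spatial)

        # Pool into a grounded vector, one entry per vocabulary token.
        grounded = (gated * spatial).flatten(2).sum(dim=-1)  # (B, V)
        return grounded


if __name__ == "__main__":
    att = TokenAlignedAttention(vocab_size=32)
    img = torch.randn(2, 3, 16, 16)
    toks = torch.randint(0, 32, (2, 5))
    print(att(img, toks).shape)  # torch.Size([2, 32])
```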