2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00430

IQA: Visual Question Answering in Interactive Environments

Abstract: We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: "Are there any apples in the fridge?" The agent must navigate around the scene, acquire visual understanding of scene elements, interact with objects (e.g. open refrigerators) and plan for a series of actions conditioned on the question. Popular reinforcement learning approaches with a single …

Cited by 301 publications (300 citation statements)
References: 59 publications
“…Gandhi et al [19] collect a dataset of drone crashes and train self-supervised agents to avoid obstacles. A number of new challenging tasks have been proposed including instruction-based navigation [6,7], target-driven navigation [2,4], embodied/interactive question answering [1,9], and task planning [5].…”
Section: Related Work (mentioning)
confidence: 99%
“…We empirically show that point cloud representations are more effective for navigation in this task. Moreover, contrary to [1,9] that use synthetic environments, we extend the task to real environments sourced from [16]. 3D Representations and Architectures.…”
Section: Related Work (mentioning)
confidence: 99%
“…These approaches have been extended to the video domain as well [20,34,42]. Recently, [15,10] address the problem of question answering in an interactive environment. None of these approaches, however, is designed for leveraging external knowledge so they cannot handle the cases that the image does not represent the full knowledge to answer the question.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several embodied or visual question answering datasets have been presented recently to address some of the problems of interest in our work, such as those of Brodeur et al (2017); Das et al (2017); Gordon et al (2017). In contrast with these, our purely text-based environment circumvents challenges inherent to modelling interactions between separate data modalities.…”
Section: Interactive Environments (mentioning)
confidence: 99%