Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1215

Talk2Car: Taking Control of Your Self-Driving Car

Abstract: A long-term goal of artificial intelligence is to have an agent execute commands communicated through natural language. In many cases the commands are grounded in a visual environment shared by the human who gives the command and the agent. Execution of the command then requires mapping the command into the physical visual space, after which the appropriate action can be taken. In this paper we consider the former. Or more specifically, we consider the problem in an autonomous driving setting, where a passenger…
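The grounding step the abstract describes is often made concrete by scoring candidate image regions against an encoded command and selecting the best match. Below is a minimal sketch of that idea; the ProposalScorer name, the embedding dimensions, and the random stand-in features are illustrative assumptions, not the model proposed in the paper.

```python
# Minimal sketch: score region proposals against an encoded natural-language
# command and return the best-matching region. All names and dimensions here
# are assumptions for illustration, not the paper's actual method.
import torch
import torch.nn as nn

class ProposalScorer(nn.Module):
    def __init__(self, text_dim=256, region_dim=512, joint_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, joint_dim)     # command -> joint space
        self.region_proj = nn.Linear(region_dim, joint_dim)  # regions -> joint space

    def forward(self, command_emb, region_feats):
        # command_emb: (text_dim,) sentence embedding of the command
        # region_feats: (num_regions, region_dim) features, one per proposal
        t = self.text_proj(command_emb)        # (joint_dim,)
        r = self.region_proj(region_feats)     # (num_regions, joint_dim)
        scores = r @ t                         # dot-product similarity per region
        return scores.argmax()                 # index of the predicted referred region

scorer = ProposalScorer()
cmd = torch.randn(256)           # stand-in for an encoded command
regions = torch.randn(10, 512)   # stand-in for 10 region proposals
print(scorer(cmd, regions))
```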

Cited by 44 publications (44 citation statements)
References 38 publications
“…Another approach that does not rely on the use of region proposals is the work of Hudson and Manning [18]. Although this method was originally developed for visual question answering, [10] adapted it to tackle the visual grounding task. The model uses a recurrent MAC cell to match the natural language command with a global representation of the image.…”
Section: Methods (mentioning)
Confidence: 99%
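To make the quoted description more concrete, here is a minimal sketch of a simplified MAC-style recurrent cell that repeatedly matches an encoded command against a global image representation. The SimpleMACCell name, the single-vector image feature, the dimensions, and the step count are assumptions for illustration; the actual MAC cell of Hudson and Manning is more elaborate, with separate control, read, and write units.

```python
# A simplified MAC-style recurrent cell: each step attends over the command
# words (control) and updates a memory state informed by the image feature.
# This is a sketch under assumed dimensions, not the exact MAC architecture.
import torch
import torch.nn as nn

class SimpleMACCell(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.control_attn = nn.Linear(dim, 1)  # attention scores over command words
        self.read = nn.Linear(2 * dim, dim)    # combine memory with image feature
        self.write = nn.GRUCell(dim, dim)      # recurrent memory update

    def forward(self, word_embs, image_feat, memory):
        # word_embs: (seq_len, dim) command word embeddings
        # image_feat: (dim,) global image representation
        # memory: (dim,) current memory state
        attn = torch.softmax(self.control_attn(word_embs).squeeze(-1), dim=0)
        control = attn @ word_embs                          # attended command focus
        info = self.read(torch.cat([memory, image_feat]))   # info retrieved from image
        memory = self.write((control * info).unsqueeze(0),  # gate info by control
                            memory.unsqueeze(0)).squeeze(0)
        return memory

cell = SimpleMACCell()
words = torch.randn(8, 256)   # stand-in for an encoded 8-word command
img = torch.randn(256)        # stand-in for a global image feature
mem = torch.zeros(256)
for _ in range(4):            # a few recurrent reasoning steps
    mem = cell(words, img, mem)
```

Running several steps lets later steps condition on what earlier steps retrieved, which is the core idea behind MAC's recurrent reasoning.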
“…Research at the intersection of language and vision has been conducted extensively in the last few years. The main topics include image captioning (Karpathy and Fei-Fei 2015; Xu et al. 2015), visual question answering (VQA) (Agrawal et al. 2017; Andreas et al. 2016), object referring expressions (Deruyttere et al. 2019; Anne Hendricks et al. 2017; Balajee Vasudevan et al. 2018; Vasudevan et al. 2018), and grounded language learning (Hermann et al. 2017; Hill et al. 2017). Although the goals are different from ours, some of the fundamental techniques are shared.…”
Section: Related Work (mentioning)
Confidence: 99%
“…In fact, humans surely need to use this capability for many daily activities such as driving: certain alerting stimuli, such as car horns, the sirens of ambulances, police cars and fire trucks, and human speech, are meant to be heard, i.e. are primarily acoustic [4], [9], [10]. Auditory perception can be used to localize common objects like a running car, which is especially useful when visual perception fails due to adverse visual conditions or occlusions.…”
Section: Introduction (mentioning)
Confidence: 99%