Simultaneous Mapping and Target Driven Navigation

Georgakis, Georgios; Li, Yimeng; Kosecka, Jana

doi:10.48550/arxiv.1911.07980

Cited by 9 publications (13 citation statements); references 24 publications.


“…Learning based navigation methods. There has been a recent surge of learning based methods [58,37,50,25,17,12,23,11,19] for indoor navigation tasks [2,5,18,49,15,51], propelled by the introduction of high quality simulators [52,45,32] and visually realistic environments [52,10]. Methods which use explicit task-dependent map representations [39,25,12,11,23,24,28,36] have shown to generalize better in unknown environments than end-to-end approaches with implicit world representations.…”

Section: Related Work (mentioning; confidence: 99%)

“…There has been a recent surge of learning based methods [58,37,50,25,17,12,23,11,19] for indoor navigation tasks [2,5,18,49,15,51], propelled by the introduction of high quality simulators [52,45,32] and visually realistic environments [52,10]. Methods which use explicit task-dependent map representations [39,25,12,11,23,24,28,36] have shown to generalize better in unknown environments than end-to-end approaches with implicit world representations. For example, in [25] a differentiable mapper learns to predict top-down egocentric views of the scene from RGB images, which are then passed to a differentiable planner that predicts actions.…”

Section: Related Work (mentioning; confidence: 99%)
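The snippet above describes a differentiable mapper that predicts a top-down view of the scene and a planner that turns that view into actions. One common way to frame such a planner is value iteration over a top-down grid. The sketch below shows only that planning computation, in plain, non-differentiable NumPy, assuming a boolean free-space grid, a single goal cell, unit step cost, and four-connected moves. The names `value_iteration`, `greedy_action`, `MOVES`, and `example_free` are hypothetical, and this is not the architecture of [25].

```python
import numpy as np

# Hypothetical four-connected action set: name -> (row offset, col offset).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def value_iteration(free, goal, step_cost=1.0, iters=100):
    """Synchronous value iteration on a top-down free-space grid.

    free: boolean (H, W) array, True where the agent may stand.
    goal: (row, col) tuple of the goal cell.
    Returns an (H, W) array where each reachable free cell holds minus its
    shortest-path cost to the goal; blocked or unreachable cells hold -inf.
    """
    h, w = free.shape
    value = np.full((h, w), -np.inf)
    value[goal] = 0.0
    for _ in range(iters):
        new_value = value.copy()
        for r in range(h):
            for c in range(w):
                if not free[r, c] or (r, c) == goal:
                    continue
                # Bellman backup: best neighbour value minus the cost of one step.
                best = -np.inf
                for dr, dc in MOVES.values():
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and free[nr, nc]:
                        best = max(best, value[nr, nc] - step_cost)
                new_value[r, c] = best
        if np.array_equal(new_value, value):
            break  # converged: no cell changed in this sweep
        value = new_value
    return value


def greedy_action(value, free, pos):
    """Return the name of the move to the highest-valued free neighbour of pos."""
    h, w = value.shape
    best_name, best_val = None, -np.inf
    for name, (dr, dc) in MOVES.items():
        nr, nc = pos[0] + dr, pos[1] + dc
        if 0 <= nr < h and 0 <= nc < w and free[nr, nc] and value[nr, nc] > best_val:
            best_name, best_val = name, value[nr, nc]
    return best_name


# Example map: a wall occupies the left two cells of the middle row.
example_free = np.array([
    [True, True, True],
    [False, False, True],
    [True, True, True],
])
example_value = value_iteration(example_free, goal=(2, 0))
print(example_value[0, 0])                                 # -6.0
print(greedy_action(example_value, example_free, (0, 0)))  # right
```

Each free cell's value converges to minus its shortest-path distance to the goal, so acting greedily on the value map follows a shortest path. In a learned system, the free-space grid would come from the mapper's predictions rather than being given by hand.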

“…These methods do not have an explicit representation of the environment and tend to suffer from poor generalization. To remedy this issue, most current methods learn a map representation that enables the encoding of prior information about the geometry and semantics of a scene, acting as an episodic memory [12,11,25,23]. However, maps created by these methods are restricted to contain information only from areas that the agent has directly observed.…”

Section: Introduction (mentioning; confidence: 99%)


Learning to Map for Active Semantic Goal Navigation

Georgakis, Bucher, Schmeckpeper, et al. (2021). Preprint. Self-citation.


We consider the problem of object goal navigation in unseen environments. In our view, solving this problem requires learning of contextual semantic priors, a challenging endeavour given the spatial and semantic variability of indoor environments. Current methods learn to implicitly encode these priors through goal-oriented navigation policy functions operating on spatial representations that are limited to the agent's observable areas. In this work, we propose a novel framework that actively learns to generate semantic maps outside the field of view of the agent and leverages the uncertainty over the semantic classes in the unobserved areas to decide on long term goals. We demonstrate that through this spatial prediction strategy, we are able to learn semantic priors in scenes that can be leveraged in unknown environments. Additionally, we show how different objectives can be defined by balancing exploration with exploitation during searching for semantic targets. Our method is validated in the visually realistic environments offered by the Matterport3D dataset, and we show state-of-the-art results on the object goal navigation task.

(* Denotes equal contribution.)
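The abstract above describes choosing long-term goals from the uncertainty over semantic classes in unobserved areas, with objectives that trade off exploration and exploitation. Below is a minimal sketch of one such objective, assuming the predicted semantic map comes from an ensemble of models and scoring each cell with an upper-confidence-bound rule: the mean plus `kappa` times the standard deviation of the target-class probability across ensemble members. The ensemble, the UCB rule, and the names `select_long_term_goal`, `kappa`, and `example_probs` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def select_long_term_goal(ensemble_probs, target_class, kappa=1.0, observed_mask=None):
    """Pick a map cell as a long-term goal with an upper-confidence-bound score.

    ensemble_probs: array of shape (E, C, H, W) with per-cell class
        probabilities from E (hypothetical) ensemble members.
    target_class: index of the object category being searched for.
    kappa: uncertainty weight; 0 is pure exploitation, larger values
        favour cells where ensemble members disagree.
    observed_mask: optional boolean (H, W) array; True cells are skipped.
    Returns the (row, col) of the highest-scoring cell.
    """
    target = ensemble_probs[:, target_class]  # (E, H, W)
    mean = target.mean(axis=0)                # expected target probability
    std = target.std(axis=0)                  # ensemble disagreement
    score = mean + kappa * std                # UCB-style objective
    if observed_mask is not None:
        score = np.where(observed_mask, -np.inf, score)
    row, col = np.unravel_index(np.argmax(score), score.shape)
    return int(row), int(col)


# Two ensemble members, two classes (class 1 is the target), a 1 x 3 map.
example_probs = np.array([
    [[[0.4, 0.5, 0.9]], [[0.6, 0.5, 0.1]]],  # member 0
    [[[0.4, 0.9, 0.9]], [[0.6, 0.1, 0.1]]],  # member 1
])
print(select_long_term_goal(example_probs, target_class=1, kappa=0.0))  # (0, 0)
print(select_long_term_goal(example_probs, target_class=1, kappa=2.0))  # (0, 1)
```

With `kappa=0` the rule picks the cell the ensemble agrees is most likely to contain the target; a larger `kappa` shifts the choice toward the middle cell, where the two members disagree.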



“…limited number of steps) before the start of navigation [17]. In the latter case, the agent builds the map as it navigates an unseen test environment [57,58,44], which makes it more tightly integrated with the downstream task. In this section, we build upon existing visual exploration survey papers [48,47] to include more recent works and directions.…”

Section: Visual Exploration (mentioning; confidence: 99%)

A Survey of Embodied AI: From Simulators to Research Tasks

Duan, Jian, Tan, et al. (2021). Preprint.


There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", whereby AI algorithms and agents no longer simply learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through embodied physical interactions with their environments, whether real or simulated. Consequently, there has been substantial growth in the demand for embodied AI simulators to support a diversity of embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of artificial general intelligence, but there is no contemporary and comprehensive survey of this field. This paper comprehensively surveys state-of-the-art embodied AI simulators and research, mapping connections between these. By benchmarking nine state-of-the-art embodied AI simulators in terms of seven features, this paper aims to understand the simulators in their provision for use in embodied AI research. Finally, based upon the simulators and a pyramidal hierarchy of embodied AI research tasks, this paper surveys the main research tasks in embodied AI: visual exploration, visual navigation and embodied question answering (QA), covering the state-of-the-art approaches, evaluation and datasets.


“…Recent works have demonstrated that utilizing perceptual priors, via powerful computer vision models, reduces sample complexity, enables generalizability across environments, and largely increases performance in visuomotor tasks [4]-[7]. Inspired by these methods, we make the observation that visual motion is a strong cue for objectness [8] and propose a novel object-centric video predictive model that leverages state-of-the-art perception in the form of object instance segmentation and optical flow, and does not require object annotations.…”

Section: Introduction (mentioning; confidence: 99%)

Object-centric Video Prediction without Annotation

Schmeckpeper, Georgakis, Daniilidis (2021). Preprint. Self-citation.


In order to interact with the world, agents must be able to predict the results of the world's dynamics. A natural approach to learn about these dynamics is through video prediction, as cameras are ubiquitous and powerful sensors. Direct pixel-to-pixel video prediction is difficult, does not take advantage of known priors, and does not provide an easy interface to utilize the learned dynamics. Object-centric video prediction offers a solution to these problems by taking advantage of the simple prior that the world is made of objects and by providing a more natural interface for control. However, existing object-centric video prediction pipelines require dense object annotations in training video sequences. In this work, we present Object-centric Prediction without Annotation (OPA), an object-centric video prediction method that takes advantage of priors from powerful computer vision models. We validate our method on a dataset comprised of video sequences of stacked objects falling, and demonstrate how to adapt a perception model in an environment through end-to-end video prediction training.

