“…There has been a recent surge of learning-based methods [58,37,50,25,17,12,23,11,19] for indoor navigation tasks [2,5,18,49,15,51], propelled by the introduction of high-quality simulators [52,45,32] and visually realistic environments [52,10]. Methods that use explicit task-dependent map representations [39,25,12,11,23,24,28,36] have been shown to generalize better in unknown environments than end-to-end approaches with implicit world representations. For example, in [25] a differentiable mapper learns to predict top-down egocentric views of the scene from RGB images, which are then passed to a differentiable planner that predicts actions.…”