† Equal contribution

Standard computer vision systems assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is a major challenge in itself. We address the problem of learning to look around: how can an agent learn to acquire informative visual observations? We propose a reinforcement learning solution in which the agent is rewarded for reducing its uncertainty about the unobserved portions of its environment. Specifically, the agent is trained to select a short sequence of glimpses, after which it must infer the appearance of its full environment. To address the challenge of sparse rewards, we further introduce sidekick policy learning, which exploits the asymmetry in observability between training and test time. The proposed methods learn observation policies that not only perform the completion task for which they are trained, but also generalize to exhibit useful "look-around" behavior for a range of active perception tasks.

* This manuscript has been accepted for publication in Science Robotics. This version has not undergone final editing. Please refer to the complete version of record at https://robotics.sciencemag.org/content/4/30/eaaw6326. The manuscript may not be reproduced or used in any manner that does not fall within the fair use provisions of the Copyright Act.
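The glimpse-then-complete training signal described above can be sketched in a few lines: an agent takes a short sequence of glimpses and is rewarded at each step by how much its reconstruction error over the unobserved views drops. The toy 1-D "panorama", the mean-fill reconstruction, and the random glimpse policy below are illustrative assumptions for exposition, not the paper's actual model or environment.

```python
# Minimal sketch (illustrative, not the paper's implementation): reward a
# glimpse-selecting agent for reducing uncertainty about unseen views.
import random

def reconstruct(panorama, observed):
    """Naive completion: copy observed views, fill the rest with their mean."""
    seen = [panorama[i] for i in observed]
    guess = sum(seen) / len(seen)
    return [panorama[i] if i in observed else guess
            for i in range(len(panorama))]

def completion_error(panorama, recon):
    """Mean squared error of the full-environment reconstruction."""
    return sum((p - r) ** 2 for p, r in zip(panorama, recon)) / len(panorama)

def rollout(panorama, policy, budget):
    """Take `budget` glimpses; each step's reward is the drop in error."""
    observed = {policy(set())}  # first glimpse
    err = completion_error(panorama, reconstruct(panorama, observed))
    rewards = []
    for _ in range(budget - 1):
        observed.add(policy(observed))
        new_err = completion_error(panorama, reconstruct(panorama, observed))
        rewards.append(err - new_err)  # reward = uncertainty reduction
        err = new_err
    return rewards, err

random.seed(0)
pano = [0.0, 1.0, 0.0, 1.0, 0.5, 0.5]  # toy "environment" of 6 views
policy = lambda obs: random.choice(
    [i for i in range(len(pano)) if i not in obs])  # stand-in for a learned policy
rewards, final_err = rollout(pano, policy, budget=4)
```

In the paper this dense per-step reward shaping is what a learned policy is trained against; a random policy as above merely illustrates the interface.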