Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space and is critical for integrating self-motion (path integration) and planning direct trajectories to goals (vector-based navigation). Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation, demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.
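The two computations this abstract centres on, integrating self-motion to track position and reading out a direct goal vector from a Euclidean metric, can be sketched in a few lines. This is an illustration of the underlying problem only, not the paper's recurrent network; the function names are hypothetical.

```python
import numpy as np

def path_integrate(start, velocities):
    # Path integration: accumulate self-motion (velocity) signals
    # over time to track current position.
    return start + np.cumsum(velocities, axis=0)

def goal_vector(position, goal):
    # Vector-based navigation: the direct displacement to the goal,
    # recoverable once the agent has a Euclidean spatial metric.
    return goal - position

start = np.zeros(2)
vels = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
path = path_integrate(start, vels)  # positions after each step
# final position is the running sum of displacements: (2, 2)
```

In the paper this integration is learned by a recurrent network rather than computed in closed form, and grid-like units emerge in the process; the sketch only fixes what "path integration" and "vector-based navigation" mean operationally.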
Scene representation, the process of converting visual sensory data into concise descriptions, is a requirement for intelligent behavior. Recent work has shown that neural networks excel at this task when provided with large, labeled datasets. However, removing the reliance on human labeling remains an important open problem. To this end, we introduce the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them.
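The key structural idea, folding several (image, viewpoint) observations into one scene representation that a decoder can later query, can be sketched as follows. This is a toy stand-in, not the GQN architecture: the linear encoder and the dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12))  # toy encoder weights (hypothetical)

def encode(image_vec, viewpoint):
    # Encode one (image, viewpoint) observation into a code vector.
    return W @ np.concatenate([image_vec, viewpoint])

def scene_representation(observations):
    # Summing per-view codes yields a representation that is
    # invariant to the order in which viewpoints were observed,
    # mirroring the GQN's aggregation of multiple input views.
    return sum(encode(img, vp) for img, vp in observations)

obs = [(rng.standard_normal(9), rng.standard_normal(3)) for _ in range(4)]
r1 = scene_representation(obs)
r2 = scene_representation(obs[::-1])  # reversed viewpoint order
```

In the GQN the encoder is a deep convolutional network and the representation conditions a generative model that renders the scene from a query viewpoint; the sketch only shows the order-invariant aggregation step.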
We present a model for early vision tasks such as denoising, super-resolution, deblurring, and demosaicing. The model provides a resolution-independent representation of discrete images which admits a truly rotationally invariant prior. The model generalizes several existing approaches: variational methods, finite element methods, and discrete random fields. The primary contribution is a novel energy functional, not previously written down, which combines the discrete measurements from pixels with a continuous-domain world viewed through continuous-domain point-spread functions. The value of the functional is that simple priors (such as total variation and its generalizations) on the continuous-domain world become realistic priors on the sampled images. We show that, despite its apparent complexity, optimization of this model depends on just a few computational primitives which, although tedious to derive, can now be reused in many domains. We define a set of optimization algorithms which overcome this apparent complexity and make practical application of the model possible. New experimental results include infinite-resolution upsampling, and a method for obtaining "subpixel superpixels".
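The general shape of such an energy, a data term tying the estimate to pixel measurements plus a total-variation prior, can be illustrated with a fully discrete 1-D analogue. This is not the paper's continuous-domain functional (which involves point-spread functions and a resolution-independent representation); it is a minimal sketch of the "data term + TV prior" pattern, with a smoothed TV so plain gradient descent applies.

```python
import numpy as np

def energy(u, y, lam=1.0, eps=0.1):
    # Data term (fit to pixel measurements y) plus a smoothed
    # total-variation prior on the estimate u.
    data = 0.5 * np.sum((u - y) ** 2)
    tv = np.sum(np.sqrt(np.diff(u) ** 2 + eps ** 2))
    return data + lam * tv

def denoise(y, lam=1.0, steps=2000, lr=0.02, eps=0.1):
    # Minimize the energy by gradient descent.
    u = y.astype(float).copy()
    for _ in range(steps):
        grad = u - y                        # gradient of data term
        d = np.diff(u)
        w = d / np.sqrt(d ** 2 + eps ** 2)  # gradient of smoothed TV
        grad[:-1] -= lam * w
        grad[1:] += lam * w
        u -= lr * grad
    return u

y = np.array([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0])  # signal with a spike
u = denoise(y)  # TV prior pulls the isolated spike down
```

The paper's contribution is precisely that priors like this, placed on a continuous-domain world rather than on the samples, remain realistic after sampling; the sketch shows only the discrete baseline being generalized.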
In this paper, a system which constructs a mosaic image of the tunnel surface with little distortion is presented. The tunnel surface is typically composed of a roughly cylindrical surface and protuberant regions containing objects such as pipes, pans and tunnel ridges. Since the true surface is neither planar nor quadric, existing mosaicing methods, which assume either homography or quadratic motion models, suffer from distortion. The proposed system obtains a sparse 3D model of the tunnel by multi-view reconstruction. A Support Vector Machine (SVM) classifier is then applied to separate image features lying on the cylindrical surface from those that do not. The reconstructed 3D points are reprojected into the images to retrieve the priors given by the SVM classifier for accurate cylindrical surface estimation. The final mosaic image is obtained by flattening the estimated textured surface onto a plane. The results suggest that the mosaic quality depends critically on the surface estimation accuracy, and that the proposed system is able to produce a mosaic image that preserves physical properties, e.g. line parallelism and straightness, which are important for tunnel inspection.
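The final flattening step, unrolling the estimated cylindrical surface onto a plane, is what preserves line parallelism and straightness, because arc length along the cylinder maps isometrically to the plane. A minimal sketch of that unrolling for a z-axis-aligned cylinder (an assumption for illustration; the paper estimates the surface from data):

```python
import numpy as np

def flatten_cylinder(points, radius):
    # Unroll 3D points lying on a cylinder (axis = z) onto a plane:
    # the planar x-coordinate is arc length (radius * angle), the
    # planar y-coordinate is height. This map is an isometry of the
    # surface, so straight lines on the surface stay straight.
    x, y, z = points.T
    theta = np.arctan2(y, x)
    return np.column_stack([radius * theta, z])

# sample points on a unit cylinder at increasing angle and height
theta = np.array([0.0, 0.5, 1.0])
pts = np.column_stack([np.cos(theta), np.sin(theta), [0.0, 1.0, 2.0]])
flat = flatten_cylinder(pts, radius=1.0)
```

For a unit cylinder the planar x-coordinate equals the angle itself, and heights carry over unchanged; a real tunnel additionally requires estimating the axis and radius, which is the role of the SVM-guided surface fit in the paper.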
Figure 1: An example of our new Roto++ tool working with a professional artist to increase productivity. The artist has already specified a number of keyframes but is not satisfied with one of the intermediate frames. Under standard baselines, correcting the erroneous curve requires moving the individual control points of the spline. Using our new shape model, we are able to provide an Intelligent Drag Tool that can generate likely shapes given the other keyframes. In our new interaction, the user simply selects the incorrect points and drags them towards the correct shape. Our shape model then proposes the new control point locations, allowing the correction to be performed in a single operation.
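One simple way such a drag interaction can work, sketched here under the assumption of a linear shape basis (e.g. learned from the artist's existing keyframes), is to solve a small least-squares problem: find the shape-model coefficients whose shape best matches the dragged points, then read off all control points. The function and variable names are illustrative, not the Roto++ API.

```python
import numpy as np

def propose_shape(basis, mean, dragged_idx, dragged_pos):
    # Fit shape-model coefficients so that the model's prediction
    # at the dragged control points matches where the user dragged
    # them, then return ALL control points of the proposed shape.
    A = basis[dragged_idx]                       # rows for dragged points
    b = dragged_pos - mean[dragged_idx]          # drag offsets from mean
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mean + basis @ coeffs

# toy 1-D model over 4 control points with a single "move together"
# mode; dragging one point should therefore move all of them
mean = np.zeros(4)
basis = np.ones((4, 1))
new = propose_shape(basis, mean, np.array([1]), np.array([2.0]))
# every control point follows the dragged one under this mode
```

The point of the sketch is the interaction: the user edits a few points, and the model propagates a plausible correction to the rest, matching the single-operation fix the caption describes.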