Abstract: Features associated with an object or its surfaces in natural scenes tend to vary coherently in space and time. In the psychological literature, these coherent covariations have been considered important for neural systems to acquire models of objects and object categories. From a statistical inference perspective, such coherent covariation can provide a mechanism for learning the statistical priors in natural scenes that are useful for probabilistic inference. In this article, we present neurophysiological observations from the early visual cortex that provide insights into how correlation structures in visual scenes are encoded by neuronal tuning and by connections among neurons. The key insight is that correlated structures in visual scenes result in correlated neuronal activities, which shape the tuning properties of individual neurons and the connections between them, embedding Gestalt-related computational constraints or priors for surface inference. Extending these concepts to the inferotemporal cortex suggests a representational framework that is distinct from the traditional feed-forward hierarchy of invariant object representation and recognition. In this framework, lateral connections among view-based neurons, learned from the temporal association of the object views observed over time, can form a linked graph structure with local dependency, akin to a dense aspect graph in computer vision. This web-like graph allows view-invariant object representation to be created using sparse feed-forward connections while maintaining an explicit representation of the different views. Thus, it can serve as an effective prior model for generating predictions of future incoming views to facilitate object inference.