Standard models of the visual object recognition pathway hold that a largely feedforward process from the retina through inferotemporal cortex leads to object identification. A subsequent feedback process originating in frontoparietal areas through reciprocal connections to striate cortex provides attentional support to salient or behaviorally-relevant features. Here, we review mounting evidence that feedback signals also originate within extrastriate regions and begin during the initial feedforward process. This feedback process is temporally dissociable from attention and provides important functions such as grouping, associational reinforcement, and filling-in of features. Local feedback signals operating concurrently with feedforward processing are important for object identification in noisy real-world situations, particularly when objects are partially occluded, unclear, or otherwise ambiguous. Altogether, the dissociation of early and late feedback processes presented here expands on current models of object identification, and suggests a dual role for descending feedback projections.
† Randall C. O'Reilly and Dean Wyatte have contributed equally to this work.How does the brain learn to recognize objects visually, and perform this difficult feat robustly in the face of many sources of ambiguity and variability? We present a computational model based on the biology of the relevant visual pathways that learns to reliably recognize 100 different object categories in the face of naturally occurring variability in location, rotation, size, and lighting. The model exhibits robustness to highly ambiguous, partially occluded inputs. Both the unified, biologically plausible learning mechanism and the robustness to occlusion derive from the role that recurrent connectivity and recurrent processing mechanisms play in the model. Furthermore, this interaction of recurrent connectivity and learning predicts that high-level visual representations should be shaped by error signals from nearby, associated brain areas over the course of visual learning. Consistent with this prediction, we show how semantic knowledge about object categories changes the nature of their learned visual representations, as well as how this representational shift supports the mapping between perceptual and conceptual knowledge. Altogether, these findings support the potential importance of ongoing recurrent processing throughout the brain's visual system and suggest ways in which object recognition can be understood in terms of interactions within and between processes over time. Keywords: object recognition, computational model, recurrent processing, feedback, winners-take-all mechanism INTRODUCTIONOne of the most salient features of the mammalian neocortex is the structure of its connectivity, which provides for many forms of recurrent processing, where neurons mutually influence each other through direct, bidirectional interactions. There are extensive bidirectional excitatory and inhibitory connections within individual cortical areas, and almost invariably, every area that receives afferent synapses from another area, also sends back efferent synapses in return (Felleman and Van Essen, 1991;Scannell et al., 1995;Sporns and Zwi, 2004;Sporns et al., 2007). We describe an explicit computational model (LVis -Leabra Vision) of the function of this recurrent architecture in the context of visual object recognition, demonstrating a synergy between the learning and processing benefits of recurrent connectivity.Recurrent processing, for example, has been suggested to be critical for solving certain visual tasks such as figure-ground segmentation (Hupe et al., 1998;Roelfsema et al., 2002;Lamme and Roelfsema, 2000), which requires integration of information from outside the classical receptive field. We demonstrate how recurrent excitatory processing could provide a similar function in visual occlusion, which requires the organization of image fragments that span multiple receptive fields into a logical whole Gestalt and involves the filling-in of missing visual information Lerner et al., 2002;Rauschenberger et al., 2006;Weigelt et al., 2...
Abstract■ Everyday vision requires robustness to a myriad of environmental factors that degrade stimuli. Foreground clutter can occlude objects of interest, and complex lighting and shadows can decrease the contrast of items. How does the brain recognize visual objects despite these low-quality inputs? On the basis of predictions from a model of object recognition that contains excitatory feedback, we hypothesized that recurrent processing would promote robust recognition when objects were degraded by strengthening bottom-up signals that were weakened because of occlusion and contrast reduction. To test this hypothesis, we used backward masking to interrupt the processing of partially occluded and contrast reduced images during a categorization experiment. As predicted by the model, we found significant interactions between the mask and occlusion and the mask and contrast, such that the recognition of heavily degraded stimuli was differentially impaired by masking. The model provided a close fit of these results in an isomorphic version of the experiment with identical stimuli. The model also provided an intuitive explanation of the interactions between the mask and degradations, indicating that masking interfered specifically with the extensive recurrent processing necessary to amplify and resolve highly degraded inputs, whereas less degraded inputs did not require much amplification and could be rapidly resolved, making them less susceptible to masking. Together, the results of the experiment and the accompanying model simulations illustrate the limits of feedforward vision and suggest that object recognition is better characterized as a highly interactive, dynamic process that depends on the coordination of multiple brain areas. ■
How does the brain bind together visual features that are processed concurrently by different neurons into a unified percept suitable for processes such as object recognition? Here, we describe how simple, commonly accepted principles of neural processing can interact over time to solve the brain’s binding problem. We focus on mechanisms of neural inhibition and top-down feedback. Specifically, we describe how inhibition creates competition among neural populations that code different features, effectively suppressing irrelevant information, and thus minimizing illusory conjunctions. Top-down feedback contributes to binding in a similar manner, but by reinforcing relevant features. Together, inhibition and top-down feedback contribute to a competitive environment that ensures only the most appropriate features are bound together. We demonstrate this overall proposal using a biologically realistic neural model of vision that processes features across a hierarchy of interconnected brain areas. Finally, we argue that temporal synchrony plays only a limited role in binding – it does not simultaneously bind multiple objects, but does aid in creating additional contrast between relevant and irrelevant features. Thus, our overall theory constitutes a solution to the binding problem that relies only on simple neural principles without any binding-specific processes.
Perceptual tasks such as object matching, mammogram interpretation, mental rotation, and satellite imagery change detection often require the assignment of correspondences to fuse information across views. We apply techniques developed for machine translation to the gaze data recorded from a complex perceptual matching task modeled after fingerprint examinations. The gaze data provide temporal sequences that the machine translation algorithm uses to estimate the subjects' assumptions of corresponding regions. Our results show that experts and novices have similar surface behavior, such as the number of fixations made or the duration of fixations. However, the approach applied to data from experts is able to identify more corresponding areas between two prints. The fixations that are associated with clusters that map with high probability to corresponding locations on the other print are likely to have greater utility in a visual matching task. These techniques address a fundamental problem in eye tracking research with perceptual matching tasks: Given that the eyes always point somewhere, which fixations are the most informative and therefore are likely to be relevant for the comparison task?
Action selection, planning and execution are continuous processes that evolve over time, responding to perceptual feedback as well as evolving top-down constraints. Existing models of routine sequential action (e.g. coffee- or pancake-making) generally fall into one of two classes: hierarchical models that include hand-built task representations, or heterarchical models that must learn to represent hierarchy via temporal context, but thus far lack goal-orientedness. We present a biologically motivated model of the latter class that, because it is situated in the Leabra neural architecture, affords an opportunity to include both unsupervised and goal-directed learning mechanisms. Moreover, we embed this neurocomputational model in the theoretical framework of the theory of event coding (TEC), which posits that actions and perceptions share a common representation with bidirectional associations between the two. Thus, in this view, not only does perception select actions (along with task context), but actions are also used to generate perceptions (i.e. intended effects). We propose a neural model that implements TEC to carry out sequential action control in hierarchically structured tasks such as coffee-making. Unlike traditional feedforward discrete-time neural network models, which use static percepts to generate static outputs, our biological model accepts continuous-time inputs and likewise generates non-stationary outputs, making short-timescale dynamic predictions.
ExpertEyes is a low-cost, open-source package of hardware and software that is designed to provide portable high-definition eyetracking. The project involves several technological innovations, including portability, high-definition video recording, and multiplatform software support. It was designed for challenging recording environments, and all processing is done offline to allow for optimization of parameter estimation. The pupil and corneal reflection are estimated using a novel forward eye model that simultaneously fits both the pupil and the corneal reflection with full ellipses, addressing a common situation in which the corneal reflection sits at the edge of the pupil and therefore breaks the contour of the ellipse. The accuracy and precision of the system are comparable to or better than what is available in commercial eyetracking systems, with a typical accuracy of less than 0.4° and best accuracy below 0.3°, and with a typical precision (SD method) around 0.3° and best precision below 0.2°. Part of the success of the system comes from a high-resolution eye image. The high image quality results from uncasing common digital camcorders and recording directly to SD cards, which avoids the limitations of the analog NTSC format. The software is freely downloadable, and complete hardware plans are available, along with sources for custom parts.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.