Abstract-Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition or vision-based navigation and manipulation. This article reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, we present functional principles of the processing hierarchies present in the primate visual system considering recent discoveries in neurophysiology. The hierarchal processing in the primate visual system is characterized by a sequence of different levels of processing (in the order of ten) that constitute a deep hierarchy in contrast to the flat vision architectures predominantly used in today's mainstream computer vision. We hope that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.
Abstract. We try to determine the progress made by convolutional neural networks over the past 25 years in classifying images into abstract classes. For this purpose we compare the performance of LeNet to that of GoogLeNet at classifying randomly generated images which are differentiated by an abstract property (e.g., one class contains two objects of the same size, the other class two objects of different sizes). Our results show that there is still work to do in order to solve vision problems humans are able to solve without much difficulty.
Computer Tomography (CT) is an imaging procedure that combines many X-ray measurements taken from different angles. The segmentation of areas in the CT images provides a valuable aid to physicians and radiologists in order to better provide a patient diagnose. The CT scans of a body torso usually include different neighboring internal body organs. Deep learning has become the state-of-the-art in medical image segmentation. For such techniques, in order to perform a successful segmentation, it is of great importance that the network learns to focus on the organ of interest and surrounding structures and also that the network can detect target regions of different sizes. In this paper, we propose the extension of a popular deep learning methodology, Convolutional Neural Networks (CNN), by including deep supervision and attention gates. Our experimental evaluation shows that the inclusion of attention and deep supervision results in consistent improvement of the tumor prediction accuracy across the different datasets and training sizes while adding minimal computational overhead.
We propose a computational model of a simple cell with push-pull inhibition, a property that is observed in many real simple cells. It is based on an existing model called Combination of Receptive Fields or CORF for brevity. A CORF model uses as afferent inputs the responses of model LGN cells with appropriately aligned center-surround receptive fields, and combines their output with a weighted geometric mean. The output of the proposed model simple cell with push-pull inhibition, which we call push-pull CORF, is computed as the response of a CORF model cell that is selective for a stimulus with preferred orientation and preferred contrast minus a fraction of the response of a CORF model cell that responds to the same stimulus but of opposite contrast. We demonstrate that the proposed push-pull CORF model improves signal-to-noise ratio (SNR) and achieves further properties that are observed in real simple cells, namely separability of spatial frequency and orientation as well as contrast-dependent changes in spatial frequency tuning. We also demonstrate the effectiveness of the proposed push-pull CORF model in contour detection, which is believed to be the primary biological role of simple cells. We use the RuG (40 images) and Berkeley (500 images) benchmark data sets of images with natural scenes and show that the proposed model outperforms, with very high statistical significance, the basic CORF model without inhibition, Gabor-based models with isotropic surround inhibition, and the Canny edge detector. The push-pull CORF model that we propose is a contribution to a better understanding of how visual information is processed in the brain as it provides the ability to reproduce a wider range of properties exhibited by real simple cells. As a result of push-pull inhibition a CORF model exhibits an improved SNR, which is the reason for a more effective contour detection.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.