Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.
Author Summary
Feedforward Convolutional Neural Networks (ffCNNs) have revolutionized computer vision and are deeply transforming neuroscience. However, ffCNNs only roughly mimic human vision. There is a rapidly expanding literature investigating differences between humans and ffCNNs. Several findings suggest that, unlike humans, ffCNNs rely mostly on local visual features. Furthermore, ffCNNs lack recurrent connections, which abound in the brain. Here, we use visual crowding, a well-known psychophysical phenomenon, to investigate recurrent computations in global shape processing. Previously, we showed that no model based on the classic feedforward framework of vision, including ffCNNs, can explain global effects in crowding. Here, we show that Capsule Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. Lateral and top-down recurrent connections do not, suggesting that grouping and segmentation are crucial for human-like global computations. Based on these results, we hypothesize that one computational function of recurrence is to efficiently implement grouping and segmentation. We provide psychophysical evidence that, indeed, recurrent processes implement grouping and segmentation in humans. CapsNets reproduce these results too. Together, we provide mutually reinforcing computational and psychophysical evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.
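The recurrent grouping-and-segmentation mechanism that distinguishes CapsNets from ffCNNs is routing-by-agreement, introduced with the original CapsNet proposal: lower-level capsules iteratively increase their coupling to the higher-level capsule whose output best agrees with their predictions, effectively grouping parts into wholes. The sketch below is a minimal, illustrative implementation of that routing loop; the array shapes, iteration count, and toy input are assumptions for demonstration and not the network configuration used in the study.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing non-linearity: keeps the vector's orientation, maps its length into (0, 1)."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

def dynamic_routing(u_hat, n_iter=3):
    """
    Routing-by-agreement over prediction vectors.

    u_hat: (n_in, n_out, dim_out) prediction of each lower-level capsule i
           for each higher-level capsule j (already multiplied by learned weights).
    Returns: (n_out, dim_out) higher-level capsule outputs.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                                # routing logits
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coefficients (softmax over j)
        s = (c[:, :, None] * u_hat).sum(axis=0)                 # weighted sum of predictions
        v = squash(s)                                           # higher-level capsule outputs
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)            # agreement strengthens the coupling
    return v

# Toy usage: 6 lower-level capsules routing to 2 higher-level capsules of dimension 4.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 2, 4))
print(dynamic_routing(u_hat).shape)  # (2, 4)
```

Because the coupling coefficients are recomputed over several iterations, the routing loop acts as a recurrent process that both groups elements (capsules coupling to the same parent) and segments them (capsules coupling to different parents).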
Human vision relies on mechanisms that respond to luminance edges in space and time. Most edge models use orientation-selective mechanisms on multiple spatial scales and operate on static inputs assuming that edge processing occurs within a single fixational instance. Recent studies, however, demonstrate functionally relevant temporal modulations of the sensory input due to fixational eye movements. Here we propose a spatiotemporal model of human edge detection that combines elements of spatial and active vision. The model augments a spatial vision model by temporal filtering and shifts the input images over time, mimicking an active sampling scheme via fixational eye movements. The first model test was White's illusion, a lightness effect that has been shown to depend on edges. The model reproduced the spatial-frequency-specific interference with the edges by superimposing narrowband noise (1–5 cpd), similar to the psychophysical interference observed in White's effect. Second, we compare the model's edge detection performance in natural images in the presence and absence of Gaussian white noise with human-labeled contours for the same (noise-free) images. Notably, the model detects edges robustly against noise in both test cases without relying on orientation-selective processes. Eliminating model components, we demonstrate the relevance of multiscale spatiotemporal filtering and scale-specific normalization for edge detection. The proposed model facilitates efficient edge detection in (artificial) vision systems and challenges the notion that orientation-selective mechanisms are required for edge detection.
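The abstract describes a pipeline of non-oriented multiscale filtering applied to an input that is actively shifted over time, followed by temporal filtering and scale-specific normalization. The sketch below is only a minimal illustration of that kind of pipeline, not the authors' model: it assumes isotropic difference-of-Gaussians filters as the non-oriented bandpass stage, a random-walk drift path as a stand-in for fixational eye movements, the temporal standard deviation of the filter responses as a proxy for temporal filtering, and per-scale max normalization as the scale-specific normalization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def dog(img, sigma, ratio=1.6):
    """Isotropic difference-of-Gaussians: a non-oriented bandpass filter at one spatial scale."""
    return gaussian_filter(img, sigma) - gaussian_filter(img, sigma * ratio)

def spatiotemporal_edges(img, sigmas=(1, 2, 4), n_frames=8, drift_px=1.0, seed=0):
    """
    Minimal sketch: sample the image along a random drift path, band-pass it at several
    scales, take the temporal variation of each scale's response, normalize per scale,
    and average across scales into an edge map. 'img' is a 2-D grayscale array.
    """
    rng = np.random.default_rng(seed)
    # Random-walk drift path mimicking fixational eye movements (in pixels).
    path = np.cumsum(rng.normal(scale=drift_px, size=(n_frames, 2)), axis=0)
    frames = np.stack([shift(img, p, mode='nearest') for p in path])

    edge_map = np.zeros_like(img, dtype=float)
    for sigma in sigmas:
        responses = np.stack([dog(f, sigma) for f in frames])  # (time, H, W)
        temporal = np.std(responses, axis=0)                    # temporal modulation per pixel
        temporal /= temporal.max() + 1e-8                       # scale-specific normalization
        edge_map += temporal
    return edge_map / len(sigmas)
```

Note that no stage of this sketch is orientation-selective: edges emerge from the interaction of the isotropic bandpass responses with the self-motion of the input, which is the core idea the abstract argues for.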