Crowding reveals fundamental differences in local vs. global processing in humans and machines

Doerig, Adrien; Bornet, Alban; Choung, Oh-Hyeon; Herzog, Michael H.

doi:10.1016/j.visres.2019.12.006

Cited by 49 publications

(32 citation statements)

References 40 publications

“…Hence, our results show that, given adequate priors, CapsNets explain uncrowding. We have shown that ffCNNs and CNNs with lateral or top-down recurrent connections do not produce uncrowding, even when they are trained identically on groups of identical shapes and successfully learn on the training data, comparably to the CapsNets (furthermore, we showed previously that ffCNNs trained on large datasets, which are often used as general models of vision, do not show uncrowding either; [17]). This shows that merely training networks on groups of identical shapes is not sufficient to explain uncrowding.…”

Section: Discussion (mentioning)

confidence: 95%

“…In previous work, we have shown that pretrained ffCNNs cannot explain uncrowding [17], even if they are biased towards global shape processing [13]. Currently, CapsNets cannot be trained on large-scale tasks such as ImageNet because routing by agreement is computationally too expensive.…”

Section: Results (mentioning)

confidence: 99%

“…An important point of discussion concerns global visual processing. It was suggested that ffCNNs mainly focus on local, texture-like features, while humans harness global shape computations ([9, 13–17]; but see [18]). In this context, it was shown that changing local features of an object, such as its texture or edges, leads ffCNNs to misclassify [13, 14], while humans can still easily classify the object based on its global shape.…”

Section: Introduction (mentioning)

confidence: 99%

“…Previously, we have shown that these global effects of crowding cannot be explained by models based on the classic framework of vision, including ffCNNs [9, 17, 38]. Here, we propose a new framework to understand these global effects.…”

Section: Introduction (mentioning)

confidence: 99%

Capsule networks as recurrent models of grouping and segmentation

et al. 2020

Self Cite

Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.

“…In particular, much work with vernier and letter stimuli showed that even small changes to the contextual stimuli, or changes far away from the target, can lead to target-surround ungrouping and a considerable reduction in crowding (Kooi, Toet, Tripathy, & Levi, 1994; Manassi, Sayim, & Herzog, 2012; Manassi et al, 2016; Manassi, Hermens, Francis, & Herzog, 2015; Manassi et al, 2013; Saarela, Sayim, Westheimer, & Herzog, 2009), a phenomenon known as “uncrowding”. It has been argued that these results show a failure of feedforward pooling models, such as the SS model, and that this failure is due to their lack of recurrent processes of grouping and segmentation (Doerig et al, 2019; Doerig, Bornet, Choung, & Herzog, 2020; Herzog et al, 2015; Francis, Manassi, & Herzog, 2017). Furthermore, current SS model implementations also fail to capture the peripheral appearance of natural scenes that contain strong grouping and segmentation cues (Wallis et al, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Flexible contextual modulation of naturalistic texture perception in peripheral vision

Herrera, Coen-Cagli, Gómez-Sena 2020

Preprint

Peripheral vision comprises most of our visual field, and is essential in guiding visual behavior. Its characteristic low resolution has been explained by the most influential theory of peripheral vision as the product of representing the visual input using summary-statistics. Despite its success, this account may provide a limited understanding of peripheral vision, because it neglects processes of perceptual grouping and segmentation. To test this hypothesis, we studied how contextual modulation, namely the modulation of the perception of a stimulus by its surrounds, interacts with segmentation in human peripheral vision. We used naturalistic textures, which are directly related to summary-statistics representations. We show that segmentation cues affect contextual modulation, and that this is not captured by the summary-statistics model. We then characterize the effects of different texture statistics on contextual modulation, providing guidance for extending the model, as well as for probing neural mechanisms of peripheral vision.

Exploring, expounding & ersatzing: a three-level account of deep learning models in cognitive neuroscience

Subotić 2024

Synthese
