In crowding, perception of an object deteriorates in the presence of nearby elements. Although crowding is a ubiquitous phenomenon, since elements are rarely seen in isolation, there is to date no consensus on how to model it. Previous experiments showed that the global configuration of the entire stimulus must be taken into account. These findings rule out simple pooling or substitution models and favor models sensitive to global spatial aspects. To investigate how to incorporate global aspects into models, we tested a large number of models on a database of forty stimuli tailored to the global aspects of crowding. Our results show that incorporating grouping-like components strongly improves model performance.
Feedforward Convolutional Neural Networks (ffCNNs) have become state-of-the-art models both in computer vision and neuroscience. However, human-like performance of ffCNNs does not necessarily imply human-like computations. Previous studies have suggested that current ffCNNs do not make use of global shape information. However, it is currently unclear whether this reflects fundamental differences between ffCNN and human processing or is merely an artefact of how ffCNNs are trained. Here, we use visual crowding as a well-controlled, specific probe to test global shape computations. Our results provide evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons. We lay out approaches that may address shortcomings of ffCNNs to provide better models of the human visual system.

This training dataset biased an ffCNN (ResNet50; He, Zhang, Ren, & Sun, 2016) towards shape-level features, because textural information was no longer useful for classifying the dataset. They validated the network's shape bias by showing increased robustness to local noise and textural changes. Alternatively, ffCNNs may be incapable of matching human global computations for principled architectural reasons. Even though Geirhos et al.'s network was able to ignore local features, it may not use global computations in the same way humans do. One difficulty in addressing this question is that there is no consensus on how to experimentally diagnose how deep networks compute global information.
Traditionally, human vision research has focused on specific paradigms and proposed models to explain very specific properties of visual perception. However, the complexity and scope of modern psychophysical paradigms undermine the success of this approach. For example, perception of an element strongly deteriorates when neighboring elements are presented alongside it (visual crowding). As shown recently, the magnitude of this deterioration depends not only on the directly neighboring elements but on almost all elements and their specific configuration. Hence, to fully explain human visual perception, one needs to take large parts of the visual field into account and combine all the aspects of vision that become relevant at such scale. These efforts require sophisticated and collaborative modeling. The Neurorobotics Platform (NRP) of the Human Brain Project offers a unique opportunity to connect models of all sorts of visual functions, even those developed by different research groups, into a coherently functioning system. Here, we describe how we used the NRP to connect and simulate a segmentation model, a retina model, and a saliency model to explain complex results about visual perception. The combination of models highlights the versatility of the NRP and provides novel explanations for the inward-outward anisotropy in visual crowding.
The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited to the task of language translation, the recently introduced molecular transformer has reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts, and show strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include only a limited number of chemical reaction pathways.
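The round-trip idea above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the forward and retrosynthesis "models" are stand-in dictionary lookups over SMILES strings, chosen only to show how a chemically sensible prediction (here, methyl acetate written as COC(C)=O rather than the database's CC(=O)OC) can fail a direct top-1 database comparison while still passing the round trip.

```python
def top1_accuracy(pairs, forward):
    """Direct comparison of the predicted product against the reference
    product stored in the database."""
    return sum(forward(r) == p for r, p in pairs) / len(pairs)

def round_trip_accuracy(pairs, forward, retro):
    """A prediction passes the round-trip test if applying the retro
    model to the predicted product recovers the original reactants."""
    passed = 0
    for reactants, _ in pairs:
        product = forward(reactants)
        if retro(product) == reactants:
            passed += 1
    return passed / len(pairs)

# Toy stand-in "models": tiny lookup tables instead of trained transformers.
forward_map = {"CCO.CC(=O)O": "CC(=O)OCC",   # ethanol + acetic acid -> ethyl acetate
               "CO.CC(=O)O": "COC(C)=O"}     # methanol + acetic acid -> methyl acetate
retro_map = {"CC(=O)OCC": "CCO.CC(=O)O",
             "COC(C)=O": "CO.CC(=O)O"}

# Reference data: the second entry writes methyl acetate as CC(=O)OC,
# a different string for the same molecule the model predicts as COC(C)=O.
pairs = [("CCO.CC(=O)O", "CC(=O)OCC"),
         ("CO.CC(=O)O", "CC(=O)OC")]

fwd, rtr = forward_map.get, retro_map.get
print(top1_accuracy(pairs, fwd))             # 0.5: string mismatch with the database
print(round_trip_accuracy(pairs, fwd, rtr))  # 1.0: both predictions survive the round trip
```

The gap between the two scores illustrates the abstract's point: direct string comparison to a fixed database penalizes predictions that are merely written differently (or follow an alternative valid pathway), whereas the round-trip test judges self-consistency of the forward and retro models.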
In crowding, perception of a target deteriorates in the presence of nearby flankers. Surprisingly, perception can be rescued from crowding if additional flankers are added (uncrowding). Uncrowding is a major challenge for all classic models of crowding and vision in general, because the global configuration of the entire stimulus is crucial. However, it is unclear which characteristics of the configuration impact (un)crowding. Here, we systematically dissected flanker configurations and showed that (un)crowding cannot be easily explained by the effects of the sub-parts or low-level features of the stimulus configuration. Our modeling results suggest that (un)crowding requires global processing. These results are well in line with previous studies showing the importance of global aspects in crowding.