Responses to natural stimuli in area V4, a mid-level area of the visual ventral stream, are well predicted by features from convolutional neural networks (CNNs) trained on image classification. This result has been taken as evidence for the functional role of V4 in object classification. However, it is currently unknown whether, and to what extent, V4 also plays a role in solving other computational objectives. Here, we investigated normative accounts of V4 by predicting macaque single-neuron responses to natural images from the representations extracted by 23 CNNs trained on different computer vision tasks, including semantic, geometric, 2D, and 3D visual tasks. We found that semantic classification tasks do indeed provide the best predictive features for V4. Other tasks (3D tasks in particular) followed very closely in performance, but a similar pattern of task performance emerged when predicting the activations of a network trained exclusively on object recognition. Thus, our results support V4's main functional role in semantic processing. At the same time, they suggest that V4's affinity for various 3D and 2D stimulus features found by electrophysiologists could be a corollary of a semantic functional goal.
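The abstract does not state how CNN representations are mapped onto single-neuron responses. A common approach in this kind of comparison is a regularized linear readout scored by cross-validated correlation, which is what the minimal sketch below assumes (closed-form ridge regression, 5-fold cross-validation, Pearson r). The function names, the `alpha` value, and the synthetic "task_A"/"task_B" feature spaces are hypothetical illustrations, not the authors' pipeline or data.

```python
import numpy as np

# ASSUMPTION: ridge regression readout; the source does not specify the mapping model.
def ridge_fit(X, y, alpha):
    # Closed-form ridge with an intercept column: w = (X1^T X1 + alpha*I)^-1 X1^T y
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(X1.shape[1])
    reg[-1, -1] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X1.T @ X1 + reg, X1.T @ y)

def ridge_predict(X, w):
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ w

def predictivity(features, responses, alpha=1.0, n_folds=5, seed=0):
    """Cross-validated Pearson r between predicted and held-out responses of one neuron."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(responses)), n_folds)
    preds = np.empty(len(responses), dtype=float)
    for k, test in enumerate(folds):
        # Fit on all other folds, predict the held-out images
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        w = ridge_fit(features[train], responses[train], alpha)
        preds[test] = ridge_predict(features[test], w)
    return np.corrcoef(preds, responses)[0, 1]

# Hypothetical example: one simulated neuron driven by "task_A" features only
rng = np.random.default_rng(1)
n_images = 200
feature_sets = {
    "task_A": rng.normal(size=(n_images, 20)),  # stands in for one network's features
    "task_B": rng.normal(size=(n_images, 20)),  # stands in for an unrelated network's features
}
true_w = rng.normal(size=20)
neuron = feature_sets["task_A"] @ true_w + 0.5 * rng.normal(size=n_images)

scores = {name: predictivity(F, neuron) for name, F in feature_sets.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Ranking networks by this held-out score is one way to operationalize "which task provides the best predictive features"; the same scoring function can be applied with a network's own unit activations as the target, mirroring the control analysis described above.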