Convolutional neural network-based encoding and decoding of visual object recognition in space and time
2018 · DOI: 10.1016/j.neuroimage.2017.07.018

Abstract: Representations learned by deep convolutional neural networks (CNNs) for object recognition are a widely investigated model of the processing hierarchy in the human visual system. Using functional magnetic resonance imaging, CNN representations of visual stimuli have previously been shown to correspond to processing stages in the ventral and dorsal streams of the visual system. Whether this correspondence between models and brain signals also holds for activity acquired at high temporal resolution has been exp…

Cited by 104 publications (123 citation statements) · References 49 publications
“…This is in line with intracranial studies which showed that visual information is detectable in early visual cortex from 50 ms onwards (5). Furthermore, both intracranial and scalp electrophysiology have shown that around 130 ms, high-level object representations first get activated (4, 6, 7, 31). Therefore, this early time window is representative of the feedforward sweep during perception.…”
Section: Results (supporting)
confidence: 87%
“…First, low-level visual features such as orientation and spatial frequency are processed in primary, posterior visual areas (3), after which activation spreads forward towards secondary, more anterior visual areas where high-level features such as shape and eventually semantic category are processed (4-6). This initial feedforward flow through the visual hierarchy is completed within 150 ms (7, 8), after which feedback processes are assumed to further sharpen representations over time until a stable percept is achieved (9, 10). Activation in visual areas can also be triggered internally, in the absence of external sensory signals.…”
(mentioning)
confidence: 99%
“…Early attempts at identifying and localizing neural activity associated with specific visual features focused on either high-level semantic/categorical features (Hung, Kreiman, Poggio, & DiCarlo, 2005; Meyers et al., 2008; Walther, Caddigan, Fei-Fei, & Beck, 2009; Reddy, Tsuchiya, & Serre, 2010; Smith & Goodale, 2013) or low-level features such as edges (Kay, Naselaris, Prenger, & Gallant, 2008; Naselaris et al., 2015), limiting findings to a small slice of the cortical visual hierarchy. In contrast, features extracted from the layers of a deep CNN have been linked to activity over nearly the entire visual cortex during perception, with a correspondence between the hierarchical structures of the CNN and cortex (Yamins et al., 2014; Güçlü and van Gerven, 2015; Wen et al., 2017; Eickenberg, Gramfort, Varoquaux, & Thirion, 2017; Seeliger et al., 2018). Horikawa and Kamitani (2017) used this approach to reveal feature-specific neural reactivation throughout the ventral visual stream during mental imagery.…”
Section: Introduction (mentioning)
confidence: 99%
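The encoding approach this excerpt describes — predicting brain activity from CNN-layer features — can be illustrated with a minimal sketch. Everything here is a hypothetical simulation (the array sizes, the noise level, and the closed-form ridge fit with `alpha=1.0` are illustrative assumptions, not the cited papers' actual pipelines):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins for real data: n_stim stimuli, CNN-layer
# activations as features, and responses for n_vox voxels.
n_stim, n_feat, n_vox = 200, 50, 10
layer_features = rng.standard_normal((n_stim, n_feat))  # one CNN layer
true_weights = rng.standard_normal((n_feat, n_vox))
voxel_resp = (layer_features @ true_weights
              + 0.1 * rng.standard_normal((n_stim, n_vox)))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

W = fit_ridge(layer_features, voxel_resp)

# Encoding performance per voxel: correlation of predicted vs. observed.
pred = layer_features @ W
r = np.array([np.corrcoef(pred[:, v], voxel_resp[:, v])[0, 1]
              for v in range(n_vox)])
print(r.mean())  # high on this low-noise simulation
```

In practice the features would come from a pretrained network's layer activations for the presented stimuli, and performance would be evaluated on held-out stimuli rather than the training set.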
“…falsely detecting reactivation of features from (nearly) all levels of the visual hierarchy when only a small subset of the feature-levels are present within a given brain region. Güçlü and van Gerven (2015) and Seeliger et al (2018) developed a method to address this issue that first assigns the layer that best predicts a given voxel/source's activity to that voxel/source, and then uses the proportion of voxel/sources assigned to each layer within an ROI to infer the feature-levels represented within that cortical region. This approach, however, may overlook feature-levels that are weakly represented within a given region, due to the simplifying assumption that only one feature level is represented per voxel/source, resulting in false negatives.…”
Section: Introduction (mentioning)
confidence: 99%
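The layer-assignment procedure this excerpt attributes to Güçlü and van Gerven (2015) and Seeliger et al. (2018) can be sketched roughly as follows: fit one encoding model per CNN layer for each voxel, assign each voxel the layer that predicts it best on held-out data, then summarize an ROI by the proportion of voxels assigned to each layer. This is a simplified simulation under assumed data shapes and a split-half ridge fit, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stim, n_vox, n_layers, n_feat = 150, 30, 3, 20

# Simulated feature spaces for three CNN layers, and voxels whose
# responses each truly follow exactly one layer plus noise.
layers = [rng.standard_normal((n_stim, n_feat)) for _ in range(n_layers)]
true_layer = rng.integers(0, n_layers, size=n_vox)
voxels = np.stack(
    [layers[true_layer[v]] @ rng.standard_normal(n_feat)
     + 0.5 * rng.standard_normal(n_stim) for v in range(n_vox)],
    axis=1)

def cv_corr(X, y, alpha=10.0):
    """Split-half ridge: fit on the first half, score on the second."""
    h = len(y) // 2
    w = np.linalg.solve(X[:h].T @ X[:h] + alpha * np.eye(X.shape[1]),
                        X[:h].T @ y[:h])
    return np.corrcoef(X[h:] @ w, y[h:])[0, 1]

# Assign each voxel the layer whose features predict it best ...
scores = np.array([[cv_corr(L, voxels[:, v]) for L in layers]
                   for v in range(n_vox)])
assigned = scores.argmax(axis=1)

# ... and summarize the "ROI" (here: all voxels) by layer proportions.
proportions = np.bincount(assigned, minlength=n_layers) / n_vox
```

The winner-take-all `argmax` is exactly the simplifying assumption the excerpt criticizes: a voxel weakly but genuinely predicted by a second layer contributes nothing to that layer's proportion.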
“…These methods have dramatically improved the state-of-the-art in visual object recognition [103], motion detection [104] and many other domains such as autonomous navigation [105], medical diagnosis [106], etc. The first unsupervised learning procedures used Restricted Boltzmann Machines (RBM) to restrict the connectivity of the hidden units in order to make learning easier.…”
Section: An Overview of Deep Learning (mentioning)
confidence: 99%