2021
DOI: 10.1101/2021.11.17.469009
Preprint

Modeling naturalistic face processing in humans with deep convolutional neural networks

Abstract: Deep convolutional neural networks (DCNNs) trained for face identification can rival and even exceed human-level performance. The relationships between internal representations learned by DCNNs and those of the primate face processing system are not well understood, especially in naturalistic settings. We developed the largest naturalistic dynamic face stimulus set in human neuroimaging research (700+ naturalistic video clips of unfamiliar faces) and used representational similarity analysis to investigate how…
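To make the abstract's analysis approach concrete, below is a minimal, hypothetical sketch of representational similarity analysis (RSA): it builds a representational dissimilarity matrix (RDM) from a DCNN layer and from a brain region's response patterns, then rank-correlates the two. The array shapes, random data, and variable names are placeholders for illustration only, not the paper's actual stimuli, models, or pipeline.

```python
# Minimal RSA sketch with placeholder data (not the paper's pipeline).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 50                                          # e.g., face video clips
dcnn_features = rng.standard_normal((n_stimuli, 4096))  # activations from one DCNN layer
brain_patterns = rng.standard_normal((n_stimuli, 300))  # voxel responses in one ROI

# Representational dissimilarity matrices (condensed upper triangles):
# one correlation-distance value per pair of stimuli.
model_rdm = pdist(dcnn_features, metric="correlation")
brain_rdm = pdist(brain_patterns, metric="correlation")

# Compare the two representational geometries with a rank correlation.
rho, p = spearmanr(model_rdm, brain_rdm)
print(f"model-brain RDM correlation: rho={rho:.3f}, p={p:.3g}")
```

Using a rank correlation (Spearman) keeps the comparison robust to monotonic differences in how the model and the brain scale their dissimilarities.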

Cited by 9 publications (5 citation statements) · References 117 publications
“…A more detailed investigation of layerwise encoding performance revealed a log-linear relationship where peak encoding performance tends to occur in relatively earlier layers as both model size and expressivity increase (Mischler et al., 2024). This is an unexpected extension of prior work on both language (Caucheteux & King, 2022; Kumar et al., 2022; Toneva & Wehbe, 2019) and vision (Jiahui et al., 2023), where peak encoding performance was found at late-intermediate layers. Moreover, we observed variations in best relative layers across different brain regions, corresponding to a language processing hierarchy.…”
Section: Discussion (supporting)
confidence: 70%
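The excerpt above concerns where along a network's depth encoding performance peaks. As a rough illustration of a layerwise encoding analysis, the sketch below fits a cross-validated ridge regression from each (simulated) layer's activations to (simulated) voxel responses and reports the best-predicting layer; it is a schematic under these placeholder-data assumptions, not the cited studies' actual pipeline.

```python
# Layerwise encoding sketch: predict voxel responses from each layer's
# activations and record which layer predicts best. Simulated data only.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_stimuli, n_voxels = 200, 50
brain = rng.standard_normal((n_stimuli, n_voxels))      # simulated voxel responses

# Simulated activations from eight layers of increasing width.
layers = {f"layer_{i}": rng.standard_normal((n_stimuli, 64 * (i + 1)))
          for i in range(8)}

scores = {}
for name, acts in layers.items():
    model = RidgeCV(alphas=np.logspace(-2, 4, 7))
    # Mean cross-validated R^2 across folds (averaged over voxels).
    scores[name] = cross_val_score(model, acts, brain, cv=5, scoring="r2").mean()

best_layer = max(scores, key=scores.get)
print({name: round(r2, 3) for name, r2 in scores.items()})
print("peak encoding layer:", best_layer)
```

Reporting the best layer's relative depth (layer index divided by network depth) is what makes the "earlier layers in larger models" comparison possible across architectures of different sizes.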
“…For example, object-trained and face-trained VGG models learn distinctly different feature detectors (7), yet explain a similar amount of variance in human face dissimilarity judgments. Object-trained and face-trained VGG models have previously been found to explain a similar amount of variance in human inferior temporal cortex (55) and in face-selective visual cortex (56), and object-trained VGG captured variance in early MEG responses (57). The face space within a face-trained DNN organizes faces differently than they are arranged in the BFM’s principal components, for example, clustering low-quality images at the “origin” of the space, eliciting lower activity from all learned features (42).…”
Section: Discussion (mentioning)
confidence: 91%
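The excerpt above compares how much variance object-trained and face-trained VGG models explain in human face dissimilarity judgments. One simple way to frame such a comparison is to regress human pairwise dissimilarities onto each model's pairwise feature distances and compare the resulting R²; the sketch below does this with random placeholder features and judgments, purely for illustration.

```python
# Sketch: variance in human pairwise dissimilarity judgments explained by
# two models' feature distances. All data are random placeholders.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_faces = 40
# Placeholder human dissimilarity judgments, one value per face pair.
human_dissim = rng.random(n_faces * (n_faces - 1) // 2)

def variance_explained(features, judgments):
    """R^2 of a linear fit from model feature distances to human judgments."""
    model_dist = pdist(features, metric="euclidean").reshape(-1, 1)
    fit = LinearRegression().fit(model_dist, judgments)
    return fit.score(model_dist, judgments)

object_trained = rng.standard_normal((n_faces, 512))  # stand-in for an object-trained VGG layer
face_trained = rng.standard_normal((n_faces, 512))    # stand-in for a face-trained VGG layer

print("object-trained R^2:", round(variance_explained(object_trained, human_dissim), 3))
print("face-trained   R^2:", round(variance_explained(face_trained, human_dissim), 3))
```

In practice such comparisons are judged against a noise ceiling estimated from between-subject agreement, which placeholder data like this cannot capture.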
“…Humans may be sensitive to dynamic information evolving across frames that LLaMA does not have access to; similar divergence between static and dynamic stimuli has been found in face recognition networks. 71 However, it is not clear how GPT-4 exactly processes GIF- or image-based input. Being likely trained on text and being based on the GPT architecture, it is unlikely that GPT-4 can meaningfully make use of temporal dynamics in the same way as, e.g.
Section: Differences Between Models (mentioning)
confidence: 99%