Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions

Tuckute, Greta; Feather, Jenelle; Boebinger, Dana; McDermott, Josh H.

doi:10.1371/journal.pbio.3002366

Cited by 12 publications

(7 citation statements)

References 148 publications


“…What all of these representation types have in common is that similarity relations between representations are characterized in terms of cosines. This applies to a large variety of machine learning models, not just those used for words and still images, but also dynamic stimuli like audio (Kell et al, 2018;Tuckute et al, 2023) and video (Lotter et al, 2017). C2L thus enables cognitive models to be applied to the increasingly complex and naturalistic items that machine learning models will be able to process.…”

Section: Other Sources Of Similarity Information

mentioning

confidence: 99%

From Cosine Similarity to Likelihood Ratio: Coupling Representations From Machine Learning (and Other Sources) With Cognitive Models

Cox

2024

Preprint


Modern machine learning models yield vector representations that capture similarity relations between complex items like text and images. These representations can help explain and predict how individuals respond to those items in particular tasks, but only if representations are coupled to a cognitive model of the processes people use to perform those tasks. I introduce C2L ("context to likelihood"), a mathematical transformation of the similarity between vector representations, operationalized as the cosine of the angle between them, into a ratio of the relative likelihood that the two representations encode the same versus different items. The likelihood ratio operationalizes similarity in a manner that is motivated by cognitive theories of perception and memory and is readily "plugged in" to cognitive models. Two example applications show how C2L can be used to compute drift rates of a diffusion decision model based on similarity information derived from machine learning models, thereby accounting for the speed and accuracy with which individual participants recognize individual items. C2L enables inferences regarding how different people represent items, how much information they encode about each item, and how that information is affected by experimental manipulations. C2L serves both the practical purpose of making it easier to incorporate representations from machine learning into cognitive models and the theoretical purpose of allowing cognitive models to grant insight into how people process the increasingly complex, naturalistic items to which machine learning models are applied.
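The abstract takes the cosine of the angle between two vector representations as its starting measure of similarity. A minimal sketch of that starting quantity only, assuming plain Python lists as stand-ins for model embeddings (the vectors `a` and `b` are hypothetical, and the C2L transformation itself is not reproduced here because the abstract does not give its formula):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical three-dimensional "embeddings", for illustration only.
a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]
sim_ab = cosine_similarity(a, b)
sim_aa = cosine_similarity(a, a)
```

Identical directions give a cosine of 1 and orthogonal vectors give 0; this is the quantity that, per the abstract, C2L then maps to a likelihood ratio.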


“…In recent years, deep neural networks (DNNs) have emerged as a powerful tool for representing complex visual data, such as images (LeCun et al, 2015) or videos (Liu et al, 2020). In the auditory domain, DNNs have been shown to provide valuable representations-so-called feature or latent spaces-for modeling the cerebral processing of sound (brain encoding) (speech: Kell et al, 2018;Millet et al, 2022;Tuckute & Feather, 2023; semantic content: Caucheteux et al., 2023;Giordano et al, 2023; music: Güçlü et al, 2016), or reconstructing the stimuli listened by a participant (brain decoding) (Akbari et al, 2019). They have not yet been used to explain cerebral representations of identity-related information due in part to the focus on speech information (von Kriegstein et al, 2003).…”

Section: Introduction

mentioning

confidence: 99%

“…We addressed this question by using representational similarity analysis (RSA; Kriegeskorte et al, 2008) to test which model better accounts for the representational geometry for voice identities in the auditory cortex. Using RSA as a model comparison framework is relevant to examining the brain-model relationship from complementary angles (Diedrichsen & Kriegeskorte, 2017;Giordano et al, 2023;Tuckute & Feather, 2023). We built speaker x speaker representational dissimilarity matrices (RDMs) capturing pairwise differences in cerebral activity or model predictions between all pairs of speakers; then, we examined how well the LIN and VLS-derived RDMs correlated with the cerebral RDMs from A1 and the TVAs.…”

Section: Introduction

mentioning

confidence: 99%


“…The extent to which the VLS allows linearly predicting the fMRI recordings does not provide insight into the representational geometries, i.e., the differences between the patterns of cerebral activity for speaker identity. We addressed this question by using representational similarity analysis (RSA; Kriegeskorte et al, 2008) to test which model better accounts for the representational geometry for voice identities in the auditory cortex. Using RSA as a model comparison framework is relevant to examining the brain-model relationship from complementary angles (Diedrichsen & Kriegeskorte, 2017;Giordano et al, 2023;Tuckute & Feather, 2023). We built speaker x speaker representational dissimilarity matrices (RDMs) capturing pairwise differences in cerebral activity or model predictions between all pairs of speakers; then, we examined how well the LIN and VLS-derived RDMs correlated with the cerebral RDMs from A1 and the TVAs.…”

mentioning

confidence: 99%
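The RSA procedure quoted above (build speaker x speaker RDMs, then correlate model RDMs with cerebral RDMs) can be sketched roughly as follows. This is an illustration under stated assumptions, not the citing authors' pipeline: the dissimilarity measure (1 minus Pearson correlation), the use of Pearson correlation to compare RDMs, the array sizes, and the simulated `model_features` and `brain_patterns` are all hypothetical choices that the quoted text does not specify.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix for an (n_items, n_features) array.

    Assumption: dissimilarity is 1 - Pearson correlation between item patterns.
    """
    return 1.0 - np.corrcoef(np.asarray(patterns, dtype=float))

def compare_rdms(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs.

    Only off-diagonal entries are used, since the diagonal is zero by construction.
    """
    iu = np.triu_indices(rdm_a.shape[0], k=1)
    return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])

# Simulated data, for illustration only: 6 speakers, 20 model features,
# and 50 "voxels" generated as a noisy linear mix of the model features.
rng = np.random.default_rng(0)
n_speakers = 6
model_features = rng.normal(size=(n_speakers, 20))
mixing = rng.normal(size=(20, 50))
brain_patterns = model_features @ mixing + 0.1 * rng.normal(size=(n_speakers, 50))

model_rdm = rdm(model_features)
brain_rdm = rdm(brain_patterns)
score = compare_rdms(model_rdm, brain_rdm)
```

In a model comparison of the kind described, one such score would be computed per candidate model and per brain region, and the scores compared; rank-based correlations are a common alternative to the Pearson comparison assumed here.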


Reconstructing Voice Identity from Noninvasive Auditory Cortex Recordings

Lamothe, Thoret, Trapeau et al.

2024

Preprint


The cerebral processing of voice information is known to engage Temporal Voice Areas (TVAs) that respond preferentially to conspecific vocalizations. But how voice information related to the stable physical characteristics of the speaker such as gender, age or identity is represented by neuronal populations in these areas remains poorly understood. Here we used a deep neural network (DNN) to generate a high-level, small-dimension representational space of voice stimuli—the ‘voice latent space’ (VLS)—and examined its linear relation with cerebral activity via encoding, representational similarity and decoding analyses. We find that the VLS maps onto fMRI measures of cerebral activity in response to tens of thousands of voice stimuli from hundreds of different speaker identities, and better accounts for the representational geometry for speaker identity in the TVAs than in A1. Moreover, the VLS allowed TVA-based reconstructions of voice stimuli that preserved important aspects of speaker gender and identity as assessed by both machine classifiers and human listeners. These results demonstrate that a low-dimensional, DNN-derived space accounts well for cerebral voice representations and provide insights into representational differences between A1 and the TVAs, paving the way to noninvasive brain-computer interface applications.


The language network as a natural kind within the broader landscape of the human brain

Fedorenko, Ivanova, Regev

2024

Nat. Rev. Neurosci.
