fMRI studies increasingly examine functions and properties of non-primary areas of human auditory cortex. However there is currently no standardized localization procedure to reliably identify specific areas across individuals such as the standard ‘localizers’ available in the visual domain. Here we present an fMRI ‘voice localizer’ scan allowing rapid and reliable localization of the voice-sensitive ‘temporal voice areas’ (TVA) of human auditory cortex. We describe results obtained using this standardized localizer scan in a large cohort of normal adult subjects. Most participants (94%) showed bilateral patches of significantly greater response to vocal than non-vocal sounds along the superior temporal sulcus/gyrus (STS/STG). Individual activation patterns, although reproducible, showed high inter-individual variability in precise anatomical location. Cluster analysis of individual peaks from the large cohort highlighted three bilateral clusters of voice-sensitivity, or “voice patches” along posterior (TVAp), mid (TVAm) and anterior (TVAa) STS/STG, respectively. A series of extra-temporal areas including bilateral inferior prefrontal cortex and amygdalae showed small, but reliable voice-sensitivity as part of a large-scale cerebral voice network. Stimuli for the voice localizer scan and probabilistic maps in MNI space are available for download.
Voices carry large amounts of socially relevant information on persons, much like 'auditory faces'. Following Bruce and Young (1986)'s seminal model of face perception, we propose that the cerebral processing of vocal information is organized in interacting but functionally dissociable pathways for processing the three main types of vocal information: speech, identity, and affect. The predictions of the 'auditory face' model of voice perception are reviewed in the light of recent clinical, psychological, and neuroimaging evidence.
BackgroundIn recent years, analyses of event related potentials/fields have moved from the selection of a few components and peaks to a mass-univariate approach in which the whole data space is analyzed. Such extensive testing increases the number of false positives and correction for multiple comparisons is needed.MethodHere we review all cluster-based correction for multiple comparison methods (cluster-height, cluster-size, cluster-mass, and threshold free cluster enhancement – TFCE), in conjunction with two computational approaches (permutation and bootstrap).ResultsData driven Monte-Carlo simulations comparing two conditions within subjects (two sample Student's t-test) showed that, on average, all cluster-based methods using permutation or bootstrap alike control well the family-wise error rate (FWER), with a few caveats.Conclusions(i) A minimum of 800 iterations are necessary to obtain stable results; (ii) below 50 trials, bootstrap methods are too conservative; (iii) for low critical family-wise error rates (e.g. p = 1%), permutations can be too liberal; (iv) TFCE controls best the type 1 error rate with an attenuated extent parameter (i.e. power < 1).
Vocal attractiveness has a profound influence on listeners-a bias known as the "what sounds beautiful is good" vocal attractiveness stereotype [1]-with tangible impact on a voice owner's success at mating, job applications, and/or elections. The prevailing view holds that attractive voices are those that signal desirable attributes in a potential mate [2-4]-e.g., lower pitch in male voices. However, this account does not explain our preferences in more general social contexts in which voices of both genders are evaluated. Here we show that averaging voices via auditory morphing [5] results in more attractive voices, irrespective of the speaker's or listener's gender. Moreover, we show that this phenomenon is largely explained by two independent by-products of averaging: a smoother voice texture (reduced aperiodicities) and a greater similarity in pitch and timbre with the average of all voices (reduced "distance to mean"). These results provide the first evidence for a phenomenon of vocal attractiveness increases by averaging, analogous to a well-established effect of facial averaging [6, 7]. They highlight prototype-based coding [8] as a central feature of voice perception, emphasizing the similarity in the mechanisms of face and voice perception.
SummaryListeners exploit small interindividual variations around a generic acoustical structure to discriminate and identify individuals from their voice—a key requirement for social interactions. The human brain contains temporal voice areas (TVA) [1] involved in an acoustic-based representation of voice identity [2–6], but the underlying coding mechanisms remain unknown. Indirect evidence suggests that identity representation in these areas could rely on a norm-based coding mechanism [4, 7–11]. Here, we show by using fMRI that voice identity is coded in the TVA as a function of acoustical distance to two internal voice prototypes (one male, one female)—approximated here by averaging a large number of same-gender voices by using morphing [12]. Voices more distant from their prototype are perceived as more distinctive and elicit greater neuronal activity in voice-sensitive cortex than closer voices—a phenomenon not merely explained by neuronal adaptation [13, 14]. Moreover, explicit manipulations of distance-to-mean by morphing voices toward (or away from) their prototype elicit reduced (or enhanced) neuronal activity. These results indicate that voice-sensitive cortex integrates relevant acoustical features into a complex representation referenced to idealized male and female voice prototypes. More generally, they shed light on remarkable similarities in cerebral representations of facial and vocal identity.
Background: Previous electrophysiological studies have identified a "voice specific response" (VSR) peaking around 320 ms after stimulus onset, a latency markedly longer than the 70 ms needed to discriminate living from non-living sound sources and the 150 ms to 200 ms needed for the processing of voice paralinguistic qualities. In the present study, we investigated whether an early electrophysiological difference between voice and non-voice stimuli could be observed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.