Faces and voices each convey multiple cues that enable us to tell people apart. Research on face and voice distinctiveness commonly uses multidimensional space to represent these complex perceptual abilities. We extend this framework to examine how a combined face-voice space would relate to its constituent face and voice spaces. Participants rated videos of speakers for their dissimilarity in face-only, voice-only, and face-voice together conditions. Multidimensional scaling (MDS) and regression analyses showed that whereas face-voice space more closely resembled face space, indicating visual dominance, face-voice distinctiveness was best characterized by a multiplicative integration of face-only and voice-only distinctiveness, indicating that auditory and visual cues are used interactively in person-distinctiveness judgments. Further, the multiplicative integration could not be explained by the small correlation found between face-only and voice-only distinctiveness. As an exploratory analysis, we next identified auditory and visual features that correlated with the dimensions in the MDS solutions. Features pertaining to facial width, lip movement, spectral centroid, fundamental frequency, and loudness variation were identified as important features in face-voice space. We discuss the implications of our findings in terms of person perception, recognition, and face-voice matching abilities.
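The multiplicative-integration result can be illustrated with a regression that includes the product of the unimodal distinctiveness scores as a predictor alongside the additive terms. The sketch below is not the authors' analysis code; the variable names and the synthetic data are hypothetical, and it only shows the general form of such a test.

```python
import numpy as np

# Hypothetical data: per-speaker distinctiveness ratings on a 0-1 scale.
rng = np.random.default_rng(0)
n = 200
d_face = rng.uniform(0, 1, n)    # assumed face-only distinctiveness
d_voice = rng.uniform(0, 1, n)   # assumed voice-only distinctiveness

# Synthetic combined ratings generated with a genuine interaction term,
# so the fit below should recover a nonzero product coefficient.
d_fv = (0.2 + 0.5 * d_face + 0.3 * d_voice
        + 0.8 * d_face * d_voice + rng.normal(0, 0.05, n))

# Design matrix: intercept, additive terms, and the multiplicative term.
X = np.column_stack([np.ones(n), d_face, d_voice, d_face * d_voice])
coef, *_ = np.linalg.lstsq(X, d_fv, rcond=None)
b0, b_face, b_voice, b_interact = coef
```

A reliably nonzero `b_interact` in such a model is what distinguishes multiplicative integration from a purely additive combination of the two unimodal cues.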
Demonstrations of non-speech McGurk effects are rare, mostly limited to emotion identification, and sometimes not considered true analogues. We presented videos of males and females singing a single syllable on the same pitch and asked participants to indicate the true range of the voice: soprano, alto, tenor, or bass. For one group of participants, the gender shown on the video matched the gender of the voice heard, and for the other group they were mismatched. Soprano or alto responses were interpreted as "female voice" decisions and tenor or bass responses as "male voice" decisions. Identification of the voice gender was 100% correct in the preceding audio-only condition. However, whereas performance was also 100% correct in the matched video/audio condition, it was only 31% correct in the mismatched video/audio condition. Thus, the visual gender information overrode the voice gender identification, showing a robust non-speech McGurk effect.
We examined perception of artificial timbre blending using recordings of two actual instruments. In Experiment 1, participants heard stimuli comprising different proportions of sounds from an oboe and a trumpet, constructed using both a linear and a logarithmic algorithm, and judged the degree of blending. In Experiment 2, participants chose between an oboe and a trumpet in each blend condition. In both experiments, participants were able to track the degrees of blending between the two anchor points quite accurately. In Experiment 3, participants matched test blends to two target blends in an ABX design and showed no evidence for categorical perception of oboe and trumpet timbres in their judgments. Further, participants with and without musical training showed similar patterns of responding. The findings suggest a high level of sensitivity for timbre coding in auditory perception and also have implications for timbre manipulation as a compositional device and sound morphing techniques.