Our voices sound different depending on the context (laughing vs. talking to a child vs. giving a speech), making within-person variability an inherent feature of human voices. When perceiving speaker identities, listeners therefore need not only to 'tell people apart' (perceiving exemplars from two different speakers as separate identities) but also to 'tell people together' (perceiving different exemplars from the same speaker as a single identity). In the current study, we investigated how such natural within-person variability affects voice identity perception. Listeners who were either familiar or unfamiliar with a popular TV show sorted naturally varying voice clips from two of its speakers into clusters representing perceived identities. Across three independent participant samples, unfamiliar listeners perceived more identities than familiar listeners and frequently mistook exemplars from the same speaker for different identities. These findings point towards a selective failure in 'telling people together'. Our study highlights within-person variability as a key feature of voices that has striking effects on (unfamiliar) voice identity perception. Our findings not only open up a new line of enquiry in the field of voice perception but also call for a re-evaluation of theoretical models to account for natural variability during identity perception.
The human voice is a highly flexible instrument for self-expression, yet voice identity perception is largely studied using controlled speech recordings. Using two voice-sorting tasks with naturally varying stimuli, we compared the performance of listeners who were familiar and unfamiliar with the TV show Breaking Bad. Listeners organised audio clips of (1) low-expressiveness and (2) high-expressiveness speech into perceived identities. We predicted that increased expressiveness (e.g., shouting, a strained voice) would significantly impair performance. Overall, while unfamiliar listeners were less able to generalise identity across exemplars, the two groups performed equally well at telling voices apart for low-expressiveness stimuli. High vocal expressiveness, however, significantly impaired telling apart in both groups: it led to increased misidentifications, where sounds from one character were assigned to the other. These misidentifications were highly consistent for familiar listeners but less consistent for unfamiliar listeners. Our data suggest that vocal flexibility has powerful effects on identity perception: changes in the acoustic properties of vocal signals introduced by expressiveness lead to effects apparent in familiar and unfamiliar listeners alike. At the same time, expressiveness appears to have affected other aspects of voice identity processing selectively in one listener group but not the other, revealing complex interactions of stimulus properties and listener characteristics (i.e., familiarity) in identity processing.
We investigated the effects of two types of task instructions on the performance of a voice sorting task by listeners who were either familiar or unfamiliar with the voices. Listeners were asked to sort 15 naturally varying stimuli from two voice identities into perceived identities. Half of the listeners sorted the recordings freely into as many identities as they perceived; the other half were forced to sort the stimuli into two identities only. As reported in previous studies, unfamiliar listeners formed more clusters than familiar listeners: they perceived different naturally varying stimuli from the same identity as coming from different identities, while being highly accurate at telling apart stimuli from different voices. We show that a change in task instructions (forcing listeners to sort stimuli into two identities only) helped unfamiliar listeners to overcome this selective failure at "telling people together". This improvement, however, came at the cost of an increase in errors in telling people apart. For familiar listeners, similar non-significant trends were apparent. Therefore, even when informed about the correct number of identities, listeners may fail to accurately perceive identity, further highlighting that voice identity perception in the context of natural within-person variability is a challenging task. We discuss our results in terms of similarities and differences to findings in the face perception literature and their importance in applied settings, such as forensic voice identification.