IntroductionA singer’s or speaker’s Fach (voice type) should be appraised based on acoustic cues characterizing their voice. Instead, in practice, it is often influenced by the individual’s physical appearance. This is especially distressful for transgender people who may be excluded from formal singing because of perceived mismatch between their voice and appearance. To eventually break down these visual biases, we need a better understanding of the conditions under which they occur. Specifically, we hypothesized that trans listeners (not actors) would be better able to resist such biases, relative to cis listeners, precisely because they would be more aware of appearance-voice dissociations.MethodsIn an online study, 85 cisgender and 81 transgender participants were presented with 18 different actors singing or speaking short sentences. These actors covered six voice categories from high/bright (traditionally feminine) to low/dark (traditionally masculine) voices: namely soprano, mezzo-soprano (referred to henceforth as mezzo), contralto (referred to henceforth as alto), tenor, baritone, and bass. Every participant provided voice type ratings for (1) Audio-only (A) stimuli to get an unbiased estimate of a given actor’s voice type, (2) Video-only (V) stimuli to get an estimate of the strength of the bias itself, and (3) combined Audio-Visual (AV) stimuli to see how much visual cues would affect the evaluation of the audio.ResultsResults demonstrated that visual biases are not subtle and hold across the entire scale, shifting voice appraisal by about a third of the distance between adjacent voice types (for example, a third of the bass-to-baritone distance). This shift was 30% smaller for trans than for cis listeners, confirming our main hypothesis. This pattern was largely similar whether actors sung or spoke, though singing overall led to more feminine/high/bright ratings.ConclusionThis study is one of the first demonstrations that transgender listeners are in fact better judges of a singer’s or speaker’s voice type because they are better able to separate the actors’ voice from their appearance, a finding that opens exciting avenues to fight more generally against implicit (or sometimes explicit) biases in voice appraisal.