2023
DOI: 10.3389/fcomm.2023.1116955

Siri, you've changed! Acoustic properties and racialized judgments of voice assistants

Abstract: As speech technology is increasingly integrated into modern American society, voice assistants are a more significant part of our everyday lives. According to Apple, Siri fulfills 25 billion requests each month. As part of a software update in April 2021, users in the U.S. were presented with a choice of 4 Siris. While in beta testing, users on Twitter began to comment that they felt that some of the voices had racial identities, noting in particular that Voice 2 and Voice 3 “sounded black.” This study tests w…

Cited by 4 publications (5 citation statements)
References 31 publications
“…Thus, even though there is less alignment toward device interlocutors, suggesting that device interlocutors are viewed as socially distinct from humans, people still apply gender stereotypes to technological agents based on the properties of the voice alone. More recent work finds similar biases in evaluation of robots, smart speakers, and voice assistants based on social-indexical properties of the voices (Ernst and Herm-Stapelberg, 2020; Holliday, 2023; and see Sutton et al., 2019 for discussion of biases and speech-based attitudes and discrimination as relevant for voice-AI design). The question of how such biases play out in vocal alignment behavior toward voice-AI is an open question for future work.…”
Section: Vocal Alignment Toward Speech Technology
confidence: 90%
“…This has been shown to apply to voice-AI as well: users perceive male voice assistants as more competent than female voice assistants (Ernst and Herm-Stapelberg, 2020). Since voice-based stereotyping also occurs based on the racial and age-based cues present in talkers' speech (e.g., Kurinec and Weaver, 2021 for race; e.g., Hummert et al., 2004 for age), we predict that similar biases in judgments of communicative competence vary based on apparent ethnicity and age of device voices [see discussion of Holliday (2023) and related work in section 3]. Whether these factors influence patterns and extent of pronunciation adjustments present in device-DS is a ripe question for future work.…”
Section: User Speech Variation in Production During Human-Computer In...
confidence: 94%
“…In particular, they highlight that using language is one such "cue". In support of this view, speakers have been shown to vocally align their speech when talking to voice-AI interlocutors similarly to human interlocutors (Cohn, Predeck, et al., 2021; Zellou, Cohn, & Ferenc Segedin, 2021), and a growing body of work has shown that people perceive social attributes of voice-AI, including gender, age, race/ethnicity, and emotion (Cohn et al., 2019; Ernst & Herm-Stapelberg, 2020; Gessinger et al., 2022; Holliday, 2023; Zellou, Cohn, & Ferenc Segedin, 2021). In the present study, finding similar prosodic focus marking would suggest that the acoustic realization of information structure is part of this application of human-human social rules to voice-AI, suggesting that equivalence supersedes adaptations for a less-than-rational listener.…”
Section: Rational Listener Hypothesis
confidence: 97%