Interspeech 2020
DOI: 10.21437/interspeech.2020-1335

Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors

Cited by 7 publications (12 citation statements) | References 13 publications

“…Additionally, we did not observe differences in how participants adapted their speech following an emotionally expressive or neutral word misrecognition. This contrasts with related work on this same corpus (Zellou and Cohn, 2020) that found greater vowel duration alignment when participants responded to an emotionally expressive word misunderstanding made by a voice-AI system. Thus, it is possible that emotional expressiveness might shape vocal alignment, but it might not influence speech style adjustments.…”
Section: Discussion (contrasting)
confidence: 99%
“…Alternatively, the presence of emotionality might lead to distinct clear speech strategies for the human and voice-AI interlocutors. For example, a study of phonetic alignment (using the same corpus as the current study) found that vowel duration alignment differed both by the social category of interlocutor (human vs. voice-AI) and based on emotionality (Zellou and Cohn, 2020): participants aligned more in response to a misrecognition, consistent with H&H theory (Lindblom, 1990), and alignment increased even more when the voice-AI talker was emotionally expressive when conveying their misunderstanding (e.g., "Bummer! I'm not sure I understood.…”
Section: Different Strategies To Improve Intelligibility Following a Misrecognition? (supporting)
confidence: 59%
“…Socially-mediated imitation patterns are often interpreted through the lens of Communication Accommodation Theory (CAT) (Giles et al., 1991; Shepard, 2001), which proposes that speakers use linguistic alignment to emphasize or minimize social differences between themselves and their interlocutors. The CAT framework can also be applied to understand human-device interaction: recent studies that make a direct comparison between human and voice-AI interlocutors found greater vocal imitation for the human, relative to the voice-AI speaker (e.g., Apple's Siri in Cohn et al., 2019; Snyder et al., 2019; Amazon's Alexa in Raveh et al., 2019; Zellou and Cohn, 2020). Less speech alignment toward digital device assistants suggests that people may be less inclined to demonstrate social closeness toward voice-AI than toward humans.…”
Section: Introduction (mentioning)
confidence: 99%