2021
DOI: 10.3389/fcomm.2021.675704

Prosodic Differences in Human- and Alexa-Directed Speech, but Similar Local Intelligibility Adjustments

Abstract: The current study tests whether individuals (n = 53) produce distinct speech adaptations during pre-scripted spoken interactions with a voice-AI assistant (Amazon’s Alexa) relative to those with a human interlocutor. Interactions crossed intelligibility pressures (staged word misrecognitions) and emotionality (hyper-expressive interjections) as conversation-internal factors that might influence participants’ intelligibility adjustments in Alexa- and human-directed speech (DS). Overall, we find speech style dif…

Cited by 15 publications (9 citation statements)
References 66 publications
“…Speakers produce 'clear speech' when there is a reason to believe their listener will have trouble comprehending the signal. Clear speech is characterized by a variety of acoustic modifications relative to casual or conversational speech, such as slowing their speaking rate and producing more extreme segmental articulations (Picheny et al., 1986; Krause & Braida, 2002; Uchanski, 2005; Smiljanić & Bradlow, 2009; Dilley et al., 2014; Cohn & Zellou, 2021; Cohn et al., 2022). Speaking clearly has repeatedly been shown to benefit listeners by increasing intelligibility (e.g.…”
Section: A Clear Speech (mentioning)
confidence: 99%
“…In the current study, a routinization prediction would be a consistent distinction for speech features in human- and technology-DS, such as those paralleling increased vocal effort in response to a communicative barrier (increased duration, pitch, and intensity in technology-DS). As mentioned, prior studies have found adults’ technology register adjustments are often louder [15, 19, 20], have longer productions/slower rate [10, 17, 18, 44], and have differences in pitch [15, 18, 19, 23, 44] from human-directed registers. Furthermore, a routinization prediction would be that, given their different experiences with systems, adults and children will vary in their device- and human-directed registers.…”
Section: Introduction (mentioning)
confidence: 84%
“…When talking to technology, adults often make their speech louder and slower [15]; this is true cross-linguistically, including for voice assistants in English [15-18] and German [19, 20], a robot in Swedish [21], and a computer avatar in English [10], and it is consistent with the claim that people conceptualize technological agents as less communicatively competent than human interlocutors [11, 15, 22]. In some cases, English and French speakers also make their speech higher pitched when talking to another person compared to a voice assistant [17] or robot [23], respectively. Taken together, the adjustments observed in technology-DS often parallel those made in challenging listening conditions; in the presence of background noise, speakers produce louder, slower, and higher-pitched speech [24, 25].…”
Section: Introduction (mentioning)
confidence: 99%
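The three acoustic dimensions recurring in these citation statements (intensity, duration, and pitch) can each be quantified directly from a waveform. The sketch below is a minimal, illustrative numpy implementation, not the measurement pipeline used in any of the cited studies: RMS intensity in dB, total duration in seconds, and a crude whole-signal autocorrelation pitch estimate, demonstrated on a synthetic 200 Hz tone standing in for a voiced vowel. All function names here are illustrative, and real phonetic work would use frame-wise analysis (e.g., Praat-style pitch tracking) rather than one global estimate.

```python
import numpy as np

def rms_db(y: np.ndarray) -> float:
    """Root-mean-square intensity in dB relative to full scale."""
    return 20.0 * np.log10(np.sqrt(np.mean(y ** 2)) + 1e-12)

def estimate_f0(y: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Crude whole-signal pitch estimate: peak of the autocorrelation
    within the lag range corresponding to [fmin, fmax] Hz."""
    y = y - y.mean()
    ac = np.correlate(y, y, mode="full")[len(y) - 1:]   # non-negative lags
    lo, hi = int(sr / fmax), int(sr / fmin)             # lag search window
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Demo: a 0.5 s, 200 Hz sine at 16 kHz as a stand-in for voiced speech.
sr = 16000
t = np.arange(sr // 2) / sr
y = 0.5 * np.sin(2 * np.pi * 200.0 * t)

duration_s = len(y) / sr    # slower speech -> longer durations
intensity = rms_db(y)       # louder speech -> higher RMS dB
f0 = estimate_f0(y, sr)     # higher-pitched speech -> higher f0 (Hz)
```

Comparing these measures between, say, human-directed and device-directed recordings of the same sentence is the basic shape of the register comparisons the cited studies report.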
“…One limitation of the current study is that the speech samples were not elicited as device-directed speech. Prior work has observed that speakers make distinct clear-speech adjustments when talking to ASR-enabled devices, like smartphones and voice-AI assistants [56, 57], and adjust their pronunciations even more when the machine makes an error [39]. A ripe future direction is to explore whether cross-language ASR re-use recognition accuracy improves if the speakers are producing authentic device-directed speech.…”
Section: Discussion (mentioning)
confidence: 99%