Age- and Gender-Related Differences in Speech Alignment Toward Humans and Voice-AI

Zellou, Georgia; Cohn, Michelle; Segedin, Bruno Ferenc

doi:10.3389/fcomm.2020.600361

Cited by 29 publications

(22 citation statements)

References 51 publications


“…Additionally, the present study used two types of voices; it is possible that other paralinguistic features of those voices might have mediated speech style adjustments. For example, recent work has shown that speakers align speech differently toward TTS voices that "sound" older (e.g., Apple's Siri voices, rated in their 40s and 50s) (Zellou et al, 2021). Furthermore, there is work showing that introducing "charismatic" features from human speakers' voices shapes perception of TTS voices (Fischer et al, 2019; Niebuhr and Michalsky, 2019).…”

Section: Discussion (mentioning)

confidence: 99%

“…First, the interlocutor introduced themselves and then went through voice-over instructions with the participant. Participants saw an image corresponding to the interlocutor category: stock images of "adult female" (used in prior work; Zellou et al, 2021) and "Amazon Alexa" (2nd Generation Black Echo).…”

Section: Methods (mentioning)

confidence: 99%

“…A growing body of research has begun to investigate the social, cognitive, and linguistic effects of humans interacting with voice-AI (Purington et al, 2017; Arnold et al, 2019; Cohn et al, 2019b; Burbach et al, 2019). For example, recent work has shown that listeners attribute human-like characteristics to the text-to-speech (TTS) output used for modern voice-AI, including personality traits (Lopatovska, 2020), apparent age (Cohn et al, 2020a; Zellou et al, 2021), and gender (Habler et al, 2019; Loideain and Adams, 2020). While the spread of voice-AI assistants is undeniable, particularly in the United States, there are many open scientific questions as to the nature of people's interactions with voice-AI.…”

Section: Introduction (mentioning)

confidence: 99%

“…For example, people appear to apply politeness norms from human-human interaction to computers: giving more favorable ratings when a computer directly asks about its own performance, relative to when a different computer elicits this information (Nass et al, 1994; Hoffmann et al, 2009). In line with technology equivalence accounts, there is some evidence for applied social behaviors to voice-AI in the way people adjust their speech, such as gender-mediated vocal alignment (Cohn et al, 2019b; Zellou et al, 2021). In the present study, one prediction from technology equivalence accounts is that people will adjust their speech patterns when talking to voice-AI and humans in similar ways if the communicative context is controlled.…”

Section: Introduction (mentioning)

confidence: 99%


Prosodic Differences in Human- and Alexa-Directed Speech, but Similar Local Intelligibility Adjustments

Cohn

Zellou

2021

Front. Commun.

Self Cite


The current study tests whether individuals (n = 53) produce distinct speech adaptations during pre-scripted spoken interactions with a voice-AI assistant (Amazon’s Alexa) relative to those with a human interlocutor. Interactions crossed intelligibility pressures (staged word misrecognitions) and emotionality (hyper-expressive interjections) as conversation-internal factors that might influence participants’ intelligibility adjustments in Alexa- and human-directed speech (DS). Overall, we find speech style differences: Alexa-DS has a decreased speech rate, higher mean f0, and greater f0 variation than human-DS. In speech produced toward both interlocutors, adjustments in response to misrecognition were similar: participants produced more distinct vowel backing (enhancing the contrast between the target word and misrecognition) in target words and louder, slower, higher mean f0, and higher f0 variation at the sentence-level. No differences were observed in human- and Alexa-DS following displays of emotional expressiveness by the interlocutors. Expressiveness, furthermore, did not mediate intelligibility adjustments in response to a misrecognition. Taken together, these findings support proposals that speakers presume voice-AI has a “communicative barrier” (relative to human interlocutors), but that speakers adapt to conversational-internal factors of intelligibility similarly in human- and Alexa-DS. This work contributes to our understanding of human-computer interaction, as well as theories of speech style adaptation.


“…Conversely, a recent study of the same phenomenon in spontaneous speech found that early Cantonese-English bilinguals were less likely to release final stops in English than non-Cantonese-English bilinguals [20]. These conflicting outcomes simply illustrate the need to examine variation in speech across styles and registers, as this variation has maximum utility for ASR systems and the development of NLP tools for speech and language, given how little is known about how talkers interact with such systems [21].…”

Section: Introduction (mentioning)

confidence: 99%

Sound change in spontaneous bilingual speech: A corpus study on the Cantonese n-l merger in Cantonese-English bilinguals

Soo

Johnson

Babel

2021

Preprint


In Cantonese and several other Chinese languages, /n/ is merging with /l/. The Cantonese merger appears categorical, with /n/ becoming /l/ word-initially. This project aims to describe the status of /n/ and /l/ in bilingual Cantonese and English speech to better understand individual differences at the interface of crosslinguistic influence and sound change. We examine bilingual speech using the SpiCE corpus, composed of speech from 34 early Cantonese-English bilinguals. Acoustic measures were collected on pre-vocalic nasal and lateral onsets in both languages. If bilinguals maintain separate representations for corresponding segments across languages, smaller differences between /n/ and /l/ are predicted in Cantonese compared to English. Measures of mid-frequency spectral tilt suggest that the /n/ and /l/ contrast is robustly maintained in English, but not Cantonese. The spacing of F2-F1 suggests small differences between Cantonese /n/ and /l/, and robust differences in English. While cross-language categories appear independent, substantial individual differences exist in the data. These data contribute to the understanding of the /n/ and /l/ merger in Cantonese and other Chinese languages, in addition to providing empirical and theoretical insights into crosslinguistic influence in early bilinguals.


Towards eMCO/SiPO: A Human Factors Efficacy, Usability, and Safety Assessment for Direct Voice Input (DVI) Implementation in the Flight Deck

Ziakkas

Harris

Pechlivanis

2023

Engineering Psychology and Cognitive Ergonomics
