2022 · DOI: 10.1016/j.wocn.2021.101123

Acoustic-phonetic properties of Siri- and human-directed speech

Cited by 31 publications (40 citation statements)
References 72 publications

“…Speakers produce 'clear speech' when there is reason to believe their listener will have trouble comprehending the signal. Clear speech is characterized by a variety of acoustic modifications relative to casual or conversational speech, such as a slower speaking rate and more extreme segmental articulations (Picheny et al., 1986; Krause & Braida, 2002; Uchanski, 2005; Smiljanić & Bradlow, 2009; Dilley et al., 2014; Cohn & Zellou, 2021; Cohn et al., 2022). Speaking clearly has repeatedly been shown to benefit listeners by increasing intelligibility (e.g.…”
Section: A Clear Speech (mentioning)
confidence: 99%
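The acoustic modifications listed in this statement are measured instrumentally from recordings. As a minimal sketch of how two of them (speaking rate and mean f0) might be extracted, the snippet below uses the parselmouth interface to Praat; the file name and syllable count are hypothetical placeholders, and this is an illustration rather than the measurement pipeline used in the paper.

```python
# Minimal sketch: two acoustic measures often reported in clear-speech studies.
# The recording name and syllable count are hypothetical placeholders.
import parselmouth  # Python interface to the Praat phonetics toolkit

snd = parselmouth.Sound("utterance.wav")    # hypothetical recording
pitch = snd.to_pitch()                      # default Praat pitch tracking
f0 = pitch.selected_array["frequency"]      # f0 in Hz; 0 where unvoiced
mean_f0 = f0[f0 > 0].mean()                 # mean fundamental frequency (Hz)

n_syllables = 12                            # assumed known from a transcript
speaking_rate = n_syllables / snd.duration  # syllables per second

print(f"mean f0: {mean_f0:.1f} Hz; rate: {speaking_rate:.2f} syll/s")
```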
“…We therefore vary whether the guise of the talker is congruent (shown an image of a device) or incongruent (shown an image of a human). If alignment is driven by functional reasons, we expect participants to align most toward device-guise voices in an effort to communicate more effectively (Cowan et al., 2015; Cohn et al., 2022). Conversely, if alignment is driven by similarity attraction (Byrne, 1971), we might expect participants to align more toward human-guise voices (Gessinger et al., 2021).…”
Section: Current Study (mentioning)
confidence: 97%
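Alignment in studies like these is commonly quantified with a difference-in-distance (DID) measure: how much closer a participant's production moves to the model talker after exposure. The sketch below illustrates that computation with made-up f0 values; the function and the numbers are assumptions for illustration, not values or code from the studies cited.

```python
# Difference-in-distance (DID), a common operationalization of phonetic
# alignment: positive DID means the participant converged toward the model.
def did(baseline: float, post_exposure: float, model: float) -> float:
    return abs(baseline - model) - abs(post_exposure - model)

# Illustrative (made-up) mean f0 values in Hz:
print(did(baseline=210.0, post_exposure=218.0, model=230.0))  # 8.0 -> convergence
```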
“…They also found, however, that voice type overall (human or device) was not a significant predictor of alignment patterns, suggesting that the acoustic differences between human and TTS voices were not the main driver of differences in alignment. Other work has shown that people have distinct expectations about the communicative competence of technology: for example, participants explicitly rate a TTS voice as less competent and less human-like than a human voice (Cohn et al., 2022), and rate more robotic TTS voices as less competent than more human-like TTS voices (Cowan et al., 2015; Zellou et al., 2021a). Additionally, given the identical guise for a talker (cued by an image of a human or device silhouette), listeners show worse performance on a speech-in-noise task (Aoki et al., 2022).…”
Section: Introduction (mentioning)
confidence: 99%
“…Interestingly, this prediction has also been extended to voice-AI (Uther et al., 2007): the basic idea is that speakers treat such voice-AI devices as if they require enhanced speech input. Recent findings have corroborated this prediction (Burnham et al., 2010; Cohn et al., 2022). For example, one study examined the adjustments that speakers made in response to a misrecognition by a human or by voice-AI (Amazon's Alexa).…”
Section: Introduction (mentioning)
confidence: 93%