2021
DOI: 10.3389/fcomm.2021.675704

Prosodic Differences in Human- and Alexa-Directed Speech, but Similar Local Intelligibility Adjustments

Abstract: The current study tests whether individuals (n = 53) produce distinct speech adaptations during pre-scripted spoken interactions with a voice-AI assistant (Amazon’s Alexa) relative to those with a human interlocutor. Interactions crossed intelligibility pressures (staged word misrecognitions) and emotionality (hyper-expressive interjections) as conversation-internal factors that might influence participants’ intelligibility adjustments in Alexa- and human-directed speech (DS). Overall, we find speech style dif…

Cited by 15 publications (9 citation statements)
References 66 publications
“…Speakers produce 'clear speech' when there is a reason to believe their listener will have trouble comprehending the signal. Clear speech is characterized by a variety of acoustic modifications relative to casual or conversational speech, such as slowing their speaking rate and producing more extreme segmental articulations (Picheny et al., 1986; Krause & Braida, 2002; Uchanski, 2005; Smiljanić & Bradlow, 2009; Dilley et al., 2014; Cohn & Zellou, 2021; Cohn et al., 2022). Speaking clearly has repeatedly been shown to benefit listeners by increasing intelligibility (e.g.…”
Section: A Clear Speech (mentioning)
confidence: 99%
“…In the current study, a routinization prediction would be a consistent distinction for speech features in human- and technology-DS, such as those paralleling increased vocal effort in response to a communicative barrier (increased duration, pitch, and intensity in technology-DS). As mentioned, prior studies have found adults’ technology register adjustments are often louder [15, 19, 20], have longer productions/slower rate [10, 17, 18, 44], and have differences in pitch [15, 18, 19, 23, 44] from human-directed registers. Furthermore, a routinization prediction would be that, given their different experiences with systems, adults and children will vary in their device- and human-directed registers.…”
Section: Introduction (mentioning)
confidence: 84%
“…When talking to technology, adults often make their speech louder and slower [15]; this is true cross-linguistically, including for voice assistants in English [15-18] and German [19, 20], a robot in Swedish [21], and a computer avatar in English [10], and it is consistent with the claim that people conceptualize technological agents as less communicatively competent than human interlocutors [11, 15, 22]. In some cases, English and French speakers also make their speech higher pitched when talking to another person compared to a voice assistant [17] or robot [23], respectively. Taken together, the adjustments observed in technology-DS often parallel those made in challenging listening conditions; in the presence of background noise, speakers produce louder, slower, and higher-pitched speech [24, 25].…”
Section: Introduction (mentioning)
confidence: 99%
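The three acoustic dimensions recurring in these citation statements (intensity, duration, and pitch) can each be quantified directly from a waveform. The sketch below is a minimal, illustrative numpy implementation, not the measurement pipeline used in any of the cited studies: RMS intensity in dB, total duration in seconds, and a crude whole-signal autocorrelation pitch estimate, demonstrated on a synthetic 200 Hz tone standing in for a voiced vowel. All function names here are illustrative, and real phonetic work would use frame-wise analysis (e.g., Praat-style pitch tracking) rather than one global estimate.

```python
import numpy as np

def rms_db(y: np.ndarray) -> float:
    """Root-mean-square intensity in dB relative to full scale."""
    return 20.0 * np.log10(np.sqrt(np.mean(y ** 2)) + 1e-12)

def estimate_f0(y: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Crude whole-signal pitch estimate: peak of the autocorrelation
    within the lag range corresponding to [fmin, fmax] Hz."""
    y = y - y.mean()
    ac = np.correlate(y, y, mode="full")[len(y) - 1:]   # non-negative lags
    lo, hi = int(sr / fmax), int(sr / fmin)             # lag search window
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Demo: a 0.5 s, 200 Hz sine at 16 kHz as a stand-in for voiced speech.
sr = 16000
t = np.arange(sr // 2) / sr
y = 0.5 * np.sin(2 * np.pi * 200.0 * t)

duration_s = len(y) / sr    # slower speech -> longer durations
intensity = rms_db(y)       # louder speech -> higher RMS dB
f0 = estimate_f0(y, sr)     # higher-pitched speech -> higher f0 (Hz)
```

Comparing these measures between, say, human-directed and device-directed recordings of the same sentence is the basic shape of the register comparisons the cited studies report.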
“…One limitation of the current study is that the speech samples were not elicited as device-directed speech. Prior work has observed that speakers make distinct clear-speech adjustments when talking to ASR-enabled devices, like smartphones and voice-AI assistants [56, 57], and adjust their pronunciations even more when the machine makes an error [39]. A ripe future direction is to explore whether cross-language ASR re-use recognition accuracy improves if the speakers are producing authentic device-directed speech.…”
Section: Discussion (mentioning)
confidence: 99%