Speech Synthesis and Uncanny Valley

Romportl, Jan

doi:10.1007/978-3-319-10816-2_72

Cited by 13 publications

(8 citation statements)

References 3 publications


“…Our non-directional Hypotheses 2a-b, stating that there would be significant group differences in pleasantness and eeriness ratings between the voices, found support in such a way that more human-like voices were experienced as significantly more pleasant and less eerie than more mechanical sounding voices. This is in agreement with prior empirical studies that also observed positive effects of anthropomorphic design features (Romportl, 2014; Baird et al, 2018; Kühne et al, 2020; Roesler et al, 2021). At the same time, it seems to contradict the Uncanny Valley hypothesis (Mori, 1970) according to which we would have expected either the quite realistic yet not perfect voices Synthetic I or II receiving the highest eeriness ratings or alternatively, assuming categorical conflicts as an important mechanism behind uncanny experiences, the real human voice (given that participants were told they were listening to a robot).…”

Section: Discussion (supporting)

confidence: 93%

“…Anecdotal evidence from two other exploratory studies suggests similar patterns: Baird et al (2018) asked 25 listeners to evaluate the likability and human-likeness of 13 synthesized male voices and found likability to increase consistently with human-likeness. Based on data from 30 listeners, also Romportl (2014) reported that most though not all participants preferred a more natural female voice over an artificial sounding one. These results are also in line with two recent meta-analyses that overall show beneficial effects of (here, mostly visual) anthropomorphic design features for embodied robots and chatbots (e.g., on affect, attitudes, trust, or intention to use), although the dependence of these effects on various moderators (e.g., robot type, task type, and field of application) points to more complex relationships between human-likeness and user responses (Blut et al, 2021; Roesler et al, 2021).…”

Section: Human-like Voice As Anthropomorphic Cue (mentioning)

confidence: 99%

“…In summary, given some recent empirical findings on synthetic speech, it could be assumed that voices that are perceived as more human-like are also perceived as more pleasant and less eerie (Romportl, 2014; Baird et al, 2018; Kühne et al, 2020). Against the background of the Uncanny Valley phenomenon, however, expectations would go in a different direction: On the one hand, it could be assumed that highly realistically sounding voices are evaluated as eerier and less pleasant than either a perfect imitation of the human voice or mechanically sounding voices.…”

Section: Human-like Voice As Anthropomorphic Cue (mentioning)

confidence: 99%


Robot Voices in Daily Life: Vocal Human-Likeness and Application Context as Determinants of User Acceptance

Schreibelmayr

Mara

2022

Front. Psychol.


The growing popularity of speech interfaces goes hand in hand with the creation of synthetic voices that sound ever more human. Previous research has been inconclusive about whether anthropomorphic design features of machines are more likely to be associated with positive user responses or, conversely, with uncanny experiences. To avoid detrimental effects of synthetic voice design, it is therefore crucial to explore what level of human realism human interactors prefer and whether their evaluations may vary across different domains of application. In a randomized laboratory experiment, 165 participants listened to one of five female-sounding robot voices, each with a different degree of human realism. We assessed how much participants anthropomorphized the voice (by subjective human-likeness ratings, a name-giving task and an imagination task), how pleasant and how eerie they found it, and to what extent they would accept its use in various domains. Additionally, participants completed Big Five personality measures and a tolerance of ambiguity scale. Our results indicate a positive relationship between human-likeness and user acceptance, with the most realistic sounding voice scoring highest in pleasantness and lowest in eeriness. Participants were also more likely to assign real human names to the voice (e.g., “Julia” instead of “T380”) if it sounded more realistic. In terms of application context, participants overall indicated lower acceptance of the use of speech interfaces in social domains (care, companionship) than in others (e.g., information & navigation), though the most human-like voice was rated significantly more acceptable in social applications than the remaining four. While most personality factors did not prove influential, openness to experience was found to moderate the relationship between voice type and user acceptance such that individuals with higher openness scores rated the most human-like voice even more positively. 
Study results are discussed in the light of the presented theory and in relation to open research questions in the field of synthetic voice design.


“…While a whole range of different attributes were found to be associated with the "uncanny valley" in visual tasks, comparatively little is known about the impact of synthesized voice qualities on likability and eeriness (Kuratate et al, 2009; Mitchell et al, 2011; Romportl, 2014; Chang et al, 2018). For instance, a very recent study evaluated the likability and humanlikeness of a corpus of 13 German male voices, produced via five different synthesis approaches, and found that contrary to the visual "uncanny valley," likability increases monotonically with human-likeness of the voice (Baird et al, 2018).…”

Section: Visual Perception Of Humanoids (mentioning)

confidence: 99%

“…For instance, a very recent study evaluated the likability and humanlikeness of a corpus of 13 German male voices, produced via five different synthesis approaches, and found that contrary to the visual "uncanny valley," likability increases monotonically with human-likeness of the voice (Baird et al, 2018). A study by Romportl (2014) showed that about three quarters of participants preferred a more natural voice over an artificial one. However, the authors added that affinity to artificial agents might be an important intervening factor in voice perception.…”

Section: Visual Perception Of Humanoids (mentioning)

confidence: 99%

The Human Takes It All: Humanlike Synthesized Voices Are Perceived as Less Eerie and More Likable. Evidence From a Subjective Ratings Study

2020


Background: The increasing involvement of social robots in human lives raises the question as to how humans perceive social robots. Little is known about human perception of synthesized voices. Aim: To investigate which synthesized voice parameters predict the speaker's eeriness and voice likability; to determine if individual listener characteristics (e.g., personality, attitude toward robots, age) influence synthesized voice evaluations; and to explore which paralinguistic features subjectively distinguish humans from robots/artificial agents. Methods: 95 adults (62 females) listened to randomly presented audio-clips of three categories: synthesized (Watson, IBM), humanoid (robot Sophia, Hanson Robotics), and human voices (five clips/category). Voices were rated on intelligibility, prosody, trustworthiness, confidence, enthusiasm, pleasantness, human-likeness, likability, and naturalness. Speakers were rated on appeal, credibility, human-likeness, and eeriness. Participants' personality traits, attitudes to robots, and demographics were obtained. Results: The human voice and human speaker characteristics received reliably higher scores on all dimensions except for eeriness. Synthesized voice ratings were positively related to participants' agreeableness and neuroticism. Females rated synthesized voices more positively on most dimensions. Surprisingly, interest in social robots and attitudes toward robots played almost no role in voice evaluation. Contrary to the expectations of an uncanny valley, when the ratings of human-likeness for both the voice and the speaker characteristics were higher, they seemed less eerie to the participants. Moreover, when the speaker's voice was more humanlike, it was more liked by the participants. This latter point was only applicable to one of the synthesized voices. Finally, pleasantness and trustworthiness of the synthesized voice predicted the likability of the speaker's voice.
Qualitative content analysis identified intonation, sound, emotion, and imageability/embodiment as diagnostic features. Discussion: Humans clearly prefer human voices, but manipulating diagnostic speech features might increase acceptance of synthesized voices and thereby support human-robot interaction. There is limited evidence that human-likeness of a voice is negatively linked to the perceived eeriness of the speaker.


Comparing the User Preferences Towards Emotional Voice Interaction Applied on Different Devices: An Empirical Study

Liao

Zhang

Wang

et al. 2020

Lecture Notes in Computer Science

