Speech synthesis applications have become ubiquitous, in navigation systems, digital assistants, and screen or audio-book readers. Despite their impact on the acceptability of the systems in which they are embedded, and despite the fact that different applications probably need different types of TTS voices, TTS evaluation is still largely treated as an isolated problem. Even though there is strong agreement among researchers that the mainstream approaches to Text-to-Speech (TTS) evaluation are often insufficient and may even be misleading, there exist few clear-cut suggestions as to (1) how TTS evaluations may be realistically improved on a large scale, and (2) how such improvements may lead to informed feedback for system developers and, ultimately, better systems relying on TTS. This paper reviews the current state of the art in TTS evaluation and suggests a novel user-centered research program for this area.
The present study provides a comprehensive acoustic-phonetic analysis of motivational speech by collecting, annotating, and processing 50 minutes of speech data representing more and less successful degrees of motivation. The analysis shows significant differences in the acoustic-phonetic features f0 (median, range, variation), intensity (median, range), and speaking rate. We observe inconsistent results for the variation of intensity, pointing to the necessity of a more fine-grained analysis of this feature. This study provides first support for the existence of a specific motivational speaking style.
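The feature set compared in this study can be summarized with a few descriptive statistics per contour. The following is an illustrative sketch, not the authors' actual pipeline: it assumes f0 and intensity contours have already been extracted (e.g. frame-wise values), and the function names are our own.

```python
# Illustrative sketch of the per-feature summaries (median, range, variation)
# used in the comparison. Assumes pre-extracted contour values; not the
# authors' implementation.
from statistics import median, stdev


def summarize_contour(values):
    """Median, range, and variation (sample std. dev.) of an
    f0 or intensity contour given as a list of frame values."""
    return {
        "median": median(values),
        "range": max(values) - min(values),
        "variation": stdev(values) if len(values) > 1 else 0.0,
    }


def speaking_rate(n_syllables, duration_s):
    """Speaking rate as syllables per second over an utterance."""
    return n_syllables / duration_s


# Example: a toy f0 contour in Hz.
f0_summary = summarize_contour([100.0, 120.0, 140.0])
# → {'median': 120.0, 'range': 40.0, 'variation': 20.0}
rate = speaking_rate(10, 2.5)  # → 4.0 syllables/s
```

With summaries like these computed per speaker and condition, the more vs. less successful recordings can then be compared with standard significance tests.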
In the present work, we describe and discuss two studies in the field of motivating-speech research. The studies investigate voice-quality features (study 1) and pragmatic aspects (study 2) in German motivating speech, thereby adding to the current state of the art in understanding motivating speech and language. We find indications that a low amount of breathiness, a more periodic signal, and a balanced distribution of specific pragmatic elements contribute to a motivating impact in German.
Lengthening is the ideal hesitation strategy for synthetic speech and dialogue systems: it is unobtrusive and hard to notice, because it occurs frequently in everyday speech before phrase boundaries, in accentuation, and in hesitation. Despite its elusiveness, it allows valuable extra time for computing or information highlighting in incremental spoken dialogue systems. The elusiveness of the matter, however, poses a challenge for extracting lengthening instances from corpus data: we suspect a recall problem, as human annotators might not be able to consistently label lengthening instances. We address this issue by filtering corpus data for instances of lengthening, using a simple classification method based on a threshold for normalized phone duration. The output is then manually labeled for disfluency. This is compared to an existing, fully manual disfluency annotation, showing that recall is significantly higher with semi-automatic pre-classification. This shows that semi-automatic pre-selection is necessary to gather enough candidate data points for manual annotation and subsequent lengthening analyses. Also, it is desirable to further increase the performance of the automatic classification. We evaluate in detail human versus semi-automatic annotation and train another classifier on the resulting dataset to check the integrity of the disfluent vs. non-disfluent distinction.
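The pre-classification step described above amounts to normalizing phone durations and keeping those above a threshold as lengthening candidates. A minimal sketch follows; the z-score normalization per phone class, the input format, and the threshold value are our assumptions for illustration, not the paper's exact method.

```python
# Hypothetical sketch of threshold-based pre-selection of lengthening
# candidates. Phones are (label, duration_in_seconds) pairs; durations are
# z-scored within each phone class so that intrinsically long phones are
# not over-selected. Names and threshold are illustrative assumptions.
from statistics import mean, stdev


def normalize_durations(phones):
    """Return (label, z_score) pairs, z-scoring each phone's duration
    against all tokens of the same phone label."""
    by_label = {}
    for label, dur in phones:
        by_label.setdefault(label, []).append(dur)
    stats = {
        label: (mean(ds), stdev(ds) if len(ds) > 1 else 1.0)
        for label, ds in by_label.items()
    }
    return [
        (label, (dur - stats[label][0]) / (stats[label][1] or 1.0))
        for label, dur in phones
    ]


def preselect_lengthening(phones, threshold=2.0):
    """Flag phones whose normalized duration exceeds the threshold;
    these candidates are then handed to manual disfluency labeling."""
    return [(label, z) for label, z in normalize_durations(phones) if z > threshold]
```

The flagged candidates replace exhaustive manual scanning of the corpus; only this much smaller set needs to be hand-labeled as disfluent or not, which is where the recall gain over fully manual annotation comes from.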