In conventional speech synthesis, building a voice typically requires large amounts of phonetically balanced speech data recorded in highly controlled studio environments. Although using such data is a straightforward route to high-quality synthesis, the number of voices available will always be limited because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an "average voice model" plus model adaptation) is robust to non-ideal speech data: data recorded under varying conditions and with varying microphones, data that are not perfectly clean, and/or data that lack phonetic balance. This allows us to consider building high-quality voices from "non-TTS" corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this opens up the possibility of producing an enormous number of voices automatically. In this paper, we demonstrate thousands of voices for HMM-based speech synthesis built from several popular ASR corpora, including the Wall Street Journal (WSJ0, WSJ1, and WSJCAM0), Resource Management, GlobalPhone, and SPEECON databases.
This paper presents a first prototype of a virtual Theremin instrument for accompanying film scenes with sound. The virtual Theremin is implemented as a hybrid web application. Sound control is achieved by capturing user gestures with a webcam and mapping them to the virtual Theremin's two parameters, pitch and volume. Different sound types can be selected. The application's underlying research is part of the multi-modal digital heritage project KOLLISIONEN, which aims to open up the private archive of the Russian filmmaker Sergej Eisenstein to a broader public in digital form. Eisenstein, a film theorist and pioneer of film montage, was particularly intrigued by the Theremin as an instrument for film sound design. The virtual Theremin presented here is therefore linked to a scene from Eisenstein's 1929 Soviet drama "The General Line," which was originally never set to music. In this first implementation, the application connects music interaction design with digital heritage in a modular, flexible, and playful way, using contemporary web technologies to enable easy operation and the greatest possible accessibility.
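The Theremin-style control described above — hand position mapped to pitch and volume — can be sketched as follows. This is a minimal illustration, not the paper's implementation (which runs in the browser): the coordinate conventions, pitch range, and function names here are all assumptions.

```python
# Hypothetical sketch: map normalized webcam hand coordinates in [0, 1]
# to Theremin-style control parameters. Assumes (illustratively) that one
# hand's horizontal position controls pitch and the other hand's vertical
# position controls volume; the pitch range C2-C7 is also an assumption.

PITCH_MIN_HZ = 65.41   # C2
PITCH_MAX_HZ = 2093.0  # C7

def hand_to_pitch(x: float) -> float:
    """Exponential mapping: equal hand movement spans equal musical intervals."""
    x = min(max(x, 0.0), 1.0)
    return PITCH_MIN_HZ * (PITCH_MAX_HZ / PITCH_MIN_HZ) ** x

def hand_to_volume(y: float) -> float:
    """Linear gain clamped to [0, 1]: hand at the bottom is silent."""
    return min(max(y, 0.0), 1.0)
```

An exponential (rather than linear) pitch mapping is the usual choice for Theremin-like interfaces, since pitch perception is logarithmic in frequency.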
In speaker-adaptive HMM-based speech synthesis, there are typically a few speakers for whom the output synthetic speech sounds worse than that of other speakers, despite having the same amount of adaptation data from within the same corpus. This paper investigates these fluctuations in quality and concludes that as the mel-cepstral distance from the average voice grows, MOS naturalness scores generally worsen. Although this negative correlation is not especially strong, it suggests a way to improve training and adaptation strategies. We also compare our findings with the work of other researchers regarding "vocal attractiveness."
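The distance measure underlying the reported correlation can be sketched as follows, assuming the standard mel-cepstral distortion (MCD) formulation; whether the paper uses exactly this variant (e.g., which coefficients are included) is an assumption, and the frame values below are purely illustrative.

```python
import math

# Hedged sketch of mel-cepstral distortion (MCD) in dB between two
# mel-cepstral frames. Conventionally the energy coefficient c0 is
# excluded before calling this; that convention is an assumption here.

def mcd(frame_a, frame_b):
    """MCD (dB) = (10 / ln 10) * sqrt(2 * sum of squared coefficient differences)."""
    sq = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * sq)
```

Averaging this per-frame distortion over the frames of an utterance (or of a speaker's adapted model versus the average voice) yields the kind of per-speaker distance that could then be correlated against MOS scores.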