Jakub Vít scite author profile

Jakub Vít

5 Publications

48 Citation Statements Received

41 Citation Statements Given

Affiliations

University of West Bohemia, Google (United States)

Publications

Google’s Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders

Wan, Agiomyrgiannakis, Silén et al. 2017

A neural network model that significantly improves unit-selection-based text-to-speech synthesis is presented. The model employs a sequence-to-sequence LSTM-based autoencoder that compresses the acoustic and linguistic features of each unit to a fixed-size vector referred to as an embedding. Unit selection is facilitated by formulating the target cost as an L2 distance in the embedding space. In open-domain speech synthesis the method achieves a 0.2 improvement in the MOS, while for limited-domain synthesis it reaches the cap of 4.5 MOS. Furthermore, the new TTS system halves the gap between the previous unit-selection system and WaveNet in terms of quality while retaining low computational cost and latency.
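The target cost described in this abstract can be illustrated with a minimal sketch. It assumes that each candidate unit already has a fixed-size embedding and simply picks the candidate with the smallest L2 distance to the target embedding. The function names and the toy vectors are hypothetical, and the sketch ignores join costs and any search over unit sequences that a full unit-selection system would need.

```python
import math


def target_cost(target_embedding, unit_embedding):
    """L2 (Euclidean) distance between two embeddings of equal size."""
    return math.dist(target_embedding, unit_embedding)


def select_unit(target_embedding, candidate_embeddings):
    """Return (index, cost) of the candidate closest to the target embedding.

    Hypothetical helper: only the target cost is considered here.
    """
    costs = [target_cost(target_embedding, c) for c in candidate_embeddings]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented for illustration only.
    target = [0.0, 0.0]
    candidates = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    print(select_unit(target, candidates))
```

With the toy vectors above, the second candidate is selected because its distance to the target is the smallest of the three.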

Current State of Text-to-Speech System ARTIC: A Decade of Research on the Field of Speech Technologies

Tihelka, Hanzlíček, Jůzová et al. 2018

Improving automatic dubbing with subtitle timing optimisation using video cut detection

Matoušek, Vít 2012

This paper presents improvements to an automatic dubbing system in which text-to-speech technology is used to synthesise speech from subtitles. Spring-based subtitle timing optimisation was proposed to reduce the need for speeding up synthetic speech to fit it into the corresponding subtitle slots. A video cut detection algorithm was also introduced, and the detected cuts were then used to prevent subtitles from being stretched across them. Results show that after the optimisation, smaller speeding-up factors are applied to synthetic speech while the optimised subtitle start and end times stay close to their original positions.
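The interaction between speed-up factors and video cuts can be sketched as follows. This is not the spring-based optimisation from the paper, whose details the abstract does not give. It is a simpler greedy stand-in that lengthens one subtitle slot until the speech fits, the next subtitle starts, or a video cut is reached. All function names, parameters, and timings are hypothetical.

```python
def speedup_factor(speech_duration, slot_start, slot_end):
    """Factor by which speech must be sped up to fit its slot (1.0 = none)."""
    return max(1.0, speech_duration / (slot_end - slot_start))


def extend_slot_end(slot_start, slot_end, speech_duration, next_start, cuts):
    """Greedily move the slot end later without crossing a cut or the next subtitle.

    Assumption: times are in seconds and `cuts` holds video cut timestamps.
    """
    limit = next_start
    for cut in cuts:
        # Only cuts at or after the current slot end can block the extension.
        if slot_end <= cut < limit:
            limit = cut
    wanted_end = slot_start + speech_duration
    return max(slot_end, min(wanted_end, limit))


if __name__ == "__main__":
    # Invented example: a 2 s slot that must hold 3 s of synthetic speech.
    start, end, speech = 10.0, 12.0, 3.0
    new_end = extend_slot_end(start, end, speech, next_start=14.0, cuts=[12.5, 20.0])
    print(speedup_factor(speech, start, end), speedup_factor(speech, start, new_end))
```

In the invented example the cut at 12.5 s stops the extension before the speech fits completely, so a smaller speed-up is still required after the adjustment.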

Czech Speech Synthesis with Generative Neural Vocoder

Vít, Hanzlíček, Matoušek 2019

Unified Language-Independent DNN-Based G2P Converter

Jůzová, Tihelka, Vít 2019

We introduce a unified grapheme-to-phoneme (G2P) conversion framework based on the composition of deep neural networks. In contrast to the usual approaches, which build G2P frameworks from a dictionary, we use whole phrases, which allows us to capture various language properties, e.g. cross-word assimilation, without the need for any special care or topology adjustments. The evaluation is carried out on three different languages: English, Czech and Russian. Each requires dealing with specific properties, stressing the proposed framework in various ways. The very first results show promising performance of the proposed framework, which deals with all the phenomena specific to the tested languages. Thus, we consider the framework to be language-independent for a wide range of languages.
