In this paper, a classification-based method for the automatic detection of glottal closure instants (GCIs) from the speech signal is proposed. Peaks in the speech waveform are taken as candidates for GCI placement. A classification framework is used to train a model that decides whether or not a peak corresponds to a GCI. We show that the detection accuracy in terms of the F1 score is 97.27%. In addition, despite using the speech signal only, the proposed method performs comparably to a method utilizing the glottal signal. The method is also compared with three existing GCI detection algorithms on publicly available databases.
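The candidate-generation step described above can be illustrated with a minimal sketch. The function below simply picks out local negative-going peaks separated by a minimum distance; the function name, the `min_distance` parameter, and its default are illustrative assumptions, not details from the paper, and the subsequent classification stage (feature extraction and the trained classifier) is omitted.

```python
# Hypothetical sketch of GCI candidate selection: local negative peaks
# in the waveform are collected as candidate GCI placements. A trained
# classifier (not shown) would then accept or reject each candidate.
def find_peak_candidates(signal, min_distance=16):
    """Return indices of local minima separated by at least min_distance samples."""
    candidates = []
    for i in range(1, len(signal) - 1):
        # a local minimum: strictly below both neighbours
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]:
            if not candidates or i - candidates[-1] >= min_distance:
                candidates.append(i)
    return candidates
```

In the actual method, each candidate index would be turned into a feature vector and passed to the trained classification model.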
Abstract. This paper gives a survey of the current state of ARTIC, the modern Czech concatenative corpus-based text-to-speech system. All stages of the system design are described, including the acoustic unit inventory building process, text processing, and speech production issues. Two versions of the system are presented: a single-unit-instance system with moderate output speech quality, suitable for low-resource devices, and a multiple-unit-instance system with a dynamic unit-instance selection scheme, yielding high-quality output speech. Both versions make use of automatically designed acoustic unit inventories. To ensure the desired prosodic characteristics of the output speech, version-specific prosody generation issues are also discussed. Although the system was primarily designed for the synthesis of Czech speech, ARTIC can now speak three languages: Czech (both female and male voices are available), Slovak, and German.
Abstract—This paper presents experiments with the customisation of a corpus-based unit-selection text-to-speech (TTS) system for the automatic dubbing of TV programmes. The project is aimed at people with hearing impairments, as its main goal is to automatically produce a highly intelligible, less dynamic, and more undisturbed audio track for TV programmes from subtitles. A two-phase synchronisation process was proposed to cope with audio-video synchronisation issues. These phases comprise off-line time compression of all utterances in the source speech corpus used for TTS, and on-line time compression of speech that overruns its assigned subtitle time slot. Based on a case study, in which a TTS-generated audio track of a selected movie was analysed, a simplification of to-be-desynchronised subtitle texts was proposed in order to keep time-compression factors within a reasonable range. In this way, abrupt changes in the dynamics of the produced audio track are avoided.
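The on-line phase of the synchronisation process can be sketched as a simple calculation: when a synthesised utterance is longer than its subtitle time slot, it must be sped up by the ratio of the two durations. The function below is a hedged illustration only; the cap `max_factor` is an assumed limit on acceptable compression, not a value taken from the paper.

```python
# Hypothetical sketch of the on-line time-compression decision:
# how much an utterance must be sped up to fit its subtitle slot.
def compression_factor(utterance_dur, slot_dur, max_factor=1.5):
    """Return the speed-up factor needed to fit utterance_dur seconds
    into slot_dur seconds. 1.0 means no compression; the result is
    capped at max_factor (an assumed limit, not from the paper)."""
    if utterance_dur <= slot_dur:
        return 1.0  # utterance already fits its slot
    return min(utterance_dur / slot_dur, max_factor)
```

Capping the factor motivates the subtitle-text simplification described in the abstract: rather than compressing speech beyond an intelligible rate, the overlong subtitle text itself is shortened.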