Improving Prosodic Break Detection in a Russian TTS System

Chistikov, Pavel; Khomitsevich, Olga

doi:10.1007/978-3-319-01931-4_24

Cited by 12 publications

(5 citation statements)

References 6 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This classifier was used in our previous work on predicting breaks in Russian. We compared this classifier to the CART classifier and found it to give a (slightly) more accurate result and to be more flexible and easier to tune than CART (see [8] for further details). Its disadvantage compared to CART is that it cannot be efficiently used for predicting break duration, since only break/nonbreak decisions are made; in this paper, we deal only with break positions and not break durations.…”

Section: The Random Forests Classifiermentioning

confidence: 99%

Using Random Forests for Prosodic Break Prediction Based on Automatic Speech Labeling

Khomitsevich

Chistikov

Zakharov³

2014

Speech and Computer

Self Cite

View full text Add to dashboard Cite

Abstract. In this paper we present a system for automatically predicting prosodic breaks in synthesized speech using the Random Forests classifier. In our experiments the classifier is trained on a large dataset consisting of audiobooks, which is automatically labeled with phone, word, and pause labels. To provide part of speech (POS) tags in the text, a rule-based POS tagger is used. We use crossvalidation in order to be able to examine not only the results for a specific subset of data but also the systems reliability across the dataset. The experimental results demonstrate that the system shows good and consistent results on the audiobook database; the results are poorer and less robust on a smaller database of read speech even though part of that database was labeled manually.

show abstract

Section: The Random Forests Classifiermentioning

confidence: 99%

Using Random Forests for Prosodic Break Prediction Based on Automatic Speech Labeling

Khomitsevich

Chistikov

Zakharov³

2014

Speech and Computer

Self Cite

View full text Add to dashboard Cite

show abstract

“…To perform this we need a source audiobook (or similar) database comprising more than 20 hours of speech, where each sound file has a corresponding label file containing information about its speech elements [2]. First of all, linguistic and acoustic features [3,8] are calculated for the audiobook database. Then, voice model building is performed.…”

Section: The Proposed Systemmentioning

confidence: 99%

“…Then linguistic and prosodic features are calculated [3,8]. At the next step, HMM prototypes for each speech element in the database are created.…”

Section: Database Preprocessingmentioning

confidence: 99%

Improving Speech Synthesis Quality for Voices Created from an Audiobook Database

Chistikov

Zakharov²,

Talanov³

2014

Speech and Computer

Self Cite

View full text Add to dashboard Cite

Abstract. This paper describes an approach to improving synthesized speech quality for voices created by using an audiobook database. The data consist of a large amount of read speech by one speaker, which we matched with the corresponding book texts. The main problems with such a database are the following. First, the recordings were made at different times under different acoustic conditions, and the speaker reads the text with a variety of intonations and accents, which leads to very high voice parameter variability. Second, automatic techniques for sound file labeling make more errors due to the large variability of the database, especially as there can be mismatches between the text and the corresponding sound files. These problems dramatically affect speech synthesis quality, so a robust method for solving them is vital for voices created using audiobooks. The approach described in the paper is based on statistical models of voice parameters and special algorithms of speech element concatenation and modification. Listening tests show that it strongly improves synthesized speech quality.

show abstract

“…• sum of products of multiplication of the subband number (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12) and place in the ranking of the subbands energy, in descending order (subb rank score, 82);…”

Section: Frequencies In Subbandsmentioning

confidence: 99%

Detection of Sentence Boundaries in Polish Based on Acoustic Cues

Igras

Ziółko

2016

Archives of Acoustics

View full text Add to dashboard Cite

In this article the authors investigated and presented the experiments on the sentence boundaries annotation from Polish speech using acoustic cues as a source of information. The main result of the investigation is an algorithm for detection of the syntactic boundaries appearing in the places of punctuation marks. In the first stage, the algorithm detects pauses and divides a speech signal into segments. In the second stage, it verifies the configuration of acoustic features and puts hypotheses of the positions of punctuation marks. Classification is performed with parameters describing phone duration and energy, speaking rate, fundamental frequency contours and frequency bands. The best results were achieved for Naive Bayes classifier. The efficiency of the algorithm is 52% precision and 98% recall. Another significant outcome of the research is statistical models of acoustic cues correlated with punctuation in spoken Polish.

show abstract

Improving Prosodic Break Detection in a Russian TTS System

Cited by 12 publications

References 6 publications

Using Random Forests for Prosodic Break Prediction Based on Automatic Speech Labeling

Using Random Forests for Prosodic Break Prediction Based on Automatic Speech Labeling

Improving Speech Synthesis Quality for Voices Created from an Audiobook Database

Detection of Sentence Boundaries in Polish Based on Acoustic Cues

Contact Info

Product

Resources

About