2016
DOI: 10.15388/informatica.2016.105

Corpus-Based Hidden Markov Modelling of the Fundamental Frequency of Lithuanian

Abstract: This paper presents a corpus-driven approach to building a computational model of fundamental frequency (F0) for the Lithuanian language. The model was obtained by training the HMM-based speech synthesis system HTS on six hours of speech from multiple speakers. Several gender-specific models, using different parameters and different contextual factors, were investigated. The models were evaluated by synthesizing F0 contours and comparing them to the original F0 contours using criteria of root …

Citations: cited by 3 publications (3 citation statements)
References: 15 publications
“…The availability of Lithuanian speech corpora for investigative purposes is satisfactory. Speech corpora represent Lithuanian acoustic space, usually are of about 20 hours of duration, precisely annotated by human at phonemic level, and usually comprise spoken words, phrases, syllables, names of cities or persons (Kazlauskienė and Raškinis, 2013;Vaičiūnas et al, 2016). A corpus of large extent would not usually have qualities, which could be obtained only by manual work, and this lack of quality results in a possible impairment of scientific investigations.…”
Section: Related Work (mentioning)
confidence: 99%
“…Despite the fact that duration models of Lithuanian sounds (Norkevičius and Raškinis, 2008), (Kasparaitis and Beniušė, 2016) and the intonation model of Lithuanian sentences (Vaičiūnas et al, 2016) have been developed in recent years, they will not be used in this work because only the phoneme-based synthesizer has the duration model implemented at the moment. Phonemes and diphones will be cut out of the recordings without any modifications.…”
Section: The Problem Of Missing Diphones (mentioning)
confidence: 99%
“…The unit selection speech synthesis method still remains one of the most popular methods, although other methods are gaining popularity, e.g., hidden Markov models (HMM) (Tokuda et al, 2013) or deep neural networks (DNN), recently proposed by Google's DeepMind (van den Oord et al 2016) and Baidu (Arik et al, 2017). HMM method still has certain drawbacks, e. g. somewhat buzzy sound and over-smoothing, while DNN require huge computational power, so we decided to continue our research on well-proven unit selection method.…”
Section: Introduction (mentioning)
confidence: 99%