2014 Twentieth National Conference on Communications (NCC) 2014
DOI: 10.1109/ncc.2014.6811273

Group delay based phone segmentation for HTS

Abstract: HMM-based speech synthesis (HTS) is a state-of-the-art approach to text-to-speech synthesis. Segmentation of the training data is essential for building any text-to-speech system. Most conventional text-to-speech systems use phones as the basic unit of synthesis and use a speech recogniser to automatically segment the data at the phone level. As Indian languages are low-resource languages, accurate transcriptions are difficult to obtain owing to the paucity of data. Manual labeling at the phone level is not only la…

Cited by 7 publications (2 citation statements)
References: 9 publications
“…Embedded re-estimation is performed between a pair of correct boundaries. This results in better phone segmentation and syllable segmentation [30], [31]. Figure 2 illustrates the proposed approach.…”
Section: Hybrid Segmentation (mentioning)
confidence: 99%
“…There are many composite aksharas in mrudangam, which makes the list of unique aksharas very large as shown in Table II. 2) Onset Detection: Using onset detection, the starting of all aksharas are recognized. The onset detection is performed using the group delay algorithm [2], [15], where the akshara sequence signal is processed similar to that of syllables in 3) Feature Extraction: Mel-frequency cepstral coefficients (MFCC) features are extracted using the HTK toolkit [16]. The frame size is 20 msec and the frame shift is 2 msec.…”
Section: A Mrudangam (mentioning)
confidence: 99%
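
The framing parameters quoted in the last statement (20 msec frame size, 2 msec frame shift) can be illustrated with a minimal sketch. It is not the HTK implementation and not the citing authors' code: the function name `frame_signal`, the 16 kHz sampling rate, and the choice to drop a trailing partial frame are all assumptions made for illustration.

```python
def frame_signal(signal, sample_rate, frame_ms=20.0, shift_ms=2.0):
    """Split a sequence of samples into overlapping frames.

    frame_ms and shift_ms default to the values quoted in the citation
    statement. A trailing partial frame is dropped (an assumption).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # samples per frame
    shift = int(round(sample_rate * shift_ms / 1000.0))      # samples per hop
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // shift
    return [signal[i * shift : i * shift + frame_len] for i in range(n_frames)]


# One second of dummy samples at an assumed 16 kHz sampling rate.
samples = list(range(16000))
frames = frame_signal(samples, sample_rate=16000)
print(len(frames), len(frames[0]))
```

With a 2 msec shift, consecutive 20 msec frames overlap by 90%, which gives the fine time resolution that boundary or onset detection benefits from, at the cost of many more frames per second of audio.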