Group delay based phone segmentation for HTS

Shanmugam, S. Aswin; Murthy, Hema A.

doi:10.1109/ncc.2014.6811273

Cited by 7 publications

(2 citation statements)

References 9 publications

“…Embedded re-estimation is performed between a pair of correct boundaries. This results in better phone segmentation and syllable segmentation [30], [31]. Figure 2 illustrates the proposed approach.…”

Section: Hybrid Segmentation (mentioning)

confidence: 99%

Building speech synthesis systems for Indian languages

Pradhan

Prakash

Shanmugam

et al. 2015

2015 Twenty First National Conference on Communications (NCC)

In this paper, new efforts to build text-to-speech synthesis (TTS) systems for Indian languages are presented. The synthesisers are built around both concatenative speech synthesis and statistical parametric speech synthesis frameworks. Text-to-speech synthesis systems require accurate segmentation. Obtaining accurate segmentation at the phone level is a difficult task. Manual segmentation leads to human errors, while automatic segmentation using statistical approaches (hidden Markov model based approaches) leads to poor boundary information when the amount of data used for training is small. A semi-automatic, group delay based syllable segmentation tool is discussed. The tool is semi-automatic as some of the boundaries obtained are inaccurate and have to be manually corrected. Next, a segmentation algorithm that uses both HMM based segmentation and group delay based segmentation is used to obtain accurate boundaries automatically. The boundaries obtained are used in the syllable-based synthesiser for unit selection. In the statistical phone-based synthesiser, embedded re-estimation is performed at the phone level. Syllable-based and penta-phone based HMMs are used for building the synthesiser. TTS systems for 12 different Indian languages, namely Tamil, Hindi, Marathi, Malayalam, Telugu, Rajasthani, Bengali, Odia, Assamese, Manipuri, Kannada and Gujarati, are built using semi-automatic segmentation, and synthesisers have been built for 7 Indian languages using automatic segmentation. Evaluation of the semi-automatic segmentation systems indicates that the MOS (mean opinion score) is above 3.0 for most of the languages. Pair comparison tests on semi-automatic vs. automatic segmentation show that automatic segmentation is preferred.

“…There are many composite aksharas in mrudangam, which makes the list of unique aksharas very large as shown in Table II 2) Onset Detection: Using onset detection, the starting of all aksharas are recognized. The onset detection is performed using the group delay algorithm [2] [15], where the akshara sequence signal is processed similar to that of syllables in 3) Feature Extraction: Mel-frequency cepstral co-efficients (MFCC) features are extracted using the HTK toolkit [16]. The frame size is 20 msec and the frame shift is 2 msec.…”

Section: A Mrudangam (mentioning)

confidence: 99%

Akshara transcription of mrudangam strokes in Carnatic music

Kuriakose

Kumar

Padi

et al. 2015

2015 Twenty First National Conference on Communications (NCC)

Percussion instruments play a significant role in Carnatic music concerts. The percussion artist enjoys a great degree of freedom in improvising within the defined tāla structure of a composition. The objective of this paper is to transcribe the improvisations, treating the percussion strokes as syllables or aksharas. Onset detection is performed to segment the waveform at each akshara. Using the transcriptions from the training data, a three-state hidden Markov model is built for each akshara. The language model is derived from the training data. Testing is also performed in isolated style, using onset detection to segment the phrase and the language model to correct the transcription. Transcription is performed on both concert recordings and studio recordings. This technique yields up to ≈ 96% accuracy on studio recordings and ≈ 76% accuracy on concert recordings. As the mrudangam is an instrument that is tuned to the tonic, tonic-normalised features, namely cent filterbank cepstral coefficients, are used. It is shown that tonic normalisation helps in transcription across different tonics.
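The abstract above does not spell out how tonic normalisation is done. As a rough illustration only, the sketch below shows the standard conversion of a frequency to cents relative to a tonic, which is the kind of mapping a cent-scale filterbank relies on. The function name `hz_to_cents` and the tonic values are assumptions made for this example, not details from the paper, and the full cepstral feature pipeline is not reproduced.

```python
import math


def hz_to_cents(freq_hz: float, tonic_hz: float) -> float:
    """Map a frequency to cents relative to the tonic (1200 cents per octave)."""
    if freq_hz <= 0 or tonic_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(freq_hz / tonic_hz)


if __name__ == "__main__":
    # Hypothetical tonics, chosen only for illustration.
    for tonic in (130.0, 160.0):
        fifth = 1.5 * tonic  # the same musical interval above each tonic
        print(f"tonic={tonic} Hz, fifth={fifth} Hz, "
              f"cents={hz_to_cents(fifth, tonic):.2f}")
```

On the cent scale the same interval above two different tonics lands on the same value, which is one plausible reading of why normalised features transfer across tonics.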

Phonetic–Acoustic Characteristics of Telugu Lateral Approximants

Maddela

Bhaskararao

2022

Circuits Syst Signal Process
