Phonetic Segmentation Using Knowledge from Visual and Perceptual Domain

Vachhani, Bhavik; Bhat, Chitralekha; Kopparapu, Sunil Kumar

doi:10.1007/978-3-319-64206-2_44

Cited by 1 publication

(2 citation statements)

References 18 publications

“…The methods for boundary detection can be based on using bidirectional LSTM networks, 39,40 wavelet analysis, 42-44 graph-based structural analysis, 45 rules describing the power spectrum 46 or formants 47 and various features extracted from the spectrogram, for example, visual features 48,49 or auditory attention features. 50 The methods for boundary detection also have a relevant application in the task of segmentation with orthographic or phonetic transcription provided, where they can be used as additional boundary correction procedures. 51 A common system for speech segmentation is language-dependent, that is, it is trained and run on the same language.…”

Section: Related Work (mentioning)

confidence: 99%

Using LSTM neural networks for cross‐lingual phonetic speech segmentation with an iterative correction procedure

Hanzlíček, Matoušek, Vít

2023

Computational Intelligence

This article describes experiments on speech segmentation using long short‐term memory recurrent neural networks. The main part of the paper deals with multi‐lingual and cross‐lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi‐lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well‐tuned hidden Markov model‐based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text‐to‐speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.

“…The methods for boundary detection can be based on using bidirectional LSTM networks, 39,40 wavelet analysis, 42‐44 graph‐based structural analysis, 45 rules describing the power spectrum 46 or formants 47 and various features extracted from the spectrogram, for example, visual features 48,49 or auditory attention features 50 …”

Section: Introduction (mentioning)

confidence: 99%