Interspeech 2018
DOI: 10.21437/interspeech.2018-1706

Improving Mongolian Phrase Break Prediction by Using Syllable and Morphological Embeddings with BiLSTM Model

Abstract: In speech synthesis systems, phrase break (PB) prediction is the first and most important step. Recently, state-of-the-art PB prediction systems have relied mainly on word embeddings. However, this method is not fully applicable to the Mongolian language, because its word embeddings are inadequately trained owing to the lack of resources. In this paper, we introduce a bidirectional Long Short-Term Memory (BiLSTM) model which combines word embeddings with syllable and morphological embedding representations to p…
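As a rough illustration of the model the abstract describes, the hedged sketch below concatenates word, syllable, and morphological embeddings and feeds them to a BiLSTM tagger that predicts a phrase-break label per word. It is not the authors' implementation: the framework (PyTorch), layer names, dimensions, and the simplification that each word has a single syllable and morphological index are all assumptions.

```python
# Hedged sketch (not the paper's code): a BiLSTM phrase-break (PB) tagger that
# combines word, syllable, and morphological embeddings, as the abstract describes.
import torch
import torch.nn as nn

class PBTagger(nn.Module):
    def __init__(self, n_words, n_syllables, n_morphs, n_labels=2,
                 word_dim=200, syl_dim=64, morph_dim=64, hidden=128):
        super().__init__()
        # Separate lookup tables for each representation level (dimensions assumed).
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.syl_emb = nn.Embedding(n_syllables, syl_dim)
        self.morph_emb = nn.Embedding(n_morphs, morph_dim)
        # BiLSTM over the concatenated embeddings.
        self.bilstm = nn.LSTM(word_dim + syl_dim + morph_dim, hidden,
                              batch_first=True, bidirectional=True)
        # Per-word phrase-break decision (e.g., break vs. non-break).
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, words, syllables, morphs):
        # All inputs: (batch, seq_len) index tensors, assumed aligned per word;
        # a real system would compose several syllables/morphemes per word first.
        x = torch.cat([self.word_emb(words),
                       self.syl_emb(syllables),
                       self.morph_emb(morphs)], dim=-1)
        h, _ = self.bilstm(x)
        return self.out(h)  # (batch, seq_len, n_labels) label scores
```

In practice a Mongolian word maps to several syllables and suffix morphemes, so the subword embeddings would typically be pooled or encoded before concatenation; the one-index-per-word alignment above only keeps the sketch short.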

Cited by 19 publications (7 citation statements). References 18 publications.
“…Second, the nature of the specific model. Unlike traditional RNN-based sequence labeling models [62,79,80] that capture sequential context, the self-attention sublayer in Figure 1 connects two arbitrary words directly regardless of their distance [53,58]. Furthermore, the recurrent sublayer in Figure 1 captures long-range sequential dependency well; therefore, the self-attention layer in Figure 1 does not rely on an output layer to model the phrase-break labeling sequence for decision making.…”
Section: Ablation Tests (mentioning)
confidence: 99%
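The excerpt above argues that a self-attention sublayer links any two words directly regardless of distance, while a recurrent sublayer captures long-range sequential dependency, so no structured output layer is needed. The minimal sketch below shows one way such a hybrid encoder could be stacked for per-word labeling; the PyTorch layers, residual combination, and dimensions are assumptions rather than the cited paper's implementation.

```python
# Hedged sketch of the hybrid encoder idea in the excerpt above: a self-attention
# sublayer plus a recurrent sublayer, with a plain per-token output layer (no CRF).
import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    def __init__(self, dim=256, heads=4, n_labels=2):
        super().__init__()
        # Self-attention connects any two positions directly, independent of distance.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Recurrent sublayer models long-range sequential dependency.
        self.rnn = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.out = nn.Linear(dim, n_labels)

    def forward(self, x):          # x: (batch, seq_len, dim) word representations
        a, _ = self.attn(x, x, x)  # direct pairwise interactions
        h, _ = self.rnn(x + a)     # residual combination feeding the recurrent sublayer
        return self.out(h)         # per-token phrase-break label scores
```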
“…We adopt a self-attention neural classifier, which handles long-range dependency of words better than an RNN [52]. This work is an extension of our previous work [62] with several novel contributions,…”
Section: Introduction (mentioning)
confidence: 99%
“…For Chinese, we use the Tencent AI Lab embedding database for Chinese Words and Phrases [41]. For Mongolian, the pre-trained 200-dimensional word embedding reported in [42] is used.…”
Section: Experiments, A. Databases (mentioning)
confidence: 99%
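The excerpt above only names the pre-trained embedding databases used. As a hedged illustration of how such vectors might be loaded into a tagger's word-embedding table, the sketch below reads a word2vec-style text file; the file format, function name, and paths are hypothetical, not the cited paper's setup.

```python
# Hedged sketch: initializing an embedding table from pre-trained vectors,
# e.g. a 200-dimensional Mongolian word embedding (format and paths assumed).
import numpy as np
import torch
import torch.nn as nn

def load_pretrained(path, vocab, dim=200):
    """Build an embedding matrix for `vocab` from a word2vec-style text file."""
    # Unknown words keep a small random vector.
    table = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in vocab and len(vec) == dim:
                table[vocab[word]] = np.asarray(vec, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(table), freeze=False)

# Hypothetical usage:
# vocab = {"<unk>": 0, ...}  # word-to-index mapping built from the corpus
# word_emb = load_pretrained("mongolian_200d.vec", vocab)
```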
“…In these techniques, the key idea is to integrate the conventional TTS pipeline into a unified encoder-decoder network and to learn the mapping directly from the <text, wav> pair. Tacotron is a successful encoder-decoder implementation based on recurrent neural networks (RNN), such as LSTM [11,12] and GRU [13]. However, the recurrent nature inherently limits the possibility of parallel computing in both training and inference.…”
Section: Introduction (mentioning)
confidence: 99%