Interspeech 2017
DOI: 10.21437/interspeech.2017-1762
Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis

Abstract: Articulatory information has been shown to be effective in improving the performance of hidden Markov model (HMM)-based text-to-speech (TTS) synthesis. Recently, deep learning-based TTS has outperformed HMM-based approaches. However, articulatory information has rarely been integrated into deep learning-based TTS. This paper investigated the effectiveness of integrating articulatory movement data into deep learning-based TTS. The integration of articulatory information was achieved in two ways: (1) direct integratio…
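The abstract's "direct integration" can be read as feeding articulatory trajectories alongside the usual linguistic features into the acoustic model. A minimal sketch of that idea, assuming frame-aligned features; the feature dimensions and the use of EMA (electromagnetic articulography) sensor coordinates here are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Illustrative sketch (not the paper's exact model): "direct integration"
# as frame-wise concatenation of articulatory trajectories with linguistic
# input features before they enter a DNN acoustic model.

rng = np.random.default_rng(0)

n_frames = 100
# Hypothetical linguistic features per frame (e.g., phone identity,
# positional features), dimension chosen for illustration.
linguistic = rng.standard_normal((n_frames, 300))
# Hypothetical articulatory features: e.g., 6 EMA sensors x (x, y, z).
articulatory = rng.standard_normal((n_frames, 18))

# Direct integration: concatenate per frame into a single input matrix
# that the DNN maps to acoustic parameters.
dnn_input = np.concatenate([linguistic, articulatory], axis=1)
print(dnn_input.shape)  # (100, 318)
```

At synthesis time no measured articulatory data exists, which is why the paper's second integration strategy matters; a common workaround is to predict articulatory trajectories from text first and feed the predictions in their place.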

Cited by 12 publications (14 citation statements)
References 27 publications
“…A direct neural speech decoding approach may improve efficacy by providing a faster communication rate than the current BCIs. In this framework, once the imagined- or intended-speech is generated internally, these signals are then decoded to text or speech parameters, and then a text-to-speech synthesizer (Cao et al, 2017) or a vocoder (Akbari et al, 2019) can be used to construct speech immediately.…”
Section: Introduction
confidence: 99%
“…Speech is produced as a result of temporal overlap of articulatory gestures, namely the lips, tongue tip, tongue body, tongue dorsum, velum, and larynx, which regulate constriction in different parts of the vocal tract [1]. Knowledge of articulatory kinematics together with acoustic information has shown benefit in various applications such as speech recognition [2,3], speech synthesis [4,5], speaker verification [6], and multimedia applications [7,8,9]. With the advancements in deep learning techniques, articulatory information has also shown success in silent speech interfaces (which benefit patients who have lost their voice due to laryngectomy or diseases affecting the vocal folds), such as in speech recognition [10] and speech synthesis directly from articulatory position information alone [11,12].…”
Section: Introduction
confidence: 99%
“…Text-to-speech synthesis (TTS) then plays synthesized sounds based on the recognized text, which is well studied and is ready for this application (e.g., [15]). Researchers on TTS are currently exploring how to restore the laryngectomee’s own voice [5, 55] with limited training data. Thus, the core problem in current SSI research is developing effective algorithms of silent speech recognition (SSR) that map articulatory movements to text.…”
Section: Introduction
confidence: 99%