2019
DOI: 10.3390/biomimetics4020039

Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Abstract: Several researchers have proposed deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters map synthetic speech to natural speech, treating the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but many aspects of the results and of the process itself can still be improved. In this paper, we introduce a …

Citations: cited by 15 publications (12 citation statements)
References: 41 publications (43 reference statements)
“…The parameters are processed independently, as proposed in previous references [11], and after the parametrization, we separate the parameters in voiced (with a value of ) and unvoiced (with a value of according to the Ahocoder parametrization), both in the synthesized and natural utterances. The reason of this discrimination is that voiced/unvoiced is one of the most distinctive features of the speech sounds, reflected from the source filter model of speech production [27].…”
Section: Proposed System
Citation type: mentioning
confidence: 99%
“…To improve the results obtained with this technique, some researchers have implemented postfilters, by adding algorithms as a final step to enhance the quality of the sound. Some algorithms implemented are deep generative architectures [10], Restricted Boltzmann Machines, and Long Short-term Memory (LSTM) [11].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In applied contexts, the thematic divisions within artificial intelligence have had clear connections with electrical engineering, such as expert systems in robotics (Sanders, Graham-Jones and Gegov, 2010), artificial neural networks (Coto-Jiménez, 2019; Ekonomou, 2010) and evolutionary algorithms (Chan, Lee, Sudhoff and Zivi, 2008).…”
Section: On the Thematic Contents to Consider
Citation type: unclassified