Neural speech-rate conversion with multispeaker WaveNet vocoder (2022)
DOI: 10.1016/j.specom.2022.01.003

Cited by 6 publications (10 citation statements)
References 59 publications
“…Future works may investigate the combined use of phoneme and syllable rates for speaking-rate conversion, which were shown to better correlate with perceived tempo [25]. Also, alternative approaches using neural networks [36,37] may replace the WSOLA algorithm to reduce artifacts.…”
Section: Discussion (mentioning)
confidence: 99%
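The WSOLA (waveform similarity overlap-add) algorithm referenced above changes speaking rate by overlap-adding short input frames, searching near each frame's nominal position for the segment that best continues what has already been written to the output. A minimal NumPy sketch of this idea follows; the frame length, hop, and search tolerance are illustrative defaults, not parameters taken from the cited works.

```python
import numpy as np

def wsola(x, speed, frame_len=1024, synth_hop=512, tolerance=512):
    """Minimal WSOLA time-scale modification of a mono signal x.

    speed > 1 shortens the signal (faster speech), speed < 1 lengthens it.
    """
    window = np.hanning(frame_len)
    ana_hop = max(1, int(round(synth_hop * speed)))      # hop on the input side
    n_frames = (len(x) - frame_len - tolerance) // ana_hop
    if n_frames < 1:
        return x.copy()

    y = np.zeros(n_frames * synth_hop + frame_len)
    norm = np.zeros_like(y)

    # Copy the first frame directly; later frames are chosen so that they
    # best match the "natural continuation" of what was already written.
    y[:frame_len] += window * x[:frame_len]
    norm[:frame_len] += window
    nat_start = min(synth_hop, len(x) - frame_len)

    for m in range(1, n_frames):
        nominal = m * ana_hop
        lo = max(0, nominal - tolerance)
        hi = min(len(x) - frame_len, nominal + tolerance)
        target = x[nat_start:nat_start + frame_len]
        # Cross-correlate the candidate segments against the desired continuation.
        corr = np.correlate(x[lo:hi + frame_len], target, mode="valid")
        best = lo + int(np.argmax(corr))
        out = m * synth_hop
        y[out:out + frame_len] += window * x[best:best + frame_len]
        norm[out:out + frame_len] += window
        nat_start = min(best + synth_hop, len(x) - frame_len)

    norm[norm < 1e-8] = 1.0          # avoid dividing by ~0 at the signal edges
    return y / norm
```

For example, wsola(x, speed=1.5) makes an utterance roughly 1.5 times faster while preserving the local waveform shape; the neural approaches cited above aim to replace this step to reduce its artifacts.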
“…Therefore, a powerful and efficient speaking-rate control method that can be seamlessly integrated into DNN-based speech synthesis models becomes necessary. A DNN-based speaking-rate control method with a multi-speaker WaveNet vocoder [30] was initially proposed, and it outperformed the conventional TSM-based method and source-filter vocoder [31]. However, the inference speed of that method was quite slow due to the auto-regressive structure and the large size of the WaveNet model [23].…”
Section: Introduction (mentioning)
confidence: 99%
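The slowness noted above is inherent to autoregressive vocoding: each waveform sample is conditioned on all previously generated samples, so synthesis proceeds one sample at a time. The schematic PyTorch loop below illustrates the bottleneck; the wavenet module and cond conditioning tensor are assumed interfaces for illustration, not the actual model from [30].

```python
import torch

@torch.no_grad()
def ar_generate(wavenet, cond, n_samples):
    """Sample-by-sample generation with an autoregressive WaveNet-style vocoder.

    The loop cannot be parallelized over time: sample t must be generated
    before sample t+1 can be predicted. At 16-24 kHz this means tens of
    thousands of sequential network evaluations per second of audio.
    """
    samples = torch.zeros(1, 1)                      # seed the waveform with one zero sample
    for _ in range(n_samples):
        logits = wavenet(samples, cond)              # assumed output shape: (1, T, n_quantization_bins)
        nxt = torch.distributions.Categorical(logits=logits[:, -1]).sample()
        # In practice the sampled index would be mu-law decoded; kept symbolic here.
        samples = torch.cat([samples, nxt[:, None].float()], dim=1)
    return samples[:, 1:]
```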
“…The goal of the control is to synthesize speech as if the speaker had uttered it at the specified speaking rate. However, since past studies using existing corpora [31,32] always compared speaking-rate-controlled speech with the original speech, we cannot state how far those control methods are from this goal.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, the synthesis quality of these models is not high. To improve synthesis quality for SR conversion, a neural-network-based approach with the multi-speaker AR WaveNet vocoder [48], in which SR conversion is realized by time-compressing or stretching the acoustic features via sinc-interpolation-based resampling [49], outperforms conventional signal-processing-based models [50]. However, the AR WaveNet vocoder, even using a GPU, cannot realize real-time synthesis.…”
(mentioning)
confidence: 99%
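The sinc-interpolation-based resampling mentioned here time-compresses or stretches the frame-level acoustic features before the vocoder renders them, which is what changes the speaking rate. Below is a minimal NumPy sketch of Whittaker-Shannon (sinc) interpolation along the frame axis; the function name is hypothetical, and a practical resampler would also low-pass filter before compression (rate > 1) to avoid aliasing.

```python
import numpy as np

def sinc_resample_features(feats, rate):
    """Time-compress (rate > 1) or stretch (rate < 1) a frame-level feature
    sequence, e.g. a mel-spectrogram of shape (T, D), by sinc interpolation.

    Minimal sketch: the anti-aliasing low-pass filter that a practical
    resampler applies before compression is omitted.
    """
    T, _ = feats.shape
    T_new = int(round(T / rate))
    t_new = np.arange(T_new) * rate                # fractional positions on the original frame axis
    t_orig = np.arange(T)
    # Whittaker-Shannon interpolation: each output frame is a sinc-weighted
    # combination of all input frames (np.sinc is the normalized sinc).
    weights = np.sinc(t_new[:, None] - t_orig[None, :])   # (T_new, T)
    return weights @ feats                                 # (T_new, D)
```

For example, rate = 1.25 shortens the feature sequence to 80% of its original number of frames, which the vocoder then synthesizes as correspondingly faster speech.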