Interspeech 2020
DOI: 10.21437/interspeech.2020-1430

Improving the Prosody of RNN-Based English Text-To-Speech Synthesis by Incorporating a BERT Model

Cited by 30 publications (31 citation statements) | References 17 publications

“…A similar method further verifies the ability of BERT to improve prosody on a Chinese multi-speaker TTS task [15]. Along different lines, CHiVE-BERT [16] incorporates a BERT model in an RNN-based speech synthesis model. These approaches have improved the prosody of synthesized speech by exploiting phrase- and word-level semantic information from BERT.…”
Section: Introduction
confidence: 68%
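
The CHiVE-BERT approach quoted above, feeding BERT features into an RNN-based synthesis model, can be pictured with a minimal sketch. Everything below is hypothetical: the module names, dimensions, and phoneme-to-word-piece alignment scheme are illustrative choices, not details of [16]. The sketch shows one common way to condition a phoneme-level RNN encoder on frozen BERT word-piece states.

```python
import torch
import torch.nn as nn
from transformers import BertModel


class BertConditionedEncoder(nn.Module):
    """Phoneme encoder conditioned on frozen BERT word-piece states (sketch)."""

    def __init__(self, phoneme_vocab=100, phone_dim=256, bert_dim=768, hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():   # assumption: BERT acts as a frozen
            p.requires_grad = False        # feature extractor
        self.phone_emb = nn.Embedding(phoneme_vocab, phone_dim)
        # The RNN sees each phoneme embedding concatenated with the BERT
        # state of the word piece that covers it.
        self.rnn = nn.GRU(phone_dim + bert_dim, hidden,
                          batch_first=True, bidirectional=True)

    def forward(self, phone_ids, wordpiece_ids, phone_to_wp):
        # phone_ids:     (B, T_ph) phoneme indices
        # wordpiece_ids: (B, T_wp) BERT word-piece indices for the same text
        # phone_to_wp:   (B, T_ph) index of the word piece covering each phoneme
        with torch.no_grad():
            wp = self.bert(wordpiece_ids).last_hidden_state   # (B, T_wp, 768)
        # Upsample word-piece states onto the phoneme time axis.
        idx = phone_to_wp.unsqueeze(-1).expand(-1, -1, wp.size(-1))
        bert_per_phone = wp.gather(1, idx)                    # (B, T_ph, 768)
        x = torch.cat([self.phone_emb(phone_ids), bert_per_phone], dim=-1)
        out, _ = self.rnn(x)                                  # (B, T_ph, 2*hidden)
        return out  # passed on to the acoustic decoder
```
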
“…Recently, the large pre-trained language model BERT [9] has exhibited impressive performance on many natural language processing (NLP) tasks, so it has also been introduced into TTS [10][11][12]. Refs.…”
Section: Related Work
confidence: 99%
“…Ref. [12] tries to fine-tune the BERT parameters with a prosody prediction task but still freezes the word-piece embeddings. All these works report some gains in naturalness.…”
Section: Related Work
confidence: 99%
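
The freezing pattern attributed to Ref. [12] (fine-tune BERT on an auxiliary prosody prediction task while the word-piece embedding table stays frozen) can be sketched with Hugging Face transformers. The three-class break-prediction head below is a placeholder for illustration, not the actual task head from [12].

```python
import torch
import torch.nn as nn
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")

# Freeze only the word-piece (token) embedding table; positional and
# token-type embeddings and all transformer layers remain trainable.
bert.embeddings.word_embeddings.weight.requires_grad = False

# Placeholder prosody head: per-token break index (none / minor / major
# phrase boundary), one common choice of prosody prediction target.
prosody_head = nn.Linear(bert.config.hidden_size, 3)

trainable = [p for p in bert.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable + list(prosody_head.parameters()),
                              lr=2e-5)
```
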
“…Recently, work on English has also used linguistic features to improve prosody: syllabic stress [21], semantic and syntactic features [22,23], and pre-trained language model embeddings [24,25]. Clockwork RNNs were also used to hierarchically encode linguistic features at varying levels in [26], a hierarchical encoder having previously helped in DNN-based synthesis [27,28].…”
Section: Linguistic Features In Tacotron
confidence: 99%
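
For readers unfamiliar with clockwork RNNs, the following is a minimal sketch of the general mechanism (Koutník et al.): hidden units are partitioned into modules with increasing clock periods, and a module updates only at timesteps divisible by its period, so slow modules track longer-range structure such as phrases. The cell below simplifies the original block-structured recurrent connectivity to full connectivity and is not the architecture of [26].

```python
import torch
import torch.nn as nn


class ClockworkRNNCell(nn.Module):
    """Simplified clockwork RNN cell: modules with distinct clock periods."""

    def __init__(self, input_size, module_size=32, periods=(1, 2, 4, 8)):
        super().__init__()
        self.periods = periods
        self.module_size = module_size
        self.hidden_size = module_size * len(periods)
        self.w_in = nn.Linear(input_size, self.hidden_size)
        self.w_hid = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
        # Note: the original formulation restricts recurrent connectivity so
        # fast modules read from slow ones; full connectivity is used here
        # purely to keep the sketch short.

    def forward(self, x_t, h, t):
        # x_t: (B, input_size), h: (B, hidden_size), t: integer timestep
        h_cand = torch.tanh(self.w_in(x_t) + self.w_hid(h))
        h_next = h.clone()
        for i, period in enumerate(self.periods):
            if t % period == 0:  # only modules whose clock fires update
                s = i * self.module_size
                h_next[:, s:s + self.module_size] = h_cand[:, s:s + self.module_size]
        return h_next
```

A module with period 8 thus refreshes only once every eight input steps, which is what lets slow modules encode slower-moving, phrase-level features while fast modules follow local detail.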