Interspeech 2022
DOI: 10.21437/interspeech.2022-384
Expressive, Variable, and Controllable Duration Modelling in TTS

Cited by 4 publications (2 citation statements); references 0 publications.
“…Bidirectional encoder representations from transformers (BERT) [17], one of the well-known pre-trained language models currently, also shows potential for this task. For example, Futamata et al have introduced features from pre-trained BERT in Japanese phrase break prediction [16], and Abbas et al have taken word-level BERT embeddings as the input of a conventional phrasing model [18].…”
Section: Introduction (confidence: 99%)
“…Recent developments in TTS research allow for explicit control of specific speech features (e.g. duration [15], [16], duration and pitch [17], etc. ), thus providing the right tools to explicitly control acoustic features associated with emphasis in a voice-agnostic fashion, with no need for targeted recordings or annotations.…”
Section: Introduction (confidence: 99%)