Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD), 2021
DOI: 10.1145/3430984.3431034
Data-Efficient Training Strategies for Neural TTS Systems

Cited by 7 publications (4 citation statements) · References 11 publications
“…In the context of multilingual E2E training for Indian languages, [54] trains convolutional attention-based TTS with language, speaker and gender embeddings. In [56], pretraining strategies are explored between source and target languages, which enable the training of multilingual voices with a reduced amount of data. In [58], byte inputs are mapped to spectrograms and experiments are performed with 40+ languages, including Hindi, Tamil and Telugu.…”
Section: Related Work
confidence: 99%
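The pretraining strategy referenced in [56] follows the standard transfer-learning recipe for low-resource TTS: first train the acoustic model on a data-rich source language, then continue training on the small target-language corpus at a lower learning rate. The sketch below illustrates that recipe only; the TinyAcousticModel, tensor shapes, learning rates, and epoch counts are assumptions for illustration, not details from the cited papers.

```python
# Minimal sketch (not the paper's code) of pretrain-then-fine-tune transfer:
# train on a data-rich source language, then fine-tune on a small target set.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TinyAcousticModel(nn.Module):
    """Stand-in for a Tacotron-style text-to-spectrogram network (hypothetical)."""
    def __init__(self, vocab_size=256, mel_dim=80, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, mel_dim)

    def forward(self, tokens):
        x = self.embed(tokens)
        x, _ = self.encoder(x)
        return self.proj(x)  # one predicted mel frame per input token, for simplicity

def run_epochs(model, loader, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        for tokens, mels in loader:
            opt.zero_grad()
            loss = loss_fn(model(tokens), mels)
            loss.backward()
            opt.step()

# Dummy tensors standing in for (phoneme IDs, mel spectrograms).
source = TensorDataset(torch.randint(0, 256, (512, 40)), torch.randn(512, 40, 80))
target = TensorDataset(torch.randint(0, 256, (32, 40)), torch.randn(32, 40, 80))

model = TinyAcousticModel()
run_epochs(model, DataLoader(source, batch_size=16), lr=1e-3, epochs=5)   # pretrain on source language
run_epochs(model, DataLoader(target, batch_size=8),  lr=1e-4, epochs=20)  # fine-tune on small target set
```

The same loop structure covers the fine-tuning of generic multilingual voices on seen languages discussed in the later excerpts; only the data and learning rate change.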
“…Going ahead, the training data per language can be further reduced to assess extreme data-stressed situations. To improve the synthesis quality of seen languages, generic voices can be further fine-tuned on seen languages, as explored in [28], [56]. Additional embeddings, such as language embeddings, can be included during training.…”
Section: Analysis of Phonotactics Across Languages
confidence: 99%
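The language embeddings mentioned in that excerpt are usually realized as a learned per-language vector that conditions the acoustic model. Below is a minimal sketch assuming a GRU text encoder and concatenation-style conditioning; the class name, dimensions, and number of languages are hypothetical, not taken from the cited work.

```python
# Illustrative sketch of adding a language embedding to an encoder (assumptions only).
import torch
import torch.nn as nn

class LanguageConditionedEncoder(nn.Module):
    def __init__(self, vocab_size=256, hidden=128, n_languages=4, lang_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lang_embed = nn.Embedding(n_languages, lang_dim)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden + lang_dim, hidden)

    def forward(self, tokens, lang_id):
        x, _ = self.rnn(self.embed(tokens))               # (B, T, hidden) text states
        lang = self.lang_embed(lang_id)                    # (B, lang_dim) per-utterance language vector
        lang = lang.unsqueeze(1).expand(-1, x.size(1), -1) # broadcast over time
        return self.out(torch.cat([x, lang], dim=-1))      # language-aware encoder states

enc = LanguageConditionedEncoder()
states = enc(torch.randint(0, 256, (2, 40)), torch.tensor([0, 2]))  # two utterances, two languages
```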
“…Self-Supervised Training: [51,368,433,78,140,346,197,352,71]; Cross-Lingual Transfer: LRSpeech [390], [42,12,60,271,105]; Cross-Speaker Transfer: [216,125,59,39]; Speech Chain / Back Transformation: SpeechChain [344,345], LRSpeech [390,285]; Dataset Mining in the Wild: [58,119,57]; Robust (Enhancing Attention): Tacotron 2 [376], DCTTS [326], SMA [104], MultiSpeech [38], [309,297,431,326,264,262]; Replacing Attention with Duration…”
Section: Lightweight Model
confidence: 99%
“…This is mainly due to the lack of child voice datasets and difficulty in creating such datasets. As TTS models require hundreds of hours of annotated data for training [2], performing TTS for child voices can be quite challenging. The focus of this work is to explore the potential of state-of-the-art (SOTA) TTS to build a pipeline for the synthesis of children's voices with low data requirements.…”
Section: Introduction
confidence: 99%