ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413669
The ThinkIT System for ICASSP 2021 M2VoC Challenge

Abstract: In this paper, we introduce the low-resource text-to-speech system submitted by the ThinkIT team to the Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC). The challenge has two tasks: the few-shot Track 1 provides 100 samples per target speaker, while the one-shot Track 2 offers only 5. Each track contains two sub-tracks, A and B; unlike sub-track A, sub-track B may use extra public data in addition to the released data. We participate in sub-track A only, and choose fine-tuning as our backbone strategy. Our …
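The abstract names fine-tuning as the team's backbone strategy for few-shot voice cloning: adapt a pretrained model on a handful of target-speaker samples. A minimal sketch of that idea, assuming a toy two-layer linear model where only the head is updated (all names and shapes here are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

def finetune_last_layer(W_frozen, W_head, X, y, lr=0.1, steps=200):
    """Few-shot adaptation sketch: keep the pretrained backbone W_frozen
    fixed and run gradient descent only on the head W_head, using the
    target speaker's small sample set (X, y)."""
    for _ in range(steps):
        H = np.tanh(X @ W_frozen)          # frozen backbone features
        pred = H @ W_head                  # trainable head
        grad = H.T @ (pred - y) / len(X)   # MSE gradient w.r.t. head only
        W_head = W_head - lr * grad
    return W_head
```

Freezing the backbone keeps the speaker-independent knowledge intact while the few released samples reshape only the output mapping, which is why fine-tuning is a natural fit for a 5- to 100-sample regime.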

Cited by 4 publications (1 citation statement)
References 10 publications
“…T18 proposed to use a BERT [41] module to predict the break of each Chinese character in an input sentence. T15 [42] used a fine-grained encoder added at the decoder's tail, which extracts variable-length detailed style information from multiple reference samples via an attention mechanism. T03 and T15 also used global style tokens (GST) for both speaker and style control, which consists of a reference encoder, style attention, and style embedding.…”
Section: Speaker and Style Modeling
confidence: 99%
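The citation statement describes GST as three parts: a reference encoder, style attention, and a style embedding. A minimal numpy sketch of that token-attention step, assuming a mean-pool stand-in for the reference encoder (all dimensions and names are illustrative, not taken from the challenge systems):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class GlobalStyleTokens:
    """Minimal GST layer: a reference-encoder summary attends over a
    bank of learnable style tokens to produce one style embedding."""

    def __init__(self, num_tokens=10, token_dim=256, ref_dim=128, seed=0):
        rng = np.random.default_rng(seed)
        self.tokens = rng.standard_normal((num_tokens, token_dim)) * 0.1
        self.query_proj = rng.standard_normal((ref_dim, token_dim)) * 0.1

    def reference_encode(self, mel):
        # stand-in reference encoder: mean-pool the mel frames
        return mel.mean(axis=0)                       # (ref_dim,)

    def __call__(self, mel):
        q = self.reference_encode(mel) @ self.query_proj        # (token_dim,)
        scores = softmax(self.tokens @ q / np.sqrt(self.tokens.shape[1]))
        return scores @ self.tokens                   # style embedding
```

Because the style embedding is a convex combination of a fixed token bank, the same bank can serve both speaker and style control, which matches how the quoted systems reuse GST for both.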