Interspeech 2021
DOI: 10.21437/interspeech.2021-755
AISHELL-3: A Multi-Speaker Mandarin TTS Corpus

et al.

Cited by 75 publications (11 citation statements). References: 0 publications.
“…The evaluation was conducted on a test set sampled from a multi-speaker Mandarin speech corpus called AISHELL-3 [24]. The test set contains 4,267 utterances from 44 speakers.…”
Section: Speaker Anonymization Experiments in Mandarin
Mentioning confidence: 99%
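The cited work does not spell out its sampling procedure. As a minimal sketch, assuming AISHELL-3's layout of one subdirectory of WAV files per speaker, a per-speaker random draw yielding roughly 4,267 utterances from 44 speakers could look like the following (the helper name, path handling, and per-speaker count are illustrative assumptions):

```python
import random
from pathlib import Path

def sample_test_set(corpus_root: str, num_speakers: int = 44,
                    utts_per_speaker: int = 97, seed: int = 0) -> list[Path]:
    """Draw a per-speaker random sample from an AISHELL-3-style corpus
    (one subdirectory of WAV files per speaker).

    Hypothetical helper: the cited paper does not document its exact
    sampling procedure; 44 speakers x ~97 utterances ~= 4,267 total.
    """
    rng = random.Random(seed)
    speaker_dirs = [d for d in sorted(Path(corpus_root).iterdir()) if d.is_dir()]
    test_set: list[Path] = []
    for spk in rng.sample(speaker_dirs, num_speakers):
        utts = sorted(spk.glob("*.wav"))
        test_set.extend(rng.sample(utts, min(utts_per_speaker, len(utts))))
    return test_set
```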
“…More importantly, the current VPC 2020 primary baseline B1 requires large amounts of language-specific resources to train a language-dependent ASR acoustic model (AM), while the SSL-based soft content encoder of the proposed method learns universal representations by training on unlabeled audio data, which improves its portability to new languages. Extensive experiments were conducted on the VPC 2020 datasets in English and the AISHELL-3 [24] dataset in Mandarin to demonstrate the effectiveness of our proposed SSL-based language-independent speaker anonymization method.…”
Section: Introduction
Mentioning confidence: 99%
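The soft content encoder described above replaces a language-dependent ASR acoustic model with features from a model pretrained on unlabeled audio. A minimal sketch using torchaudio's pretrained multilingual wav2vec 2.0 bundle follows; the bundle choice, layer index, and helper name are assumptions for illustration, not the cited system's exact configuration:

```python
import torch
import torchaudio

# Multilingual SSL model pretrained on unlabeled audio only, so no
# language-specific transcripts are needed (unlike an ASR AM).
bundle = torchaudio.pipelines.WAV2VEC2_XLSR53
model = bundle.get_model().eval()

@torch.inference_mode()
def extract_content_features(wav_path: str) -> torch.Tensor:
    """Return frame-level SSL features usable as 'soft' content
    representations; the layer index below is an assumption."""
    wav, sr = torchaudio.load(wav_path)
    if sr != bundle.sample_rate:
        wav = torchaudio.functional.resample(wav, sr, bundle.sample_rate)
    feats, _ = model.extract_features(wav)  # list of per-layer tensors
    return feats[6]                         # (1, num_frames, feature_dim)
```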
“…[22] and Shi et al. [191] introduced an identity feedback constraint by adding an additional loss term between the reference embedding and the embedding extracted from the synthesized signal, thus increasing the robustness and speaker similarity of the produced speech.…”
Section: Multi-Speaker Acoustic Model
Mentioning confidence: 99%
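In code, the identity feedback constraint amounts to re-embedding the synthesized signal with the same speaker encoder and penalizing its distance from the reference embedding. A minimal PyTorch sketch, where `speaker_encoder` stands in for any pretrained speaker-verification model and the cosine form of the loss is an assumption rather than the cited papers' exact choice:

```python
import torch
import torch.nn.functional as F

def identity_feedback_loss(speaker_encoder, reference_wav: torch.Tensor,
                           synthesized_wav: torch.Tensor) -> torch.Tensor:
    """Extra loss term between the reference speaker embedding and the
    embedding re-extracted from the synthesized signal.

    `speaker_encoder` is a placeholder for any pretrained speaker
    verification model mapping a waveform batch to (batch, dim) embeddings.
    """
    e_ref = speaker_encoder(reference_wav)    # (batch, dim)
    e_syn = speaker_encoder(synthesized_wav)  # (batch, dim)
    # Cosine distance; the cited works may use a different metric.
    return 1.0 - F.cosine_similarity(e_ref, e_syn, dim=-1).mean()

# Typically weighted into the overall training objective, e.g.:
# total_loss = mel_loss + lambda_id * identity_feedback_loss(enc, ref, syn)
```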
“…Thanks to advances in deep learning and large-scale, high-quality open-source speech corpora [1][2], computer-generated speech has reached human-like naturalness and high-fidelity audio quality. With a small batch of recorded audio samples, state-of-the-art synthesis systems can generate speech with high similarity to the target speaker that is hard to distinguish from genuine recordings.…”
Section: Introduction
Mentioning confidence: 99%
“…In order to fool the detection systems, we further add a post-processing modification to the generated audio, which causes a slight decay in audio quality but a significant improvement in spoofing. Audio samples are available at our demo page.¹ The rest of this paper is organized as follows: Section 2 introduces our proposed method.…”
Section: Introduction
Mentioning confidence: 99%
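The statement leaves the post-processing unspecified, so the following is only a generic illustration of the idea: a perturbation mild enough to cost little audio quality while changing what a detector sees. Here, low-amplitude Gaussian noise at a fixed SNR; the function and its parameters are hypothetical, not the cited paper's method:

```python
import numpy as np

def postprocess(audio: np.ndarray, snr_db: float = 30.0,
                seed: int = 0) -> np.ndarray:
    """Add low-amplitude Gaussian noise at a fixed signal-to-noise ratio.

    Hypothetical stand-in: the cited paper does not describe its
    modification in this excerpt; this merely illustrates a mild
    perturbation that trades a little quality for changed detector input.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(audio.shape)
    sig_pow = float(np.mean(audio ** 2))
    target_noise_pow = sig_pow / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_noise_pow / (np.mean(noise ** 2) + 1e-12))
    return np.clip(audio + noise, -1.0, 1.0)
```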