Interspeech 2022
DOI: 10.21437/interspeech.2022-10022
A Transfer and Multi-Task Learning based Approach for MOS Prediction

Cited by 8 publications (3 citation statements)
References 0 publications
“…• The system from ByteDance AI-LAB (T20) [140] ranked 4th in terms of both system- and utterance-level SRCC. It was based on LDNet, and they combined the main and OOD track datasets with a shared encoder and separate decoders.…”
Section: Team Approaches (mentioning)
confidence: 99%
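As a rough illustration of the shared-encoder / separate-decoders idea described in the statement above, the sketch below uses plain PyTorch; the layer types, dimensions, and track names are assumptions for illustration only, not the actual T20/LDNet configuration. Batches from each track pass through a common encoder and a track-specific regression head.

import torch
import torch.nn as nn

class SharedEncoderMOS(nn.Module):
    # Illustrative multi-task MOS predictor: one shared encoder and one
    # regression decoder per track (e.g. main vs. OOD). All sizes are
    # placeholders for the sketch, not the published model's settings.
    def __init__(self, feat_dim=80, hidden_dim=256, tracks=("main", "ood")):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.decoders = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim),
                             nn.ReLU(),
                             nn.Linear(hidden_dim, 1))
            for t in tracks
        })

    def forward(self, feats, track):
        # feats: (batch, frames, feat_dim) frame-level acoustic features
        enc, _ = self.encoder(feats)
        pooled = enc.mean(dim=1)                        # utterance-level embedding
        return self.decoders[track](pooled).squeeze(-1)  # predicted MOS

# Batches from the two track datasets update the same encoder
# but only their own decoder.
model = SharedEncoderMOS()
mos_main = model(torch.randn(4, 300, 80), track="main")
mos_ood = model(torch.randn(4, 300, 80), track="ood")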
“…Their superiority over other models was highlighted in the VoiceMOS Challenge 2022 [14], a shared task using common datasets for MOS prediction, where winning teams extended the SSL-MOS baseline yet outperformed it only by margins at the third decimal place of the correlation metrics. Interesting proposed additions to the baseline include ensembling [15,16], multi-task learning [17], and the use of speech recognizers to recreate the phoneme sequence [15] or to obtain ASR-based evaluations [16]. As the training dataset included VC and TTS systems spanning more than a decade [18], it is unclear whether the trained models can distinguish between similar systems and utterances, which is a realistic evaluation scenario for TTS researchers.…”
Section: Related Work (mentioning)
confidence: 99%
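Since the quoted passage judges systems by utterance- and system-level correlation metrics, the following sketch shows one common way to compute both SRCC values with scipy. Function and variable names are my own and this is not the challenge's official scoring script.

import numpy as np
from scipy.stats import spearmanr

def utterance_and_system_srcc(true_mos, pred_mos, system_ids):
    # Utterance-level SRCC over all utterances, plus system-level SRCC
    # where each system's score is the mean MOS over its utterances.
    true_mos = np.asarray(true_mos, dtype=float)
    pred_mos = np.asarray(pred_mos, dtype=float)
    utt_srcc = spearmanr(true_mos, pred_mos)[0]

    ids = np.asarray(system_ids)
    systems = sorted(set(system_ids))
    sys_true = [true_mos[ids == s].mean() for s in systems]
    sys_pred = [pred_mos[ids == s].mean() for s in systems]
    sys_srcc = spearmanr(sys_true, sys_pred)[0]
    return utt_srcc, sys_srcc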
“…BVCC is a collection of MOS ratings from its own large-scale listening test on samples obtained from 6 years of the Blizzard Challenge (BC) and 3 years of the Voice Conversion Challenge (VCC). BVCC was used as the baseline training data in the challenge and has greatly enabled subsequent research, e.g., [11,12,13].…”
Section: Automatic Prediction of MOS (mentioning)
confidence: 99%
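For context on how a rating collection such as BVCC is typically turned into training targets, the small helper below averages listener scores into per-utterance and per-system MOS. The record layout (system_id, utterance_id, score) is an assumption for illustration, not BVCC's actual file format.

from collections import defaultdict

def aggregate_mos(ratings):
    # ratings: iterable of (system_id, utterance_id, score) tuples from a
    # listening test. Returns per-utterance and per-system mean opinion scores.
    per_utt, per_sys = defaultdict(list), defaultdict(list)
    for system_id, utterance_id, score in ratings:
        per_utt[(system_id, utterance_id)].append(score)
        per_sys[system_id].append(score)
    utt_mos = {k: sum(v) / len(v) for k, v in per_utt.items()}
    sys_mos = {k: sum(v) / len(v) for k, v in per_sys.items()}
    return utt_mos, sys_mos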