2021
DOI: 10.48550/arxiv.2110.02635
Preprint

Generalization Ability of MOS Prediction Networks

Abstract: Automatic methods to predict listener opinions of synthesized speech remain elusive since listeners, systems being evaluated, characteristics of the speech, and even the instructions given and the rating scale all vary from test to test. While automatic predictors for metrics such as mean opinion score (MOS) can achieve high prediction accuracy on samples from the same test, they typically fail to generalize well to new listening test contexts. In this paper, using a variety of networks for MOS prediction incl…

Cited by 7 publications (35 citation statements)
References 16 publications (23 reference statements)
“…Cooper et al [4] show that adding silence and changing speed as data augmentations can improve MOSNet while seems not helpful for SSL-based MOS prediction models. Although authors do not explain the motivation for choosing these augmentations, we consider that these augmentations do not influence MOS.…”
Section: Data Augmentation (mentioning)
confidence: 99%
“…In this paper, two datasets are involved in the experiments: BVCC [4] and BC2019 [17]. BVCC is a newly collected MOS dataset that contains 7106 English samples from previous Blizzard Challenge for TTS [18,19,20,21,22] and Voice Conversion Challenge [23,24,25,26,27] as well as synthesized samples from systems implemented in ESPNet [28].…”
Section: Experiments Setup (mentioning)
confidence: 99%