2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)
DOI: 10.1109/qomex48832.2020.9123150

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Abstract: Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively) provides improvements upon previous versions, in terms of both design and usage. As an open source C++ library or binary with permissive licensing, ViSQOL can now be deployed beyond the research context into production usage. The feedback from internal production teams at Google has helped to improve this new release, and serves to …

Cited by 68 publications (38 citation statements). References 17 publications.
“…These are compared via an adaptation of the structural similarity index, originally developed for evaluating the quality of compressed images and then adapted to predict intelligibility [47]. Version 3 was recently released [48], [49] and it is here referred to as ViSQOLAudioV3. The declared aim for this new version is to "fill the blind spots in the training/validation datasets" so as to have a more general system that would perform better "in the wild".…”
Section: G. ViSQOLAudio
confidence: 99%
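The SSIM-style comparison mentioned in this statement combines an intensity term and a structure term computed over corresponding spectrogram regions. As a rough illustration only (the per-frame simplification and the constants below are assumptions, not ViSQOL's actual implementation), the shape of such a similarity could be sketched in Python as:

import numpy as np

def nsim_frame(ref_frame, deg_frame, c1=0.01, c2=0.03):
    """SSIM-style similarity between one reference and one degraded
    spectrogram frame: an intensity (luminance) term times a structure term.
    c1 and c2 are illustrative stabilisation constants, not ViSQOL's values."""
    mu_r, mu_d = ref_frame.mean(), deg_frame.mean()
    sigma_r, sigma_d = ref_frame.std(), deg_frame.std()
    sigma_rd = ((ref_frame - mu_r) * (deg_frame - mu_d)).mean()
    intensity = (2 * mu_r * mu_d + c1) / (mu_r**2 + mu_d**2 + c1)
    structure = (sigma_rd + c2) / (sigma_r * sigma_d + c2)
    return intensity * structure

A score near 1 indicates that the degraded frame preserves both the energy and the spectral structure of the reference frame.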
“…For the purposes of this study, this 'language' feature will be a laboratory identifier where the native language is used to test, and also encompasses other factors in the entire test environment such as the culture of the laboratory, the listening equipment, and so on. Each rater and language identifier is used as an index variable with normal priors that linearly influence the φ offset for the ordered logit model, and an exponential model for NSIM, as was found to be useful in [7]. The prior for φ_i can be described for individual observation i, rater j, and language k, and NSIM observation x_i as…”
Section: E. Features and Parameters
confidence: 99%
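The excerpt above is truncated before the cited paper's actual equation, so the following is only a hypothetical reading of the structure it describes: additive per-rater and per-language offsets with normal priors, plus an exponential term in the NSIM observation, all entering the φ offset of the ordered logit model. The symbols a, b, and σ are illustrative assumptions:

\[
\varphi_i = \beta^{\text{rater}}_{j[i]} + \beta^{\text{lang}}_{k[i]} + a\,e^{b x_i},
\qquad
\beta^{\text{rater}}_{j},\ \beta^{\text{lang}}_{k} \sim \mathcal{N}(0,\sigma^2)
\]

This should not be read as the equation from the cited work, only as a sketch of the dependency structure the sentence describes.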
“…Different kinds of objective models exist depending on the speech applications and services. Models such as POLQA [2], PESQ [3], and ViSQOL [4,5] have been shown to work well for a wide variety of coding, channel and environmental degradations to the speech signal. They are full-reference (FR) metrics that compare a clean reference to a test signal that has been degraded.…”
Section: Introduction
confidence: 99%
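As a concrete illustration of full-reference usage, the open-source ViSQOL binary is invoked with both the clean and the degraded signal; to the best of my knowledge the flag names below match the project's documented command line, but they should be treated as assumptions rather than quoted verbatim:

./visqol --reference_file ref.wav --degraded_file deg.wav --use_speech_mode

The --use_speech_mode flag selects the speech model (ViSQOL); without it the tool scores general audio (ViSQOLAudio).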
“…They pre-align the signals in order to account for quality issues resulting from delay and signal corruption. For example, the ViSQOL metric [4,5] uses the neurogram similarity index measure (NSIM) to estimate the similarity between a pre-aligned reference patch and a degraded spectrogram patch frame by frame.…”
Section: Introduction
confidence: 99%
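The pre-alignment step mentioned here can be pictured as a search for the offset in the degraded spectrogram that best matches each reference patch, before any frame-by-frame similarity is computed. A minimal sketch of that idea in Python (the function name, the energy-correlation criterion, and search_range are illustrative assumptions, not ViSQOL's actual alignment algorithm):

import numpy as np

def align_patch(ref_patch, deg_spectrogram, search_range=30):
    """Return the column offset in deg_spectrogram whose window best matches
    ref_patch, judged by correlation of per-frame energies. Sketch only."""
    n_frames = ref_patch.shape[1]
    ref_energy = ref_patch.sum(axis=0)
    best_offset, best_corr = 0, -np.inf
    max_offset = min(search_range, deg_spectrogram.shape[1] - n_frames + 1)
    for offset in range(max_offset):
        deg_energy = deg_spectrogram[:, offset:offset + n_frames].sum(axis=0)
        corr = np.corrcoef(ref_energy, deg_energy)[0, 1]
        if corr > best_corr:
            best_offset, best_corr = offset, corr
    return best_offset

Once aligned, the per-frame NSIM scores over the matched window are averaged into a patch-level similarity, which ViSQOL then maps to a quality score.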