ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413711
CDPAM: Contrastive Learning for Perceptual Audio Similarity

Abstract: Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbations on which it was trained. This paper introduces CDPAM, a metric that builds on and advances DPAM. The primary …
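The abstract describes a full-reference, differentiable perceptual metric intended for use as a training loss. As a rough illustration only, the sketch below shows how such a metric could be dropped into a PyTorch training loop for an enhancement model; the PerceptualMetric class, its forward(reference, estimate) signature, and the enhancer model are hypothetical placeholders, not the authors' released API.

```python
# Hedged sketch: using a learned full-reference perceptual metric as a loss.
# `PerceptualMetric` and `enhancer` are hypothetical placeholders, not the
# CDPAM release; they only illustrate the training pattern from the abstract.
import torch
import torch.nn as nn

class PerceptualMetric(nn.Module):
    """Stand-in for a learned full-reference metric (a frozen DPAM/CDPAM-style encoder is assumed)."""
    def __init__(self, embed: nn.Module):
        super().__init__()
        self.embed = embed                     # pretrained, frozen audio encoder (assumed)
        for p in self.embed.parameters():
            p.requires_grad_(False)

    def forward(self, reference: torch.Tensor, estimate: torch.Tensor) -> torch.Tensor:
        # Distance between embeddings serves as the perceptual distance.
        return (self.embed(reference) - self.embed(estimate)).pow(2).mean()

def training_step(enhancer: nn.Module,
                  metric: PerceptualMetric,
                  noisy: torch.Tensor,
                  clean: torch.Tensor,
                  optimizer: torch.optim.Optimizer) -> float:
    optimizer.zero_grad()
    estimate = enhancer(noisy)
    loss = metric(clean, estimate)             # differentiable, so gradients reach the enhancer
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the metric is differentiable end to end, it can replace or complement waveform and spectrogram losses during training, which is the use case the abstract motivates.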

Cited by 26 publications (14 citation statements); references 31 publications.
“…On the other hand, one may consider adapting existing objective assessment metrics for quality of monaural signals such as PESQ [6], POLQA [7], DPAM [8] and CDPAM [9] for this task. However, since these metrics only focus on perceived quality rather than spatialization, their utility for multi-channel signals remains limited [1,10].…”
Section: Introduction
confidence: 99%
“…We also compare the subjective listening test scores with widely used audio quality metrics and suggest that, similar to speech enhancement, these metrics correlate poorly with human perception [1,5]. With this work, we hope to motivate both future research in music enhancement as well as music quality perceptual metrics akin to those in the speech literature [6,7]. To promote further research, audio samples generated in our experiments and source code are provided at our project website.…”
Section: Introduction
confidence: 95%
“…Contrastive learning is a self-supervised machine-learning method that can utilize unlabeled data by learning from intrinsic similarity relations between data. Contrastive learning is widely used in speech quality assessment, in which speech representations are learned from large-scale unlabeled speech data [9][10][11]. Given scores s1 and s2 of utterances x1 and x2, the difference in the scores (d_{x1,x2} = s1 − s2) can be regarded as the difference in the two utterances in terms of speech quality.…”
Section: Contrastive Loss
confidence: 99%
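The statement above frames pairwise quality-score differences as a training signal. A minimal sketch of that idea follows, assuming an encoder maps each utterance to an embedding, a scoring head produces a scalar quality estimate, and the predicted difference is regressed onto d_{x1,x2} = s1 − s2; the encoder, the score head, and the mean-squared-error objective are illustrative assumptions, not the cited papers' exact formulation.

```python
# Minimal sketch: pairwise score-difference objective for speech quality
# assessment. The encoder and scoring head are illustrative placeholders;
# the cited papers' exact architectures and losses may differ.
import torch
import torch.nn as nn

class QualityModel(nn.Module):
    def __init__(self, encoder: nn.Module, embed_dim: int):
        super().__init__()
        self.encoder = encoder                      # e.g. a self-supervised speech encoder (assumed)
        self.score_head = nn.Linear(embed_dim, 1)   # maps an embedding to a scalar quality score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Predicted quality score s for an utterance x.
        return self.score_head(self.encoder(x)).squeeze(-1)

def pairwise_difference_loss(model: QualityModel,
                             x1: torch.Tensor, x2: torch.Tensor,
                             s1: torch.Tensor, s2: torch.Tensor) -> torch.Tensor:
    # Regress the predicted score difference onto the labeled difference
    # d_{x1,x2} = s1 - s2 for each utterance pair in the batch.
    d_pred = model(x1) - model(x2)
    d_true = s1 - s2
    return nn.functional.mse_loss(d_pred, d_true)
```

Training on score differences rather than absolute scores lets the model exploit relative judgments between utterance pairs, which is the property the quoted statement highlights.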