ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683855
|View full text |Cite
|
Sign up to set email alerts
|

SDR – Half-baked or Well Done?

Abstract: In speech enhancement and source separation, signal-to-noise ratio is a ubiquitous objective measure of denoising/separation quality. A decade ago, the BSS eval toolkit was developed to give researchers worldwide a way to evaluate the quality of their algorithms in a simple, fair, and hopefully insightful way: it attempted to account for channel variations, and to not only evaluate the total distortion in the estimated signal but also split it in terms of various factors such as remaining interference, newly a… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

1
484
0
1

Year Published

2019
2019
2023
2023

Publication Types

Select...
4
3
1

Relationship

0
8

Authors

Journals

citations
Cited by 752 publications
(525 citation statements)
references
References 29 publications
1
484
0
1
Order By: Relevance
“…Following the common speech separation metrics [12,21], we adopt average SI-SDR and SDR improvement over mixture as the evaluation metrics. We also report the performances under different ranges of angle difference between speakers to give a more comprehensive assessment for the model.…”
Section: Results Analysismentioning
confidence: 99%
See 1 more Smart Citation
“…Following the common speech separation metrics [12,21], we adopt average SI-SDR and SDR improvement over mixture as the evaluation metrics. We also report the performances under different ranges of angle difference between speakers to give a more comprehensive assessment for the model.…”
Section: Results Analysismentioning
confidence: 99%
“…Finally, the decoder reconstructs the separated speech waveform from the masked mixture encode for each speaker. To optimize the network end-to-end, scale-invariant signal-to-distortion ratio (SI-SDR) [12] is utilized as the training objective:…”
Section: Multi-channel Speech Separationmentioning
confidence: 99%
“…Time-domain loss (TDL): For time-domain networks, we employ the classic signal-to-noise ratio (SNR) [23] as time-domain loss,…”
Section: Training Lossesmentioning
confidence: 99%
“…where θ are the model parameters, SNR = −10 log 10 ( ||x|| 2 ||x−x|| 2 ) is the SNR between the clean speech and the enhanced speech, and || · || 2 is the L 2 norm. We decided to use the classic SNR loss [23] instead of the scale-invariant SNR (SiSNR) used in the original Tas-Net [17], because training the network with SiSNR let the network freely change the level of the enhanced signal. With the SNR loss, the scale of the signals is preserved avoiding any scaling requirement when passing the signal to ASR.…”
Section: Training Lossesmentioning
confidence: 99%
“…The three composite measures CSIG, CBAK, and COVL are the popular predictor of the mean opinion score (MOS) of the target signal distortion, background noise interference, and overall speech quality, respectively [30]. In addition, as the standard metric in speech enhancement, we also evaluated scale-invariant SDR (SI-SDR) [31]. Table 1 shows the experimental results.…”
Section: Openmentioning
confidence: 99%