2018
DOI: 10.1109/taslp.2018.2791105
DNN-Based Score Calibration With Multitask Learning for Noise Robust Speaker Verification

Abstract: This paper proposes and investigates several deep neural network (DNN)-based score compensation, transformation and calibration algorithms for enhancing the noise robustness of i-vector speaker verification systems. Unlike conventional calibration methods where the required score shift is a linear function of SNR or log-duration, the DNN approach learns the complex relationship between the score shifts and the combination of i-vector pairs and uncalibrated scores. Furthermore, with the flexibility of DNNs, it …

Cited by 7 publications (3 citation statements) · References 37 publications
“…In [25], we proposed to estimate the score shifts by multitask DNNs using noisy i-vector pairs and their corresponding PLDA scores as input. Moreover, instead of expressing the score shifts as a linear function of SNRs, we used the SNRs of training utterances as part of the target outputs and applied multi-task learning to guide the network to produce the ideal score shifts or clean scores.…”
Section: Introduction
confidence: 99%
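The quoted statement describes a multitask DNN that maps a noisy i-vector pair plus its uncalibrated PLDA score to a predicted score shift, with the training utterances' SNRs as auxiliary targets. A minimal forward-pass sketch of such a network is given below; all dimensions, layer sizes, and the random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical sketch of a multitask score-calibration DNN:
# input  = enrollment i-vector, test i-vector, and uncalibrated PLDA score;
# output = predicted score shift (main task) and the SNRs of the two
#          utterances (auxiliary task used to guide training).
rng = np.random.default_rng(0)

IVEC_DIM = 100              # assumed i-vector dimensionality (illustrative)
IN_DIM = 2 * IVEC_DIM + 1   # two i-vectors + one PLDA score
HID = 64                    # assumed hidden-layer width

# Randomly initialized weights stand in for a trained network.
W1 = rng.normal(0.0, 0.1, (IN_DIM, HID))
b1 = np.zeros(HID)
W_shift = rng.normal(0.0, 0.1, (HID, 1))   # main head: score shift
W_snr = rng.normal(0.0, 0.1, (HID, 2))     # auxiliary head: SNR pair

def forward(enroll_ivec, test_ivec, plda_score):
    """Return (predicted score shift, predicted SNRs of both utterances)."""
    x = np.concatenate([enroll_ivec, test_ivec, [plda_score]])
    h = np.tanh(x @ W1 + b1)               # shared hidden representation
    return float(h @ W_shift), h @ W_snr

enroll = rng.normal(size=IVEC_DIM)
test = rng.normal(size=IVEC_DIM)
noisy_score = -3.2
shift, snrs = forward(enroll, test, noisy_score)
calibrated = noisy_score + shift           # calibrated (clean) score estimate
```

At test time only the score-shift head is used; the SNR head exists solely to regularize the shared layers during training, which is the multitask idea the quote refers to.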
“…Moreover, instead of expressing the score shifts as a linear function of SNRs, we used the SNRs of training utterances as part of the target outputs and applied multi-task learning to guide the network to produce the ideal score shifts or clean scores. In this paper, we extend the multitask DNNs in [25] in three respects. First, in addition to using SNR as target outputs, we also use utterance duration and same-speaker and different-speaker hypotheses as target outputs.…”
Section: Introduction
confidence: 99%
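The extension described above, with SNR, utterance duration, and the same/different-speaker hypothesis all serving as auxiliary targets, suggests a combined multitask objective. As an illustrative sketch (the weights $\lambda_i$ and the exact loss forms are assumptions, not taken from the paper), it could be written as:

```latex
\mathcal{L}
  = \bigl(\hat{s} - s^{\mathrm{clean}}\bigr)^{2}
  + \lambda_{1}\,\bigl\lVert \widehat{\mathrm{SNR}} - \mathrm{SNR} \bigr\rVert^{2}
  + \lambda_{2}\,\bigl(\hat{d} - d\bigr)^{2}
  + \lambda_{3}\,\mathrm{CE}\bigl(\hat{h},\, h\bigr)
```

where $\hat{s}$ is the predicted clean score, $d$ is the (log-)duration, and $\mathrm{CE}$ is a cross-entropy term over the same-speaker/different-speaker hypothesis labels.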
“…The WT method is suitable when the SNR is high, while the CNN method performs well when the SNR is low. Besides, a certain amount of noise in SERS spectra can enhance the robustness and generalization of the algorithm [186]. Therefore, some noise in the original SERS signal need not be removed when training the qualitative model.…”
Section: Spectral Acquisition and Processing
confidence: 99%