ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683132

An Improved Uncertainty Propagation Method for Robust I-vector Based Speaker Recognition

Abstract: The performance of automatic speaker recognition systems degrades when facing distorted speech data containing additive noise and/or reverberation. Statistical uncertainty propagation has been introduced as a promising paradigm to address this challenge. So far, different uncertainty propagation methods have been proposed to compensate noise and reverberation in i-vectors in the context of speaker recognition. They have achieved promising results on small datasets such as YOHO and Wall Street Journal, but litt…

Cited by 4 publications (4 citation statements)
References: 24 publications
“…Recently, deep neural network (DNN) becomes used in speech [51] and speaker recognition [52], [53], where speech recognition aims at determining the underlying text or command of the speech signal. However, the major breakthroughs made by DNN-based methods reside in speech recognition; for speaker recognition, ivector-PLDA based methods still exhibit the state-of-the-art performance [6]. Moreover, DNN-based methods usually rely on a much larger amount of labeled training dataset, which could greatly increase the computational complexity of training compared with ivector-PLDA and GMM-UBM based methods [54], thus are not suitable for off-line speaker enrollment on client-side devices.…”
Section: A Speaker Recognition System (SRS)
“…Speaker recognition systems (SRSs) are ubiquitous in our daily life, ranging from biometric authentication [2], [3], forensic tests [4], to personalized service on smart devices [5]. Machine learning techniques are the mainstream method for implementing SRSs [6], however, they are vulnerable to adversarial attacks (e.g., [7], [8], [9]). Hence, it is vital to understand the security implications of SRSs under adversarial attacks.…”
Section: Introduction
“…Many researchers tried to achieve robustness within the i-vector framework, or in the low dimensional i-vector space. In (Ribas and Vincent 2019), uncertainty propagation was employed in both UBM and factor analysis model, and a slight improvement over a speech enhancement algorithm was reported. Clean i-vectors were MAP estimated (called i-MAP) given the noisy i-vectors in (Ben Kheder et al 2015, Ben Kheder et al 2014), assuming the distributions are normal, and noise is additive in the i-vector space.…”
Section: Introduction
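The i-MAP idea summarized in the statement above (Gaussian clean and noise i-vectors, noise additive in i-vector space) can be illustrated with a short sketch. This is not the implementation of Ben Kheder et al.: it assumes the clean i-vector x and the noise n are independent Gaussians with diagonal covariances, an illustrative simplification (with full covariances the per-dimension gain becomes the matrix Σx(Σx + Σn)^-1). The function name imap_estimate and its parameters are hypothetical. Under these assumptions the posterior of x given y = x + n is Gaussian, so its mode (the MAP estimate) equals its mean.

```python
from typing import List, Sequence


def imap_estimate(
    y: Sequence[float],
    mu_x: Sequence[float],
    var_x: Sequence[float],
    mu_n: Sequence[float],
    var_n: Sequence[float],
) -> List[float]:
    """Hypothetical sketch: MAP estimate of a clean i-vector x from y = x + n.

    Assumes x ~ N(mu_x, diag(var_x)) and n ~ N(mu_n, diag(var_n)) are
    independent, so each dimension can be treated separately.
    """
    if not (len(y) == len(mu_x) == len(var_x) == len(mu_n) == len(var_n)):
        raise ValueError("all inputs must have the same dimension")
    x_hat: List[float] = []
    for yd, mxd, vxd, mnd, vnd in zip(y, mu_x, var_x, mu_n, var_n):
        if vxd <= 0 or vnd <= 0:
            raise ValueError("variances must be positive")
        # Fraction of the residual (y - mu_x - mu_n) attributed to clean speech.
        gain = vxd / (vxd + vnd)
        x_hat.append(mxd + gain * (yd - mxd - mnd))
    return x_hat
```

For example, imap_estimate([2.0, -1.0], [0.0, 0.0], [1.0, 3.0], [0.5, 0.0], [1.0, 1.0]) returns [0.75, -0.75]: each dimension is pulled toward the clean prior mean, more strongly where the noise variance is large relative to the clean variance.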
“…Unfortunately, VPSs are vulnerable to different adversarial attacks [6][7][8] and contain various other performance degrading factors/variabilities such as channel mismatch (using different channels for enrollment and test data sets) [9], room or space reverberation (decay in sound intensity with time) [10], background noise, and speaker's internal variations such as language, emotions, health, and vocal efforts [11][12][13][14].…”
Section: Introduction