A Principle Solution for Enroll-Test Mismatch in Speaker Recognition

Li, Lantian; Wang, Dong; Kang, Jiawen; Wang, Renyu; Wu, Jing; Gao, Zhendong; Xiao, Chao

doi:10.1109/taslp.2022.3140558

Cited by 5 publications

(4 citation statements)

References 67 publications

“…This analytical view provides a powerful tool by which we can analyze how the performance reduction is caused by a particular imperfection, and design suitable algorithms to compensate for the impact. Recently, using this tool we provided a theoretically optimal solution for the enroll-test mismatch problem and achieved a big success [17,19].…”

Section: Discussion (mentioning)

confidence: 99%

A simulation study on optimal scores for speaker recognition

Wang

2020

J AUDIO SPEECH MUSIC PROC.

Self Cite

In this article, we conduct a comprehensive simulation study for the optimal scores of speaker recognition systems that are based on speaker embedding. For that purpose, we first revisit the optimal scores for the speaker identification (SI) task and the speaker verification (SV) task in the sense of minimum Bayes risk (MBR) and show that the optimal scores for the two tasks can be formulated as a single form of normalized likelihood (NL). We show that when the underlying model is linear Gaussian, the NL score is mathematically equivalent to the PLDA likelihood ratio (LR), and the empirical scores based on cosine distance and Euclidean distance can be seen as approximations of this linear Gaussian NL score under some conditions. Based on the unified NL score, we conducted a comprehensive simulation study to investigate the behavior of the scoring component on both the SI task and SV task, in the case where the distribution of the speaker vectors perfectly matches the assumption of the NL model, as well as the case where some mismatch is involved. Importantly, our simulation is based on the statistics of speaker vectors derived from a practical speaker recognition system, hence reflecting the behavior of the NL scoring in real-life scenarios that are full of imperfection, including non-Gaussianality, non-homogeneity, and domain/condition mismatch.
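
The normalized likelihood mentioned in this abstract can be illustrated with a small sketch. It is not taken from the cited article: it assumes the simplest linear Gaussian case, with speaker means drawn from N(0, eps * I) and vectors of one speaker drawn from N(mean, sigma * I), where `eps` and `sigma` are isotropic variances. The function names `log_gauss_iso` and `log_nl_score` are hypothetical. Under these assumptions, the NL score is the predictive density of the test vector given the enrollment vectors, divided by its marginal density.

```python
import numpy as np


def log_gauss_iso(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) evaluated at x."""
    d = x.shape[-1]
    diff = x - mean
    return -0.5 * (d * np.log(2.0 * np.pi * var) + diff @ diff / var)


def log_nl_score(test, enroll, eps, sigma):
    """Log normalized likelihood: log p(test | enroll) - log p(test).

    Assumed model (illustrative only):
      speaker mean  mu ~ N(0, eps * I)
      observation   x | mu ~ N(mu, sigma * I)
    """
    enroll = np.atleast_2d(enroll)
    n = enroll.shape[0]
    xbar = enroll.mean(axis=0)

    # Posterior of the speaker mean given n enrollment vectors.
    post_mean = (n * eps / (n * eps + sigma)) * xbar
    post_var = eps * sigma / (n * eps + sigma)

    # Predictive density of the test vector for the enrolled speaker.
    log_num = log_gauss_iso(test, post_mean, sigma + post_var)
    # Marginal density of the test vector over all speakers.
    log_den = log_gauss_iso(test, np.zeros_like(test), eps + sigma)
    return log_num - log_den


enroll = np.array([[1.0, 0.0], [1.2, 0.1]])
print(log_nl_score(np.array([1.1, 0.0]), enroll, eps=1.0, sigma=0.1))
print(log_nl_score(np.array([-1.0, 0.0]), enroll, eps=1.0, sigma=0.1))
```

In this toy setting, a test vector close to the enrollment vectors receives a positive log score and a distant one receives a negative log score, so a threshold at zero separates the two.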

“…The statistics change and the mean shift cause more severe problems in the cross-genre scenario, as the enroll data and test data in this scenario possess different statistical properties but they have to be represented in a single PLDA model. We presented a deep analysis on this enroll-test mismatch problem in our recent study [76], but mismatch caused by the cross-genres challenge is yet to be thoroughly studied.…”

Section: Discussion (mentioning)

confidence: 99%

CN-Celeb: multi-genre speaker recognition

Li, Rui-qi, Kang, et al.

2020

Preprint

Self Cite

Research on speaker recognition is extending to address the vulnerability in the wild conditions, among which genre mismatch is perhaps the most challenging, for instance, enrollment with reading speech while testing with conversational or singing audio. This mismatch leads to complex and composite inter-session variations, both intrinsic (i.e., speaking style, physiological status) and extrinsic (i.e., recording device, background noise). Unfortunately, the few existing multi-genre corpora are not only limited in size but are also recorded under controlled conditions, which cannot support conclusive research on the multi-genre problem. In this work, we firstly publish CN-Celeb, a large-scale multi-genre corpus that includes in-the-wild speech utterances of 3,000 speakers in 11 different genres. Secondly, using this dataset, we conduct a comprehensive study on the multi-genre phenomenon, in particular the impact of the multi-genre challenge on speaker recognition, and on how to utilize the valuable multi-genre data more efficiently.

“…Based on various tasks, SR is classified as speaker identification (SI) and speaker verification (SV). In order to identify an unknown speaker, SI [14][15][16][17][18][19][20][21][22][23] analyses their verbal output. From the set of all N registered speakers, it picks the right one, as shown in Fig.…”

Section: E Speaker Identification and Speaker Verification (mentioning)

confidence: 99%

Analysis of Human Voice for Speaker Recognition: Concepts and Advancement

Sumit Srivastava

2024

jes

Human voice or speech is a contactless, non-invasive biometric trait for human recognition, easy to use with minimal computer complexity and inexpensive to implement. Speaker recognition (SR) has turned out to be a magnificent approach using speech as the central premise since decades. Its broad range of usages, like forensic speech verification to identify culprits by law enforcement authorities and access control to mobile banking, mobile shopping, etc., has made it a lucrative area of research. Also, the ease of use and dependability of SR will significantly assist people with disabilities in securely accessing and reaping the benefits of digital-era services. Additionally, the emergence of numerous deep learning methods for feature extraction and classification, has helped SR to achieve tremendous progress. This paper presents a comprehensive study on the progression of SR for decades till the present, including integration with Blockchain and challenges. It covers most of the factors that influence SR performance such as fundamentals and structure of SR, different speech pre-processing techniques, various speech features, feature extraction techniques, traditional and neural network-based classification techniques and deep learning-based SR toolkits. As a consequence, in this digital Blockchain era, it will help to design robust and reliable recognition-based services for mankind.
