System combination for short utterance speaker recognition

Li, Lantian; Wang, Dong; Zhang, Xiaodong; Zheng, Thomas Fang; Jin, Panshi

doi:10.1109/apsipa.2016.7820903

Cited by 7 publications

(5 citation statements)

References 13 publications


“…The DNN-based i-vector system significantly exceeds its relative baseline. This confirms the effectiveness of recognition methods [33]. In addition, it can be seen that the GMM-UBM baseline is superior to the two i-vector systems, but after using probabilistic linear discriminant analysis (PLDA) [34], the i-vector system is improved and outperforms the GMM-UBM system.…”

Section: Classifier System Deep Neural Network (supporting)

confidence: 63%

Development of security systems using DNN and i & x-vector classifiers

Mamyrbayev, Kydyrbekova, Alimhan, et al. 2021. EEJET

The widespread use of biometric systems entails increased interest from cybercriminals aimed at developing attacks to crack them. Thus, the development of biometric identification systems must be carried out taking into account protection against these attacks. The development of new methods and algorithms for identification based on the presentation of randomly generated key features from the biometric base of user standards will help to minimize the disadvantages of the above methods of biometric identification of users. We present an implementation of a security system based on voice identification as an access control key and a verification algorithm developed using MATLAB function blocks that can authenticate a person's identity by his or her voice. Our research has shown an accuracy of 90 % for this user identification system for individual voice characteristics. It has been experimentally proven that traditional MFCCs using DNN and i and x-vector classifiers can achieve good results. The paper considers and analyzes the most well-known approaches from the literature to the problem of user identification by voice: dynamic programming methods, vector quantization, mixtures of Gaussian processes, hidden Markov model. The developed software package for biometric identification of users by voice and the method of forming the user's voice standards implemented in the complex allows reducing the number of errors in identifying users of information systems by voice by an average of 1.5 times. Our proposed system better defines voice recognition in terms of accuracy, security and complexity. The application of the results obtained will improve the security of the identification process in information systems from various attacks.


“…However, there are many cases where multiple samples are not available for comparison (only a single voice recording fragment is available). Several studies have been conducted using short utterances [17–20], but these generally do not meet the requirements of forensic evaluation: the evaluation datasets do not follow a strict protocol [21] or use techniques that have already been outperformed by deep learning techniques in regular speaker recognition. This study aims to investigate this scenario in two ways: only one sample is available for the unknown speaker and (i) only one or (ii) multiple samples are available for the known speaker.…”

Section: Introduction (mentioning)

confidence: 99%

Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings

Schiferl, Fejes. 2023. Journal of Forensic Sciences

In forensic voice comparison, deep learning has become widely popular recently. It is mainly used to learn speaker representations, called embeddings or embedding vectors. Speaker embeddings are often trained using corpora mostly containing widely spoken languages. Thus, language dependency is an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different from that the model is trained on. In the case of a low‐resource language, developing a corpus for forensic purposes containing enough speakers to train deep learning models is costly. This study aims to investigate whether a model pre‐trained on multilingual (mostly English) corpus can be used on a target low‐resource language (here, Hungarian), not represented by the model. Often multiple samples are not available from the offender (unknown speaker). Samples are therefore compared pairwise with and without speaker enrollment for suspect (known) speakers. Two corpora are used that were developed especially for forensic purposes and a third that is meant for traditional speaker verification. Speaker embedding vectors are extracted by the x‐vector and ECAPA‐TDNN techniques. Speaker verification was evaluated in the likelihood‐ratio framework. A comparison is made between the language combinations (modeling, LR calibration, and evaluation). The results were evaluated by Cllrmin and EER metrics. It was found that the model pre‐trained on a different language but on a corpus with a significant number of speakers can be used on samples with language mismatch. Sample duration and speaking style also seem to affect the performance.


“…Studies have shown that in very short‐duration cases, the classical GMM‐UBM‐based approach worked better with respect to the modern i‐vector‐based approach [40]. Fusion of multiple classifiers yielded considerable improvements over the standalone approaches [104]. Research in short‐utterance problem in ASV has seen efforts to accommodate phonetic distribution for speaker modelling [57, 97].…”

Section: Discussion (mentioning)

confidence: 99%

Speaker verification with short utterances: a review of challenges, trends and opportunities

2017


Automatic speaker verification (ASV) technology now reports a reasonable level of accuracy in its applications in voice-based biometric systems. However, it requires adequate amount of speech data for enrolment and verification; otherwise, the performance becomes considerably degraded. For this reason, the trade-off between the convenience and security is difficult to maintain in practical scenarios. The utterance duration remains a critical issue while deploying a voice biometric system in real-world applications. A large amount of research work has been carried out to address the limited data issue within the scope of SV. The advancements and research activities in mitigating the challenges due to short utterance have seen a significant rise in recent times. In this study, the authors present an extensive survey of SV with short utterances considering the studies from recent past and include latest research offering various solutions and analyses. The review also summarises the major findings of the studies of duration variability problem in ASV systems. Finally, they discuss a number of possible future directions promoting further research in this field.

2 Brief overview of ASV

An ASV system includes three fundamental modules [1, 2]: a feature extraction unit, which transforms the speech signal in a compact form, a statistical modelling unit to characterise the extracted features, and finally a classification module to classify a test speech.

2.1 Feature extraction approaches

The state-of-the-art ASV systems use three major types of feature extraction techniques: sub-segmental, segmental and suprasegmental analyses. Speech signals analysed using the frame size
