Interspeech 2017
DOI: 10.21437/interspeech.2017-430

CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short Utterances

Abstract: Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector and probabilistic linear discriminant analysis (PLDA) based systems have become the standard in speaker verification applications, but they are less effective with short utterances. To address this issue, we propose a novel method, which trains a convolutional neural network (CNN) model to map the i-vectors extracted from short utterances to…
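The abstract sketches the core idea: learn a regression from short-utterance i-vectors to their long-utterance counterparts. Below is a minimal, hypothetical PyTorch sketch of such a CNN-based mapping trained with an MSE objective; the i-vector dimensionality (400), layer sizes and loss are illustrative assumptions and do not reproduce the paper's joint-mapping architecture.

```python
# Hypothetical sketch (not the paper's exact architecture): a 1-D CNN that maps
# a short-utterance i-vector to an estimate of the corresponding long-utterance
# i-vector, trained with an MSE regression objective.
import torch
import torch.nn as nn

IVEC_DIM = 400  # assumed i-vector dimensionality

class ShortToLongMapper(nn.Module):
    def __init__(self, dim=IVEC_DIM):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),  # treat the i-vector as a 1-channel signal
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.fc = nn.Linear(32 * dim, dim)  # project back to i-vector space

    def forward(self, x):
        # x: (batch, dim) short-utterance i-vectors
        h = self.conv(x.unsqueeze(1))       # (batch, 32, dim)
        return self.fc(h.flatten(1))        # (batch, dim) estimated long-utterance i-vectors

# Toy training loop on random (short, long) i-vector pairs.
model = ShortToLongMapper()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
short_ivecs = torch.randn(64, IVEC_DIM)
long_ivecs = torch.randn(64, IVEC_DIM)
for _ in range(5):
    optimiser.zero_grad()
    loss = nn.functional.mse_loss(model(short_ivecs), long_ivecs)
    loss.backward()
    optimiser.step()
```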

Cited by 12 publications (7 citation statements)
References 9 publications
“…Speaker recognition was investigated in [16]. Gaussian mixtures were used as the main classification method for speaker recognition.…”
Section: Literature Review and Problem Statement (mentioning)
confidence: 99%
“…$w = (I + T^{\top}\Sigma^{-1}NT)^{-1}T^{\top}\Sigma^{-1}F$ (16), where N and F are the matrices composed of the zero- and first-order statistics, and $\Sigma$ is the covariance matrix of F. These i-vectors will have information of the language contained in the utterance they represent, since that is the task for which the T matrix has been trained. Voice recognition bottleneck features [35].…”
Section: Classifier System: Deep Neural Network (mentioning)
confidence: 99%
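For context, the quoted Eq. (16) is the standard i-vector point estimate computed from Baum-Welch statistics. The following NumPy sketch illustrates that formula under assumed toy dimensions; the variable names and sizes are illustrative and not taken from the citing paper.

```python
# Illustrative sketch (assumed toy dimensions) of the i-vector point estimate
# w = (I + T' Σ^{-1} N T)^{-1} T' Σ^{-1} F referenced in the quoted Eq. (16).
import numpy as np

C, D, R = 8, 20, 10                  # assumed: GMM components, feature dim, i-vector dim
CD = C * D
rng = np.random.default_rng(0)

T = rng.standard_normal((CD, R))                 # total-variability matrix
Sigma = np.abs(rng.standard_normal(CD)) + 0.1    # diagonal covariance, stored as a vector
n = np.abs(rng.standard_normal(C))               # zero-order statistics per component
F = rng.standard_normal(CD)                      # centred first-order statistics, stacked

N_diag = np.repeat(n, D)                         # diagonal of N, expanded to CD entries
TtSinv = T.T / Sigma                             # T' Σ^{-1}
precision = np.eye(R) + TtSinv @ (N_diag[:, None] * T)   # I + T' Σ^{-1} N T
w = np.linalg.solve(precision, TtSinv @ F)       # i-vector point estimate
print(w.shape)                                   # (R,)
```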
“…The SSAE architecture can also be used to estimate the HB components directly from the regression layer. A similar CNN based architecture designed to regularise the mapping of short i-vectors to long i-vectors for a speaker diarization task is reported in [29]. The focus here is different, i.e., to regularise/supervise dimensionality reduction so that it preserves information critical to ABE.…”
Section: Application to ABE (mentioning)
confidence: 99%
“…Since proposed in [1], the i-vector has become the state-of-the-art speaker modeling technique; it is a simple but elegant factor analysis model, inspired by the Joint Factor Analysis (JFA) [2] framework. Though some researchers have been working on improving the i-vector model itself [3,4], more researchers pay attention to compensation techniques in the i-vector space [5,6,7,8]. JFA can be regarded as a compensation method in the GMM super-vector space, which models the speaker and channel variabilities separately.…”
Section: Introduction (mentioning)
confidence: 99%
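To make the distinction drawn in this statement concrete, the standard textbook formulations (not quoted from the page) are: JFA decomposes the GMM mean supervector into separate speaker and channel subspaces, whereas the i-vector model collapses both into a single total-variability subspace.

```latex
% Textbook forms, stated here for context:
% JFA separates speaker (Vy + Dz) and channel (Ux) variability;
% the i-vector model uses one total-variability term Tw.
\begin{align}
  \text{JFA:}      \quad & M = m + Vy + Ux + Dz \\
  \text{i-vector:} \quad & M = m + Tw
\end{align}
```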
“…The authors in [7] proposed using an auto-encoder to learn a projection that maps noisy i-vectors to de-noised ones. To address the short-duration problem of i-vectors [20], a Convolutional Neural Network (CNN) based system was trained in [8] to map the i-vectors extracted from short utterances to the corresponding long-utterance i-vectors.…”
Section: Introduction (mentioning)
confidence: 99%