This paper presents the Speech Technology Center (STC) systems submitted to the Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge 2015. In this work we investigate different acoustic feature spaces to determine reliable and robust countermeasures against spoofing attacks. In addition to the commonly used front-end MFCC features, we explored features derived from the phase spectrum and features based on the multiresolution wavelet transform. As in state-of-the-art ASV systems, we used the standard TV-JFA approach for probability modelling in the spoofing detection systems. Experiments performed on the development and evaluation datasets of the Challenge demonstrate that phase-related and wavelet-based features contribute substantially to the performance of the resulting STC systems. We also compared linear (SVM) and nonlinear (DBN) classifiers.
Abstract. In this paper we consider different approaches to applying artificial neural networks to the speaker recognition task. We investigated the performance of DNNs at two levels of a speaker recognition system: the i-vector extraction level and the model back-end level. The results of our study demonstrate the high effectiveness of the proposed neural network based approaches for this problem. It is shown that using DNN technology at each of these levels independently increases the reliability of the speaker recognition system. However, such systems also have some disadvantages, which are described in this paper.
This paper presents the ITMO University system submitted to the Speakers in the Wild (SITW) Speaker Recognition Challenge. For the evaluation track of the SITW challenge we explored conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector systems and recently developed DNN-posteriors based i-vector systems. The systems were investigated under the real-world media channel conditions represented in the challenge. This paper discusses practical issues in training robust i-vector systems and investigates a denoising autoencoder (DAE) based back-end applied to "in the wild" conditions. Our speaker diarization approach for "multi-speaker in the file" conditions is also briefly presented. Experiments performed on the evaluation dataset demonstrate that DNN-based i-vector systems outperform the UBM-GMM based systems and that applying a DAE-based back-end helps to improve system performance.