ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks

Lai, Cheng-I; Chen, Nanxin; Villalba, Jesús; Dehak, Najim

doi:10.21437/interspeech.2019-1794

Cited by 91 publications

(49 citation statements)

References 31 publications

“…Furthermore, attention-based models have been studied in [61,62] during the ASVspoof 2019 challenge. It is also worth noting that the best performing models on the ASVspoof challenges used fusion approaches, either at the classifier output or the feature level [57,76,15], indicating the challenges in designing a single countermeasure capable of capturing all the variabilities that may appear in wild test conditions in a presentation attack. Please refer to Table 1 for details.…”

Section: Related Work (mentioning)

confidence: 99%

Deep generative variational autoencoding for replay spoof detection in automatic speaker verification

Chettri

Kinnunen

Benetos

2020

Computer Speech & Language


Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet difficult to detect reliably. The generalization failure of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs. The majority of them are supervised approaches that learn a human-spoof discriminator. In this paper, we advocate a different, deep generative approach that leverages powerful unsupervised manifold learning in classification. The potential benefits include the possibility to sample new data, and to obtain insights into the latent features of genuine and spoofed speech. To this end, we propose to use variational autoencoders (VAEs) as an alternative backend for replay attack detection, via three alternative models that differ in their class-conditioning. The first one, similar to the use of Gaussian mixture models (GMMs) in spoof detection, is to train two VAEs independently, one for each class. The second one is to train a single conditional model (C-VAE) by injecting a one-hot class label vector into the encoder and decoder networks. Our final proposal integrates an auxiliary classifier to guide the learning of the latent space. Our experimental results using constant-Q cepstral coefficient (CQCC) features on the ASVspoof 2017 and 2019 physical access subtask datasets indicate that the C-VAE offers substantial improvement in comparison to training two separate VAEs for each class. On the 2019 dataset, the C-VAE outperforms the VAE and the baseline GMM by an absolute 9-10% in both equal error rate (EER) and tandem detection cost function (t-DCF) metrics. Finally, we propose VAE residuals, the absolute difference of the original input and the reconstruction, as features for spoofing detection. The proposed frontend approach augmented with a convolutional neural network classifier demonstrated substantial improvement over the VAE backend use case.
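The VAE residual described in this abstract (the absolute difference between the input and its reconstruction) can be sketched as follows. This is only an illustration under stated assumptions: `toy_reconstruct` is a hypothetical stand-in for a trained VAE (a simple moving average along time), and the random matrix is a placeholder for real CQCC features. Neither comes from the paper.

```python
import numpy as np

def vae_residual(features: np.ndarray, reconstruction: np.ndarray) -> np.ndarray:
    """Element-wise absolute difference between an input and its reconstruction."""
    if features.shape != reconstruction.shape:
        raise ValueError("input and reconstruction must have the same shape")
    return np.abs(features - reconstruction)

def toy_reconstruct(features: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained VAE: 3-frame moving average along time."""
    kernel = np.ones(3) / 3.0
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, features
    )

# Placeholder features shaped (coefficients, frames); real CQCCs would go here.
rng = np.random.default_rng(0)
cqcc = rng.standard_normal((20, 100))

# The residual has the same shape as the input and could be fed to a classifier.
residual = vae_residual(cqcc, toy_reconstruct(cqcc))
```

In the setup the abstract describes, a residual map like this would serve as the input to a convolutional classifier instead of the raw features.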


“…System #1 refers to the proposed architecture that jointly optimizes SID, PAD, and ISV loss (see Figure 1a). System #2-SE is the result of applying squeeze-excitation (SE) [26] based on its recent application to PAD [9]. System #3 describes the result of assigning three max feature map (MFM) blocks [18] for SID as well as for PAD after the first three MFM blocks.…”

Section: Experimental Configurations (mentioning)

confidence: 99%

“…However, SV systems are known to be vulnerable to various presentation attacks, such as replay attacks, voice conversion, and speech synthesis. These vulnerabilities have inspired research into presentation attack detection (PAD), which classifies given utterances as spoofed or not spoofed [6][7][8], where many DNN-based systems have achieved promising results [9][10][11]. Table 1 demonstrates the vulnerability of conventional SV systems when faced with presentation attacks.…”

Section: Introduction (mentioning)

confidence: 99%

Integrated Replay Spoofing-Aware Text-Independent Speaker Verification

et al. 2020


A number of studies have successfully developed speaker verification or presentation attack detection systems. However, studies integrating the two tasks remain in the preliminary stages. In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach. The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system through multi-task learning with a common feature. However, through experiments, we hypothesize that the information required for performing speaker verification and presentation attack detection might differ because speaker verification systems try to remove device-specific information from speaker embeddings, while presentation attack detection systems exploit such information. Therefore, we propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection. This approach has three input components: two speaker embeddings (one each for enrollment and test) and a prediction of presentation attacks. Experiments are conducted using the ASVspoof 2017-v2 dataset, which includes official trials on the integration of speaker verification and presentation attack detection. The proposed back-end approach demonstrates a relative improvement of 21.77% in terms of the equal error rate for integrated trials compared to a conventional speaker verification system.


“…Experiments were carried out using the following three CNN variants: (1) ResNet18 [22,24]; (2) SENet50 (Squeeze-Excitation Network) [24]; and (3) Light CNN (LCNN) [16]. The model parameters and architectures of ResNet18 and SENet50 are shown in Table 1.…”

Section: Experimental Settings (mentioning)

confidence: 99%

Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV

Wang

Lee

Koshinaka

2020

The Speaker and Language Recognition Workshop (Odyssey 2020)


This paper presents a simple but effective method that uses multi-resolution feature maps with convolutional neural networks (CNNs) for anti-spoofing in automatic speaker verification (ASV). The central idea is to alleviate the problem that the feature maps commonly used in anti-spoofing networks are insufficient for building discriminative representations of audio segments, as they are often extracted by a single-length sliding window. Resulting trade-offs between time and frequency resolutions restrict the information in single spectrograms. The proposed method improves both frequency resolution and time resolution by stacking multiple spectrograms that are extracted using different window lengths. These are fed into a convolutional neural network in the form of multiple channels, making it possible to extract more information from input signals while only marginally increasing computational costs. The efficiency of the proposed method has been confirmed on the ASVspoof 2019 database. We show that the use of the proposed multi-resolution inputs consistently outperforms that of score fusion across different CNN architectures. Moreover, computational cost remains small.
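A minimal sketch of the stacking idea in this abstract: spectrograms computed with different window lengths are combined as channels of one CNN input. The window lengths, hop size, FFT size, and the choice to truncate all channels to the shortest frame count are assumptions made for illustration, not values or design decisions taken from the paper.

```python
import numpy as np

def log_spectrogram(signal, win_length, hop_length=160, n_fft=1024):
    """Log-magnitude STFT using a Hann window of `win_length` samples.

    Each frame is zero-padded to `n_fft`, so every window length yields the
    same number of frequency bins (n_fft // 2 + 1).
    """
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.log(np.abs(spectrum) + 1e-8).T  # (freq_bins, n_frames)

def multi_resolution_input(signal, win_lengths=(256, 512, 1024),
                           hop_length=160, n_fft=1024):
    """Stack one spectrogram per window length along a leading channel axis."""
    specs = [log_spectrogram(signal, w, hop_length, n_fft) for w in win_lengths]
    # Longer windows give slightly fewer frames; truncate to the shortest (assumption).
    n_frames = min(s.shape[1] for s in specs)
    return np.stack([s[:, :n_frames] for s in specs], axis=0)

# One second of placeholder audio at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
x = multi_resolution_input(audio)  # shape: (channels, freq_bins, frames)
```

Sharing the hop size and FFT size across channels keeps the time and frequency axes aligned, so the result can be passed to a 2-D CNN the same way a multi-channel image would be.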

