Classifiers for synthetic speech detection: a comparison

Hanilçi, Cemal; Kinnunen, Tomi; Sahidullah, Md; Сизов, А. С.

doi:10.21437/interspeech.2015-466

Cited by 29 publications

(18 citation statements)

References 28 publications

“…However, recent studies have shown that a well-trained ASV system could be deceived by malicious attacks [1][2][3]. In the last decade, the speaker verification community held several ASVspoof challenge competitions [4][5][6] to develop countermeasures mainly against replay [7,8], speech synthesis [9,10] and voice conversion [10,11] attacks.…”

Section: Introduction (mentioning)

confidence: 99%

“…A separate detection countermeasure has the following advantages: 1) It separates the defense part and speaker verification into two independent stages, which avoids retraining a well-developed ASV model. 2) Since most existing countermeasures for replay and synthetic speech attacks are based on a separate detection network [7][8][9], the proposed approach provides the feasibility to develop a unified countermeasure against all spoofing attacks.…”

Section: Introduction (mentioning)

confidence: 99%

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

Zhong et al. 2020. Preprint.

Recently, adversarial attacks on automatic speaker verification (ASV) systems have attracted widespread attention as they pose severe threats to ASV systems. However, methods to defend against such attacks are limited. Existing approaches mainly focus on retraining ASV systems with adversarial data augmentation. Also, countermeasure robustness against different attack settings is insufficiently investigated. Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training. A VGG-like binary classification detector is introduced and demonstrated to be effective at detecting adversarial samples. To investigate detector robustness in a realistic defense scenario where unseen attack settings exist, we analyze various attack settings and observe that the detector is robust (6.27% EER_det degradation in the worst case) against unseen substitute ASV systems, but it has weak robustness (50.37% EER_det degradation in the worst case) against unseen perturbation methods. The weak robustness against unseen perturbation methods shows a direction for developing stronger countermeasures.

“…Recently, the Gaussian Mixture Model (GMM) classifier trained with constant Q cepstral coefficient (CQCC) feature has been the benchmark for various anti-spoofing tasks [13,14]. The CQCC feature is a perceptually-inspired time-frequency analysis extracted from a constant-Q transform (CQT) [15,16].…”

Section: Introduction (mentioning)

confidence: 99%

The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion

Cai et al. 2019. Interspeech 2019.

This paper describes our DKU replay detection system for the ASVspoof 2019 challenge. The goal is to develop a spoofing countermeasure for automatic speaker recognition in the physical access scenario. We approach the countermeasure system pipeline from four aspects: data augmentation, feature representation, classification, and fusion. First, we introduce an utterance-level deep learning framework for anti-spoofing. It receives the variable-length feature sequence and outputs the utterance-level scores directly. Based on the framework, we try out various kinds of input feature representations extracted from either the magnitude spectrum or the phase spectrum. We also perform data augmentation by applying speed perturbation to the raw waveform. Our best single system employs a residual neural network trained on the speed-perturbed group delay gram. It achieves an EER of 1.04% on the development set and an EER of 1.08% on the evaluation set. Finally, simply averaging the scores of several single systems further improves performance: our primary system obtains an EER of 0.24% on the development set and 0.66% on the evaluation set.
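The equal error rate (EER) and the simple score averaging mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `compute_eer` and `average_fusion` are hypothetical, scores are assumed to be higher for genuine speech, and the EER is approximated by sweeping thresholds over the observed scores rather than by interpolating a DET curve.

```python
def compute_eer(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means "more likely genuine".
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def average_fusion(system_scores):
    """Average per-trial scores across systems.

    system_scores: list of score lists, one per system, aligned by trial.
    """
    n_systems = len(system_scores)
    return [sum(trial) / n_systems for trial in zip(*system_scores)]


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    print(compute_eer(genuine, spoof))
    print(average_fusion([[1.0, 0.0], [0.0, 1.0]]))
```

In practice, scores from different systems are often normalized before averaging; that step is omitted here because the abstract does not describe it.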

“…We demonstrate the result using conventional MFCCs and newly proposed CQCCs features on GMM-maximum likelihood (GMM-ML) framework. It is found that GMM-ML as a classifier is better suited for spoofing detection task [19]. We have experimented on two recent databases: ASVspoof 2015, developed as a part of Automatic Speaker Verification Spoofing and Countermeasure Challenge [20] and BTAS 2016 corpus in Speaker Anti-spoofing Competition [21].…”

Section: Introduction (mentioning)

confidence: 99%

Generalization of spoofing countermeasures: A case study with ASVspoof 2015 and BTAS 2016 corpora

Paul, Sahidullah, and Saha. 2017. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Voice-based biometric systems are highly prone to spoofing attacks. Recently, various countermeasures have been developed for detecting different kinds of attacks such as replay, speech synthesis (SS) and voice conversion (VC). Most of the existing studies are conducted with a specific training set defined by the evaluation protocol. However, for realistic scenarios, selecting appropriate training data is an open challenge for the system administrator. Motivated by this practical concern, this work investigates the generalization capability of spoofing countermeasures in restricted training conditions where speech from a broad range of attack types is left out of the training database. We demonstrate that different spoofing types have considerably different generalization capabilities. For this study, we analyze the performance using two kinds of features: mel-frequency cepstral coefficients (MFCCs), which are considered the baseline, and the recently proposed constant Q cepstral coefficients (CQCCs). The experiments are conducted with a standard Gaussian mixture model-maximum likelihood (GMM-ML) classifier on two recently released spoofing corpora, ASVspoof 2015 and BTAS 2016, and include cross-corpora performance analysis. Feature-level analysis suggests that both static and dynamic coefficients of spectral features are important for detecting spoofing attacks in real-life conditions.
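As a rough illustration of how a GMM-ML countermeasure scores an utterance, the sketch below assumes the common two-model setup: one GMM fitted to genuine speech and one to spoofed speech, with the utterance score taken as the average per-frame log-likelihood ratio. The helper names (`gmm_frame_loglik`, `llr_score`), the diagonal covariances, and the `(weights, means, variances)` tuple layout are assumptions for illustration. Model training (typically by EM) is omitted.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log of a univariate Gaussian density, summed over dimensions.
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


def llr_score(frames, genuine_gmm, spoof_gmm):
    """Average per-frame log-likelihood ratio; positive favors genuine speech.

    Each GMM is a hypothetical (weights, means, variances) tuple.
    """
    total = 0.0
    for x in frames:
        total += gmm_frame_loglik(x, *genuine_gmm) - gmm_frame_loglik(x, *spoof_gmm)
    return total / len(frames)


if __name__ == "__main__":
    # Toy one-dimensional, single-component models (illustrative values only).
    genuine_gmm = ([1.0], [[0.0]], [[1.0]])
    spoof_gmm = ([1.0], [[3.0]], [[1.0]])
    print(llr_score([[0.0]], genuine_gmm, spoof_gmm))
    print(llr_score([[3.0]], genuine_gmm, spoof_gmm))
```

In a real system the frames would be feature vectors such as MFCCs or CQCCs, and the resulting score would be compared against a threshold.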
