2022
DOI: 10.1109/taslp.2021.3138681

Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

Abstract: As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with a tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, using differ…
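
The cascade described in the abstract (CM first, then ASV) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `tandem_verify`, the `TandemDecision` type, the thresholds, and the convention that higher scores mean "more likely bona fide" or "more likely the target speaker" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TandemDecision:
    accepted: bool
    reason: str


def tandem_verify(cm_score: float, asv_score: float,
                  cm_threshold: float = 0.0,
                  asv_threshold: float = 0.0) -> TandemDecision:
    """Hypothetical tandem gate: CM runs first, ASV second.

    Assumes higher cm_score means "more likely human speech" and
    higher asv_score means "more likely the claimed speaker".
    """
    # Stage 1: the countermeasure rejects suspected spoofed input.
    if cm_score < cm_threshold:
        return TandemDecision(False, "rejected by CM (suspected spoof)")
    # Stage 2: the speaker verifier checks the claimed identity.
    if asv_score < asv_threshold:
        return TandemDecision(False, "rejected by ASV (speaker mismatch)")
    return TandemDecision(True, "accepted")


if __name__ == "__main__":
    print(tandem_verify(cm_score=1.2, asv_score=0.7))
    print(tandem_verify(cm_score=-0.5, asv_score=3.0))
```

In this sketch the two thresholds are set independently, which mirrors the separately trained setup the abstract mentions; a jointly optimised system would instead tune both stages against a single tandem cost.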

Citations: cited by 8 publications (7 citation statements)
References: 31 publications
“…Todisco et al [21] proposed a Gaussian back-end fusion method that fuses the scores with log-likelihood ratio according to separately modeled Gaussian mixtures. Kanervisto et al [22] proposed a reinforcement learning paradigm to optimize tandem detection cost function (t-DCF) by jointly training a tandem ASV and CM system. Shim et al [23] proposed a fusion-based approach that takes the speaker embedding and CM prediction as input and weighs the ASV score, CM score, and their multiplication to make the final decision.…”
Section: Fusion-based Methods (mentioning)
confidence: 99%
“…All other hyperparameters are the same for both front-ends which are both jointly optimised with the back-end classifier using back-propagation [68]. As is now common in the related literature [69,70], we performed each experiments with three runs using different random seeds to initialize the network weights and report the results of the best performing seed and average results. All models were trained for 100 epochs on a single GeForce RTX 3090 GPU and all results are reproducible using open source code 5 and with the same random seed and GPU environment.…”
Section: Implementation Details (mentioning)
confidence: 99%
“…Approaches to score-level fusion can be either parameter-free (e.g., score-sum ensemble) or parameter-driven (e.g., Gaussian mixture model) where both utilise separate scores from ASV and CM sub-systems [16]. Embedding-level fusion can also be achieved using a model operating upon embeddings that lie in different latent spaces [19][20][21]. Prior work includes Gomez-Alanis et al [20] and Shim et al [19] which both propose deep neural network (DNN)-based models to jointly optimise ASV and CM embeddings and hence produce single SASV scores and decisions.…”
Section: Related Work (mentioning)
confidence: 99%
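
The parameter-free score-level fusion mentioned in the statement above (a score-sum ensemble) can be sketched as follows. This is an illustrative assumption rather than any cited system's code: the function names, the threshold, and the premise that both sub-system scores are already on comparable scales (here, roughly [0, 1]) are hypothetical.

```python
def score_sum_fusion(asv_score: float, cm_score: float) -> float:
    """Parameter-free fusion: add the ASV and CM scores.

    Assumes both scores are on comparable scales, with higher values
    meaning "target speaker" and "bona fide speech" respectively.
    """
    return asv_score + cm_score


def sasv_accept(asv_score: float, cm_score: float,
                threshold: float = 1.0) -> bool:
    """Accept only if the fused score reaches a (hypothetical) threshold."""
    return score_sum_fusion(asv_score, cm_score) >= threshold


if __name__ == "__main__":
    # A target speaker with bona fide speech scores high on both systems.
    print(sasv_accept(asv_score=0.75, cm_score=0.5))
    # A spoofed utterance may fool the ASV but should score low on the CM.
    print(sasv_accept(asv_score=0.75, cm_score=0.125))
```

A parameter-driven alternative would replace the plain sum with a learned model over the two scores, at the cost of needing fusion training data.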
“…Prior work includes Gomez-Alanis et al [20] and Shim et al [19] which both propose deep neural network (DNN)-based models to jointly optimise ASV and CM embeddings and hence produce single SASV scores and decisions. Kanervisto et al [21] reports a tandem solution to jointly optimise ASV and CM systems using reinforcement learning.…”
Section: Related Work (mentioning)
confidence: 99%