Interspeech 2020
DOI: 10.21437/interspeech.2020-2868

Open-Set Short Utterance Forensic Speaker Verification Using Teacher-Student Network with Explicit Inductive Bias

Abstract: In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available. In this study, we propose a pipeline solution to improve speaker verification on a small actual forensic field dataset. By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning, which is applied for short utterance forensic speaker verification. The objective…
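The abstract describes a knowledge-distillation objective for teacher-student learning of speaker embeddings. As a rough illustration only (the paper's exact loss is not shown here), a common form combines an embedding-matching term with a temperature-softened KL term; all weights and names below are hypothetical:

```python
import numpy as np

def kd_embedding_loss(student_emb, teacher_emb, student_logits,
                      teacher_logits, temperature=2.0, alpha=0.5):
    """Illustrative knowledge-distillation objective (not the paper's
    exact formulation): a weighted sum of
    (1) MSE between student and teacher speaker embeddings, and
    (2) KL divergence between temperature-softened speaker posteriors."""
    # Embedding-level distillation: pull the student toward the
    # teacher's embedding space.
    mse = np.mean((student_emb - teacher_emb) ** 2)

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    # Label-level distillation: soften both logit vectors with a
    # temperature before comparing the distributions.
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    kl = float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))))

    return alpha * mse + (1.0 - alpha) * kl
```

When student and teacher agree exactly, both terms vanish and the loss is zero; the `alpha` trade-off between the two terms is a tunable assumption.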

Cited by 14 publications (6 citation statements)
References 25 publications
“…In this section, we compare the proposed lightweight method to six state-of-the-art methods for lightweight SV, including the ECAPA-TDNNLite [50], EfficientTDNN [51], KD-based [52], Thin-ResNet34 [64], Fast-ResNet34 [65], and CSTCTS1dConv (Channel Split Time-Channel-Time Separable 1-dimensional Convolution) [66]. The ECAPA-TDNNLite based method [50] is a lightweight version of the ECAPA-TDNN based method, in which a large model, ECAPA-TDNN, is utilized for enrollment and a small model, ECAPA-TDNNLite, is used for verification.…”
Section: Comparison Of Different Methods
confidence: 99%
“…Additionally, some researchers applied the techniques of Knowledge Distillation (KD) [46], [47] and Neural Architecture Search (NAS) [48] to implement lightweight SV [49]- [52]. In the work of [49], the strategy of teacher-student training was proposed for text-independent SV, and competitive error rate with 88-93% smaller models was obtained.…”
Section: Related Work
confidence: 99%
“…They trained the model to decrease the distances between speaker embeddings extracted from the same speaker utterances and increase the distances between speaker embeddings of different speakers. In addition to these loss functions, the prior studies have investigated diverse methods, such as data augmentation [17,18], network architectures [19,20], and system frameworks [21,22].…”
Section: Related Work
confidence: 99%
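The statement above describes training embeddings so that same-speaker distances shrink and different-speaker distances grow; at verification time such embeddings are typically compared by cosine scoring. A minimal scoring sketch (the threshold value is a hypothetical placeholder, not from the paper):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll_emb, test_emb, threshold=0.5):
    """Accept the trial as same-speaker if similarity exceeds
    the decision threshold (placeholder value for illustration)."""
    return cosine_similarity(enroll_emb, test_emb) >= threshold
```

In practice the threshold is calibrated on a development set to trade off false acceptances against false rejections.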
“…For input-level, models usually can be adapted by training with enhanced [8] or domain-translated [9] input features. Adaptation at the embedding level often targets minimizing certain distances between source and target domains to align them in the same embedding space, such as cosine distance [10], mean squared error (MSE) [11], and maximum mean discrepancy (MMD) [12]. However, this method usually requires parallel or artificially simulated data, which cannot generalize well to real-world scenarios.…”
Section: Introduction
confidence: 99%
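Of the three alignment distances quoted above, MMD is the least self-explanatory: it compares two sets of embeddings through a kernel rather than pointwise, so it needs no parallel pairing between source and target utterances. A minimal sketch with an RBF kernel (the bandwidth `gamma` is an illustrative assumption):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel matrix between rows of x and rows of y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(source, target, gamma=0.1):
    """Squared maximum mean discrepancy between two embedding sets:
    k(s,s) + k(t,t) - 2 k(s,t), averaged over all pairs."""
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())
```

MMD is zero when both sets come from the same distribution and grows as the two embedding clouds separate, which is why it serves as an alignment loss for unpaired domain adaptation.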