2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
DOI: 10.1109/asru46091.2019.9003935

Self-Adaptive Soft Voice Activity Detection Using Deep Neural Networks for Robust Speaker Verification

Abstract: Voice activity detection (VAD), which classifies frames as speech or non-speech, is an important module in many speech applications including speaker verification. In this paper, we propose a novel method, called self-adaptive soft VAD, to incorporate a deep neural network (DNN)-based VAD into a deep speaker embedding system. The proposed method is a combination of the following two approaches. The first approach is soft VAD, which performs a soft selection of frame-level features extracted from a speaker feat…
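The abstract is truncated here, so the exact formulation of soft VAD is not visible on this page. As a rough illustration only, one common way to read "soft selection of frame-level features" is to weight each frame by a speech posterior when pooling, instead of discarding frames with a hard threshold. The sketch below assumes that reading; the function names `soft_vad_pool` and `hard_vad_pool`, the threshold of 0.5, and the use of plain mean pooling are all hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_vad_pool(frame_features, speech_posteriors, eps=1e-8):
    """Posterior-weighted mean over frames (assumed form of soft selection).

    frame_features: array of shape (T, D), one feature vector per frame.
    speech_posteriors: array of shape (T,), values in [0, 1] from some VAD.
    """
    feats = np.asarray(frame_features, dtype=float)
    weights = np.asarray(speech_posteriors, dtype=float)
    if feats.ndim != 2 or weights.shape != (feats.shape[0],):
        raise ValueError("expected features (T, D) and posteriors (T,)")
    # Each frame contributes in proportion to its speech posterior.
    weighted_sum = (weights[:, None] * feats).sum(axis=0)
    return weighted_sum / (weights.sum() + eps)


def hard_vad_pool(frame_features, speech_posteriors, threshold=0.5):
    """Baseline for contrast: keep frames above a threshold, then average."""
    feats = np.asarray(frame_features, dtype=float)
    weights = np.asarray(speech_posteriors, dtype=float)
    keep = weights >= threshold
    if not keep.any():
        # No frame was classified as speech; return a zero vector.
        return np.zeros(feats.shape[1])
    return feats[keep].mean(axis=0)


# Toy input: two speech-like frames and one outlier frame with posterior 0.
features = np.array([[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]])
posteriors = np.array([1.0, 1.0, 0.0])
soft_pooled = soft_vad_pool(features, posteriors)
hard_pooled = hard_vad_pool(features, posteriors)
```

With posteriors of exactly 0 or 1 the two functions agree; they differ when the VAD is uncertain, because the weighted version down-weights a doubtful frame rather than making an all-or-nothing decision.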

Cited by 16 publications (19 citation statements)
References 22 publications (43 reference statements)

“…The pooled vector is passed to one or few fully-connected (FC) layers to generate the deep speaker embedding z. The works in [14], [19], [20], [33] are examples of this approach.…”
Section: Deep Speaker Embedding Learning
Citation type: mentioning
confidence: 99%
“…To improve the robustness of the SV model to long nonspeech segments, we proposed self-adaptive soft VAD (SAS-VAD) [33], which is the combination of soft VAD and self-adaptive VAD. Here, we introduce the advanced version of SAS-VAD which shows better performance than the original one and can be combined with the MSA to achieve our ultimate goal.…”
Section: Self-adaptive Soft Voice Activity Detection
Citation type: mentioning
confidence: 99%