ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683828
Training Multi-task Adversarial Network for Extracting Noise-robust Speaker Embedding

Abstract: Achieving robust speaker recognition performance in noisy environments is still a challenging task. Motivated by the promising performance of multi-task training in a variety of image processing tasks, we explore the potential of multi-task adversarial training for learning a noise-robust speaker embedding. In this paper, we present a novel framework that consists of three components: an encoder that extracts the noise-robust speaker embeddings; a classifier that classifies the speakers; a discrimin…
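The three components named in the abstract can be sketched structurally as follows. This is a toy illustration only — the function names and numbers are hypothetical stand-ins, not the paper's actual networks:

```python
# Structural sketch of the three components named in the abstract
# (hypothetical stand-in functions, not the paper's implementation).
def encoder(noisy_features):
    # Maps noisy input features to a fixed-size speaker embedding.
    return [sum(noisy_features) / len(noisy_features)]

def classifier(embedding):
    # Predicts a speaker identity from the embedding.
    return 0 if embedding[0] < 0.5 else 1

def discriminator(embedding):
    # Tries to tell noisy from clean conditions; trained adversarially
    # so the encoder learns to make this distinction hard.
    return 0 if embedding[0] < 0.5 else 1

embedding = encoder([0.2, 0.4, 0.6])
speaker = classifier(embedding)
```

The adversarial objective couples these pieces: the discriminator is optimized to detect the noise condition, while the encoder is optimized against it.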

Cited by 41 publications (35 citation statements)
References 23 publications (29 reference statements)
“…This is the definition of a path. Since p_{i,w_i} is the probability of outputting the w_i-th element of V ∪ {−} at time i, the probability of the path P can be calculated as Equation (19).…”
Section: CTC
confidence: 99%
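The path probability in the excerpt above — the product of the per-timestep output probabilities p_{i,w_i} — can be illustrated with a toy distribution. The vocabulary and probability values below are made up for illustration:

```python
# Sketch: probability of one CTC path as the product of per-timestep
# output probabilities p[t][w_t] (toy numbers, not real model outputs).
def path_probability(probs, path):
    """probs[t] maps each symbol (incl. the blank '-') to its
    probability at time t; path is the symbol chosen at each step."""
    p = 1.0
    for t, symbol in enumerate(path):
        p *= probs[t][symbol]
    return p

# Toy vocabulary V = {'a', 'b'} plus the blank symbol '-'.
probs = [
    {'a': 0.6, 'b': 0.3, '-': 0.1},
    {'a': 0.2, 'b': 0.7, '-': 0.1},
]
print(path_probability(probs, ['a', 'b']))  # 0.6 * 0.7
```

The CTC loss then sums this quantity over all paths that collapse to the target label sequence.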
“…However, at the same time, there are still many works that use the ReLU activation F(x) = max{x, 0} [7, 19, 22-24, 27, 28].…”
Section: Activations
confidence: 99%
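The ReLU activation in the excerpt is simply an elementwise max with zero; a minimal sketch:

```python
def relu(x):
    # F(x) = max{x, 0}: passes positive inputs, zeroes out negatives.
    return max(x, 0.0)

print([relu(v) for v in [-2.0, 0.0, 3.5]])  # [0.0, 0.0, 3.5]
```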
“…We use CTC loss to train the AM so that the network outputs align with the phoneme sequences automatically, and cross-entropy loss to discriminate between dialects. Compared with multi-task training [28,29] in SV tasks, it should be emphasized that these stages are trained step by step rather than as multi-task learning with shared layers: we backpropagate through the whole network while training the AM, and only backpropagate through the RNN part in the second stage; otherwise the network degenerates and loses the acoustic knowledge.…”
Section: Loss Function
confidence: 99%
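The step-by-step scheme in the excerpt — backpropagating through the whole network in stage one, then only through the RNN part in stage two — amounts to selectively freezing parameters between stages. A toy numeric sketch (hypothetical names and gradient values, not the cited work's code):

```python
# Two-stage training sketch: stage 1 updates everything (CTC loss on
# the AM); stage 2 freezes the AM layers and updates only the RNN part
# (cross-entropy dialect loss), preserving the acoustic knowledge.
class ToyNetwork:
    def __init__(self):
        self.am_weight = 1.0   # acoustic-model parameter
        self.rnn_weight = 1.0  # RNN-part parameter

    def update(self, grad_am, grad_rnn, lr=0.1, freeze_am=False):
        if not freeze_am:
            self.am_weight -= lr * grad_am
        self.rnn_weight -= lr * grad_rnn

net = ToyNetwork()
net.update(grad_am=0.5, grad_rnn=0.5)                  # stage 1: whole network
net.update(grad_am=0.5, grad_rnn=0.5, freeze_am=True)  # stage 2: RNN part only
```

After stage two, the AM parameter keeps its stage-one value while the RNN parameter has taken both updates — the freezing is what prevents the degeneration the excerpt warns about.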
“…Within this framework, there are two main methods. The first regards the noisy data as a domain different from the clean data and applies adversarial training to handle the domain mismatch and obtain a noise-invariant speaker embedding [14,15]. The second method employs a DNN speech enhancement network for ASV tasks.…”
Section: Introduction
confidence: 99%
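Adversarial domain training of the kind mentioned in [14,15] is commonly implemented with a gradient reversal layer — an assumption here, since the excerpt does not name the mechanism. The discriminator learns to detect the noise domain, while the reversed gradient pushes the encoder toward noise-invariant embeddings. A minimal numeric sketch with made-up gradient values:

```python
# Gradient-reversal sketch (hypothetical scalar gradients, not a real
# training loop). The forward pass is identity; on the backward pass,
# the gradient flowing from the noise discriminator into the encoder
# is negated, so the encoder is trained to *hinder* noise detection.
def grad_reverse(grad, lam=1.0):
    return -lam * grad

speaker_grad = 0.3     # gradient from the speaker-classification loss
noise_disc_grad = 0.2  # gradient from the noise-discriminator loss
encoder_grad = speaker_grad + grad_reverse(noise_disc_grad)
```

The scale factor `lam` (a common but here assumed hyperparameter) trades off speaker discriminability against noise invariance.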