Interspeech 2016
DOI: 10.21437/interspeech.2016-879

Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition

Cited by 158 publications (108 citation statements)
References 9 publications
“…Each sub-network contains a block of five convolutional layers as the basic feature extraction trunk (these are shared for both content and identity, as it has been speculated that lower level features, e.g. edges for images and formants for speech, are likely to be common [26] for different high level tasks). Both sub-networks are based on the VGG-M architecture [27] which strikes a good trade-off between efficiency and performance.…”
Section: Network Architecture
Citation type: mentioning
confidence: 99%
“…By training a discriminator, parameterized by θ D , to ascertain the domain of the generated features, an adversarial penalty is added to the overall loss function of a domain adversarial neural network (DANN) [9,10]:…”
Section: Channel Adversarial Training
Citation type: mentioning
confidence: 99%
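
The formula itself is cut off in the statement above, so the following is only a minimal sketch of how an adversarial penalty is commonly combined with a task loss in domain-adversarial training, not the citing paper's exact equation. It assumes the usual gradient-reversal form, in which the discriminator minimizes its domain loss while the feature extractor minimizes the task loss minus a weighted domain loss. The names `cross_entropy`, `encoder_objective`, `reverse_gradient`, and the weight `lam` are hypothetical and chosen for illustration.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one example given raw logits and an integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def encoder_objective(task_loss, domain_loss, lam=1.0):
    """Objective seen by the feature extractor (assumed form).

    The discriminator itself minimizes domain_loss; the feature extractor
    is rewarded when the discriminator does badly, hence the minus sign.
    """
    return task_loss - lam * domain_loss


def reverse_gradient(grad_from_discriminator, lam=1.0):
    """Backward pass of a gradient-reversal layer.

    The forward pass is the identity; on the way back, the gradient coming
    from the discriminator is negated and scaled by lam.
    """
    return [-lam * g for g in grad_from_discriminator]


# Toy values, made up for illustration only.
task_loss = cross_entropy([2.0, 0.5, -1.0], label=0)   # e.g. a senone or speaker target
domain_loss = cross_entropy([0.3, -0.3], label=1)      # e.g. a channel or noise-type target
total = encoder_objective(task_loss, domain_loss, lam=0.5)
```

With `lam = 0` the objective reduces to ordinary single-task training; larger values push the shared features harder toward domain invariance.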
“…Recently, some adversarial training methods are introduced to extract noise invariant bottleneck features [64,188]. As shown in Figure 12, the adversarial network includes two parts, i.e., an encoding network (EN) which can extract noise invariant features and a discriminative network (DN) which can judge noise types of the noise invariant feature generated from EN.…”
Section: Speech Recognition and Verification For The Internet Of
Citation type: mentioning
confidence: 99%
“…As shown in Figure 12, the adversarial network includes two parts, i.e., an encoding network (EN) which can extract noise invariant features and a discriminative network (DN) which can judge noise types of the noise invariant feature generated from EN. Therefore, we can get robustness noise invariant features from EN which can improve the performance of speaker verification system by adversarial training these two parts in turn [64,188].…”
Section: Speech Recognition and Verification For The Internet Of
Citation type: mentioning
confidence: 99%