Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition

Shinohara, Yusuke

doi:10.21437/interspeech.2016-879

Cited by 158 publications

(108 citation statements)

References 9 publications

Mentioning: 107

“…Each sub-network contains a block of five convolutional layers as the basic feature extraction trunk (these are shared for both content and identity, as it has been speculated that lower level features, e.g. edges for images and formants for speech, are likely to be common [26] for different high level tasks). Both sub-networks are based on the VGG-M architecture [27] which strikes a good trade-off between efficiency and performance.…”

Section: Network Architecture (mentioning)

confidence: 99%

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision

Nagrani, Chung, Albanie, et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The objective of this paper is to learn representations of speaker identity without access to manually annotated data. To do so, we develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces and audio in video. The key idea behind our approach is to tease apart, without annotation, the representations of linguistic content and speaker identity. We construct a two-stream architecture which: (1) shares low-level features common to both representations; and (2) provides a natural mechanism for explicitly disentangling these factors, offering the potential for greater generalisation to novel combinations of content and identity and ultimately producing speaker identity representations that are more robust. We train our method on a large-scale audio-visual dataset of talking heads 'in the wild', and demonstrate its efficacy by evaluating the learned speaker representations for standard speaker recognition performance.

“…By training a discriminator, parameterized by θ_D, to ascertain the domain of the generated features, an adversarial penalty is added to the overall loss function of a domain adversarial neural network (DANN) [9,10]:…”

Section: Channel Adversarial Training (mentioning)

confidence: 99%
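
The excerpt above cuts off before the equation, so the exact loss used by the citing paper is not shown here. As a rough illustration only, the sketch below assumes the commonly used DANN form, in which the feature extractor minimizes the task loss minus a weight times the discriminator's domain loss. The scalar model and every name in it (`dann_step`, `w_f`, `w_y`, `w_d`, `lam`) are hypothetical and chosen for readability, not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dann_step(x, y, d, w_f, w_y, w_d, lam):
    """Losses and gradients for one example of a toy scalar DANN.

    Hypothetical names: w_f is the feature extractor, w_y the task head,
    w_d the domain discriminator (theta_D), lam the adversarial weight.
    """
    f = w_f * x                      # shared feature
    err = w_y * f - y                # task residual
    p = sigmoid(w_d * f)             # predicted P(domain = 1)

    task_loss = err ** 2
    domain_loss = -(d * math.log(p) + (1 - d) * math.log(1.0 - p))

    grads = {
        "w_y": 2.0 * err * f,        # task head descends the task loss
        "w_d": (p - d) * f,          # discriminator descends the domain loss
        # Feature extractor: task gradient minus lam times the domain
        # gradient, i.e. the domain gradient is reversed before reaching w_f.
        "w_f": 2.0 * err * w_y * x - lam * (p - d) * w_d * x,
    }
    return task_loss, domain_loss, grads

task_loss, domain_loss, grads = dann_step(
    x=1.0, y=1.0, d=1, w_f=0.5, w_y=0.5, w_d=2.0, lam=1.0)
```

Setting `lam=0.0` recovers ordinary single-task training of the feature extractor, which makes the adversarial term easy to switch off when comparing against a baseline.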

Channel Adversarial Training for Speaker Verification and Diarization

Luu, Bell, Renals 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong. We propose a training strategy which aims to produce features that are invariant at the granularity of the recording or channel, a finer grained objective than dataset- or environment-invariance. By training an adversary to predict whether pairs of same-speaker embeddings belong to the same recording in a Siamese fashion, learned features are discouraged from utilizing channel information that may be speaker discriminative during training. Experiments for verification on VoxCeleb and diarization and verification on CALLHOME show promising improvements over a strong baseline in addition to outperforming a dataset-adversarial model. The VoxCeleb model in particular performs well, achieving a 4% relative improvement in EER over a Kaldi baseline, while using a similar architecture and less training data.
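
The abstract describes an adversary that sees pairs of same-speaker embeddings and predicts whether they come from the same recording. The sketch below shows one plausible way to build such labelled pairs from utterance metadata. The abstract does not describe the paper's actual sampling procedure, so the function name `same_speaker_pairs`, the tuple layout, and the IDs are all assumptions made for illustration.

```python
from itertools import combinations

def same_speaker_pairs(utterances):
    """Build adversary training pairs from (utt_id, speaker_id, recording_id).

    Only same-speaker pairs are kept. The label is 1 when both utterances
    come from the same recording and 0 otherwise.
    """
    pairs = []
    for (ua, spk_a, rec_a), (ub, spk_b, rec_b) in combinations(utterances, 2):
        if spk_a != spk_b:
            continue  # the adversary only sees same-speaker pairs
        pairs.append(((ua, ub), 1 if rec_a == rec_b else 0))
    return pairs

# Hypothetical metadata: three utterances of one speaker, one of another.
utts = [
    ("u1", "spkA", "rec1"),
    ("u2", "spkA", "rec1"),
    ("u3", "spkA", "rec2"),
    ("u4", "spkB", "rec3"),
]
pairs = same_speaker_pairs(utts)
# pairs == [(("u1", "u2"), 1), (("u1", "u3"), 0), (("u2", "u3"), 0)]
```

Restricting the pairs to a single speaker means the adversary cannot succeed by detecting speaker identity, so any success it has must come from channel cues.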

“…Recently, some adversarial training methods are introduced to extract noise invariant bottleneck features [64,188]. As shown in Figure 12, the adversarial network includes two parts, i.e., an encoding network (EN) which can extract noise invariant features and a discriminative network (DN) which can judge noise types of the noise invariant feature generated from EN.…”

Section: Speech Recognition and Verification For The Internet Of (mentioning)

confidence: 99%

“…As shown in Figure 12, the adversarial network includes two parts, i.e., an encoding network (EN) which can extract noise invariant features and a discriminative network (DN) which can judge noise types of the noise invariant feature generated from EN. Therefore, we can get robustness noise invariant features from EN which can improve the performance of speaker verification system by adversarial training these two parts in turn [64,188].…”

Section: Speech Recognition and Verification For The Internet Of (mentioning)

confidence: 99%
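
The two excerpts above describe an encoding network (EN) and a discriminative network (DN) that are trained "in turn". The toy sketch below illustrates one such alternating schedule: the DN is updated to classify the noise type with the EN frozen, then the EN is updated to make that classification harder with the DN frozen. It is a deliberately reduced assumption, not the surveyed systems' implementation: each network is a single scalar weight (`b` for the EN, `c` for the DN), the speaker-verification loss is omitted, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_dn_loss(batch, b, c):
    """Mean cross-entropy of the DN's noise-type prediction on EN features."""
    total = 0.0
    for x, label in batch:
        p = sigmoid(c * (b * x))     # EN feature is b * x, DN logit is c * feature
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(batch)

def dn_step(batch, b, c, lr):
    """Update only the DN weight c (gradient descent); the EN weight b is frozen."""
    grad = sum((sigmoid(c * b * x) - label) * (b * x) for x, label in batch)
    return c - lr * grad / len(batch)

def en_step(batch, b, c, lr):
    """Update only the EN weight b (gradient ascent on the DN loss); c is frozen."""
    grad = sum((sigmoid(c * b * x) - label) * (c * x) for x, label in batch)
    return b + lr * grad / len(batch)

# Hypothetical batch: (noise-dependent input, noise-type label).
batch = [(1.0, 1), (-1.0, 0)]
b, c = 1.0, 0.5
for _ in range(3):                   # the two parts are trained in turn
    c = dn_step(batch, b, c, lr=0.1)
    b = en_step(batch, b, c, lr=0.1)
```

In a fuller setup the EN step would also descend the speaker-verification loss, so that the features stay useful for the main task while carrying less noise-type information.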

A Survey on Machine Learning‐Based Mobile Big Data Analysis: Challenges and Applications

Xie, Song, et al. 2018

Wireless Communications and Mobile Computing

This paper attempts to identify the requirement and the development of machine learning-based mobile big data (MBD) analysis through discussing the insights of challenges in the mobile big data. Furthermore, it reviews the state-of-the-art applications of data analysis in the area of MBD. Firstly, we introduce the development of MBD. Secondly, the frequently applied data analysis methods are reviewed. Three typical applications of MBD analysis, namely, wireless channel modeling, human online and offline behavior analysis, and speech recognition in the Internet of Vehicles, are introduced, respectively. Finally, we summarize the main challenges and future development directions of mobile big data analysis.
