A Highly Adaptive Acoustic Model for Accurate Multi-dialect Speech Recognition

Yoo, Sanghyun; Song, Inchul; Bengio, Yoshua

doi:10.1109/icassp.2019.8683705

Cited by 28 publications

(17 citation statements)

References 11 publications


“…They pre-trained an attention-based encoder-decoder model to disentangle accent-invariant and accent-specific characteristics from acoustic features by adversarial training. Accent-dependent acoustic modeling approaches take accent-related information into network architecture by accent embedding, accent-specific bottleneck features or i-vectors [20,21]. In a closed set of known accents, accent-dependent models usually outperform the accent-independent universal models, while the latter ones usually achieve a better average model under the situations where accent labels are unavailable.…”

Section: Related Work (mentioning)

confidence: 99%

The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods

Shi et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


The variety of accents has posed a big challenge to speech recognition. The Accented English Speech Recognition Challenge (AESRC2020) is designed for providing a common testbed and promoting accent-related research. Two tracks are set in the challenge: English accent recognition (track 1) and accented English speech recognition (track 2). A set of 160 hours of accented English speech collected from 8 countries is released with labels as the training set. Another 20 hours of speech without labels is later released as the test set, including two unseen accents from another two countries used to test the model generalization ability in track 2. We also provide baseline systems for the participants. This paper first reviews the released dataset, track setups, baselines and then summarizes the challenge results and major techniques used in the submissions.



“…In both cases using one-hot dialect codes as an input augmentation (corresponding to bias adaptation) proved to be the best approach, and cluster-adaptive approaches did not result in a consistent gain. These approaches were extended by Yoo et al [227] and Viglino et al [223] who both explored the use of dialect embeddings for multi-accent end-to-end speech recognition. Ghorbani et al [228] used accent-specific teacher-student learning, and Jain et al [229] explored a mixture of experts (MoE) approach, using mixtures of experts both at the phonetic and accent levels.…”

Section: Accent Adaptation (mentioning)

confidence: 99%
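
The statement above equates one-hot dialect codes at the input with bias adaptation. A short derivation shows why, under the assumption that the augmented input feeds a single affine layer. The symbols here are illustrative and do not come from the cited papers: x is the acoustic feature vector, d_k the one-hot code for dialect k, and W = [W_x, W_d] the layer's weight matrix split by input block.

```latex
\begin{aligned}
W \begin{bmatrix} x \\ d_k \end{bmatrix} + b
  &= W_x x + W_d d_k + b \\
  &= W_x x + \bigl(b + w^{(d)}_k\bigr),
\end{aligned}
\qquad \text{where } w^{(d)}_k \text{ is the } k\text{-th column of } W_d .
```

Because d_k selects exactly one column of W_d, each dialect contributes only its own additive offset to the shared bias b, while the weights W_x applied to the acoustic features stay the same for all dialects.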

“…Yoo et al [227] also applied a method of feature-wise affine transformations on the hidden layers (FiLM), that are dependent both on the network's internal state and the dialect/accent code (discussed in Sec. VI).…”

Section: Accent Adaptation (mentioning)

confidence: 99%
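
The feature-wise affine transformation described in the statement above can be sketched as follows. This is a minimal illustration, not the architecture of Yoo et al.: the function names (`film`, `init_linear`, `linear`, `one_hot`), the single linear layers that produce the scale and shift, the `1 + gamma` identity-centred scaling, and all dimensions are assumptions made for the example. It only shows the core idea that the scale and shift applied to a hidden vector are computed from both that hidden vector and the dialect code.

```python
import random


def linear(weights, bias, x):
    """Return W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def init_linear(rng, out_dim, in_dim, scale=0.1):
    """Hypothetical initialiser: small random weights, zero bias."""
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def film(hidden, dialect_id, params, num_dialects):
    """Apply a feature-wise affine transform: gamma * hidden + beta.

    gamma and beta are generated from the concatenation of the hidden
    vector (the network's internal state) and the one-hot dialect code.
    """
    cond = hidden + one_hot(dialect_id, num_dialects)
    # Assumption: scale is parameterised as 1 + delta, so zero weights
    # leave the hidden vector unchanged.
    gamma = [1.0 + g for g in linear(*params["gamma"], cond)]
    beta = linear(*params["beta"], cond)
    return [g * h + b for g, h, b in zip(gamma, hidden, beta)]


rng = random.Random(0)
hidden_dim, num_dialects = 4, 3
params = {
    "gamma": init_linear(rng, hidden_dim, hidden_dim + num_dialects),
    "beta": init_linear(rng, hidden_dim, hidden_dim + num_dialects),
}
hidden = [0.5, -1.0, 0.25, 2.0]
out_a = film(hidden, 0, params, num_dialects)  # same features, dialect 0
out_b = film(hidden, 2, params, num_dialects)  # same features, dialect 2
```

The two calls share the same hidden vector and parameters, so any difference between `out_a` and `out_b` comes from the dialect code alone.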

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

Bell, Fainberg, Klejch, et al. 2021

IEEE Open J. Signal Process.


“…Accent-robust ASR systems aim to mitigate the negative effects of non-native speech. A straightforward exploration is to build an accent-specific system where accent information, such as i-vectors, accent IDs, or accent embeddings, are explicitly fed into the neural networks along with acoustic features [5][6][7][8][9][10]. These approaches typically either adapt a unified model with accent-specific data, or build a separate decoder for each accent.…”

Section: Introduction (mentioning)

confidence: 99%

REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling

Yang, Raeesy, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Accents mismatching is a critical problem for end-to-end ASR. This paper aims to address this problem by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing the gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions. Motivated by the proof of equivalence, we introduce reDAT, a novel technique based on DAT, which relabels data using either unsupervised clustering or soft labels. Experiments on 23K hours of multi-accent data show that DAT achieves competitive results over accent-specific baselines on both native and non-native English accents but up to 13% relative WER reduction on unseen accents; our reDAT yields further improvements over DAT by 3% and 8% relatively on non-native accents of American and British English.

