Frustratingly Easy Noise-aware Training of Acoustic Models

Raj, Desh; Villalba, Jesus; Povey, Daniel; Khudanpur, Sanjeev

doi:10.48550/arxiv.2011.02090

Cited by 1 publication

(2 citation statements)

References 20 publications

(29 reference statements)


“…In [20], the invariant representation learning technique was proposed, which demonstrated significant reduction in character error rate and robustness for out-of-domain noise settings. In [21], a simple method was considered to extract a noise vector for acoustic model training. It is suggested that the technique could also be applied in online ASR by estimating the mean vector with frame-level maximum likelihood.…”

Section: Related Work (mentioning)

confidence: 99%


Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora

Chen, Xia, Hansen

2021

Preprint


In this study, we investigate triplet loss as an alternative feature representation for ASR. We consider a general non-semantic speech representation called TRILL, trained with a self-supervised criterion based on triplet loss, for acoustic modeling to represent the acoustic characteristics of each audio recording. This strategy is then applied to the CHiME-4 corpus and the CRSS-UTDallas Fearless Steps Corpus, with emphasis on the 100-hour challenge corpus, which consists of 5 selected NASA Apollo-11 channels. An analysis of the extracted embeddings provides the foundation needed to characterize training utterances into distinct groups based on distinguishing acoustic properties. Moreover, we demonstrate that triplet-loss-based embeddings perform better than i-Vectors in acoustic modeling, confirming that the triplet loss is more effective than a speaker feature. With additional techniques such as pronunciation and silence probability modeling, plus multi-style training, we achieve +5.42% and +3.18% relative WER improvements on the development and evaluation sets of the Fearless Steps Corpus, respectively. To explore generalization, we further test the same technique on the 1-channel track of CHiME-4 and observe a +11.90% relative WER improvement on real test data.



“…Factor aware training has been shown to be effective in ASR system development [5,19,21]. This training strategy produces a system that is more robust to factors such as noise, speaker, and room characteristics.…”

Section: Scenario Aware (mentioning)

confidence: 99%