ASR Error Correction with Augmented Transformer for Entity Retrieval

Wang, Haoyu; Dong, Shuyan; Liu, Yue; Logan, James; Agrawal, Ashish; Liu, Yang

doi:10.21437/interspeech.2020-1753

Cited by 31 publications

(28 citation statements)

References 11 publications


“…To quantify model robustness under noisy settings, we augmented ATIS and SNIP-Multi with environmental noise from MS-SNSD, which is a common scenario where users utter their spoken commands. The incentive here is how 'noise' was abused in some SLU literature, where ASR errors were treated as the noise source instead of modeling error, see [29,61]. Results on noisy test reveal that those work well on ATIS or SNIPS may break under realistic noises.…”

Section: Main Results On Clean and Noisy SLU (mentioning, confidence: 99%)

Towards Semi-Supervised Semantics Understanding from Speech

Lai, Cao, Bodapati et al. 2020, Preprint


Much recent work on Spoken Language Understanding (SLU) falls short in at least one of three ways: models were trained on oracle text input and neglected Automatic Speech Recognition (ASR) outputs, models were trained to predict only intents without slot values, or models were trained on a large amount of in-house data. To address these shortcomings, we propose a clean and general framework that learns semantics directly from speech, with semi-supervision from transcribed speech. Our framework is built upon pretrained end-to-end (E2E) ASR and self-supervised language models, such as BERT, and is fine-tuned on a limited amount of target SLU corpus. In parallel, we identify two inadequate settings under which SLU models have been tested: noise robustness and E2E semantics evaluation. We test the proposed framework under realistic environmental noise and with a new metric, the slots edit F1 score, on two public SLU corpora. Experiments show that our SLU framework with speech as input performs on par with frameworks that take oracle text as input in semantics understanding, even when environmental noise is present and only a limited amount of labeled semantics data is available.

* Work performed during an internship at Amazon AI. † Corresponding author.

Footnote 3: SLU typically consists of Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). ASR maps audio to text, and NLU maps text to semantics. Here, we are interested in learning a mapping directly from raw audio to semantics.

Footnote 4: Semantics is commonly formulated as intent and slots in common benchmarking datasets like ATIS.
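The abstract does not say how environmental noise was added to the test sets. A common recipe is to scale a noise clip so that the mixture reaches a target signal-to-noise ratio (SNR). The sketch below assumes plain Python lists of float samples at a shared sample rate; the function names, the tiling of short noise clips, and the 10 dB level are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def mean_power(signal):
    """Mean of the squared samples of a 1-D signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen so the mixture has `snr_db` dB SNR.

    Assumption: a noise clip shorter than the speech is tiled (repeated);
    a longer one is truncated to the speech length.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat or cut the noise so it spans the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = mean_power(speech)
    p_noise = mean_power(tiled)
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # SNR_dB = 10 * log10(p_speech / (g**2 * p_noise)); solve for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, tiled)]


# Example: a 1 s, 16 kHz sine "utterance" mixed with uniform noise at 10 dB SNR.
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Measuring the added component (noisy minus clean) against the clean signal recovers the requested SNR, which is a quick sanity check when building noisy test sets at several levels.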


“…ASR error correction task is a well studied problem in literature and usually, it has been treated as a post processing task along with other tasks like punctuation prediction [15,16] and inverse text normalization [17]. The prior works have explored the problem using a variety of subtasks including grammar error correction, improving human readability [18], entity retrieval [19] etc. In [20], the authors use an RNN based external language model along with a stacked RNN based seq2seq spelling correction model for improving a baseline Listen Attend and Spell (LAS) based ASR system.…”

Section: Related Work (mentioning, confidence: 99%)

“…In [19], an Augmented Transformer model is proposed which leverages phonetic along with text for correcting ASR outputs. They show that jointly encoding both phoneme and text information helps in improving entity retrieval compared to a vanilla text transformer.…”

Section: Related Work (mentioning, confidence: 99%)

Remember the context! ASR slot error correction through memorization

Bekal, Shenoy, Sunkara et al. 2021, Preprint


Accurate recognition of slot values such as domain-specific words or named entities by automatic speech recognition (ASR) systems forms the core of goal-oriented dialogue systems. Although it is a critical step with direct impact on downstream tasks such as language understanding, many domain-agnostic ASR systems tend to perform poorly on domain-specific or long-tail words. They are often supplemented with slot error correction systems, but it is often hard for any neural model to directly output such rare entity words. To address this problem, we propose k-nearest neighbor (k-NN) search that outputs domain-specific entities from an explicit datastore. We improve the error correction rate by augmenting a pretrained joint phoneme- and text-based Transformer sequence-to-sequence model with k-NN search during inference. We evaluate our proposed approach on five different domains containing long-tail slot entities such as full names, airports, street names, cities, and states. Our best-performing error correction model shows a relative word error rate (WER) improvement of 7.4% on rare-word entities over the baseline and a relative WER improvement of 9.8% on an out-of-vocabulary (OOV) test set.
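The abstract does not specify how retrieved entities are combined with the seq2seq model's own prediction. One well-known option is the kNN-LM style interpolation, where a softmax over negative distances to the k nearest datastore keys is mixed with the model distribution: p(y) = λ·p_kNN(y) + (1 − λ)·p_model(y). The sketch below assumes that scheme, with keys standing in for decoder hidden states; the function names, the toy datastore, k, the temperature, and λ are all hypothetical.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=4, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over entity tokens.

    `datastore` is a list of (key_vector, entity_token) pairs. Assumption: keys are
    decoder hidden states recorded on in-domain data.
    """
    if not datastore:
        return {}
    # Squared Euclidean distance from the query to every key.
    scored = [(sum((q - x) ** 2 for q, x in zip(query, key)), token)
              for key, token in datastore]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Softmax over negative distances; subtract the minimum for numerical stability.
    d_min = nearest[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in nearest]
    total = sum(weights)
    probs = defaultdict(float)
    for w, (_, token) in zip(weights, nearest):
        probs[token] += w / total  # neighbours that share a token pool their mass
    return dict(probs)


def interpolate(p_model, p_knn, lam=0.5):
    """p(y) = lam * p_knn(y) + (1 - lam) * p_model(y) over the union of tokens."""
    vocab = set(p_model) | set(p_knn)
    return {y: lam * p_knn.get(y, 0.0) + (1 - lam) * p_model.get(y, 0.0) for y in vocab}


# Toy example: the model prefers a misrecognized fragment, retrieval favours the entity.
datastore = [
    ([0.9, 0.1], "heathrow"),
    ([1.0, 0.0], "heathrow"),
    ([0.0, 1.0], "gatwick"),
    ([0.1, 0.9], "stansted"),
]
p_model = {"heath": 0.6, "heathrow": 0.1, "gatwick": 0.3}
p_knn = knn_distribution([0.95, 0.05], datastore, k=2)
p_final = interpolate(p_model, p_knn, lam=0.5)
best = max(p_final, key=p_final.get)
```

The appeal of this design is that new entities can be supported by adding rows to the datastore, without retraining the pretrained model.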


“…Grammatical Error Correction (GEC) aims to automatically detect and correct the grammatical errors that can be found in a sentence (Wang et al, 2020c). It is a crucial and essential application task in many natural language processing scenarios such as writing assistant (Ghufron and Rosyida, 2018;Napoles et al, 2017;Omelianchuk et al, 2020), search engine (Martins and Silva, 2004;Gao et al, 2010;Duan and Hsu, 2011), speech recognition systems (Karat et al, 1999;Wang et al, 2020a;Kubis et al, 2020), etc. Grammatical errors may appear in all languages (Dale et al, 2012;Xing et al, 2013;Ng et al, 2014;Rozovskaya et al, 2015;Bryant et al, 2019), in this paper, we only focus to tackle the problem of Chinese Grammatical Error Correction (CGEC) (Chang, 1995).…”

Section: Introduction (mentioning, confidence: 99%)

Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction

Li, Shi 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


We investigate the problem of Chinese Grammatical Error Correction (CGEC) and present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction to address the deep issues hidden in CGEC. Since most tokens are correct and can be conveyed directly from source to target, and error positions can be estimated and corrected from bidirectional context, we employ a BERT-initialized Transformer encoder as the backbone model for information modeling and conveying. Because same-position substitution alone cannot handle variable-length corrections, operations such as substitution, deletion, insertion, and local paraphrasing must be handled jointly. Therefore, a Conditional Random Fields (CRF) layer is stacked on the up tail to perform non-autoregressive sequence prediction by modeling token dependencies. Since most tokens are correct and easy to predict or convey to the target, the models may suffer from a severe class imbalance issue. To alleviate this problem, focal loss penalty strategies are integrated into the loss functions. Moreover, besides the typical fixed-length error correction datasets, we also construct a variable-length corpus for experiments. Experimental results on standard datasets, especially the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure on the tasks of error detection and correction.
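The abstract names focal loss as the remedy for class imbalance but gives neither its exact form nor its hyperparameters. The sketch below assumes the standard form FL(p_t) = −(1 − p_t)^γ · log(p_t), where p_t is the predicted probability of the gold token and γ = 0 recovers plain cross-entropy; the function name and γ = 2 are illustrative assumptions, not values from the paper.

```python
import math


def focal_loss(p_true, gamma=2.0, eps=1e-12):
    """Mean focal loss over tokens, given each token's predicted probability of its gold label.

    FL(p_t) = -(1 - p_t) ** gamma * log(p_t); gamma = 0 gives ordinary cross-entropy.
    `eps` guards against log(0).
    """
    terms = [-((1.0 - p) ** gamma) * math.log(max(p, eps)) for p in p_true]
    return sum(terms) / len(terms)


# An "easy" token (a correct character copied from the source) vs. a "hard" one (an error).
easy, hard = 0.99, 0.3
ce_ratio = focal_loss([hard], gamma=0.0) / focal_loss([easy], gamma=0.0)
fl_ratio = focal_loss([hard], gamma=2.0) / focal_loss([easy], gamma=2.0)
```

Under cross-entropy the hard token outweighs the easy one by roughly two orders of magnitude; with γ = 2 the factor (1 − p_t)^γ shrinks the easy token's contribution much further, so the many already-correct tokens no longer dominate the gradient.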
