2021
DOI: 10.1109/lsp.2020.3044547

Non-Autoregressive Transformer for Speech Recognition

Abstract: Recently, very deep transformers have started to outperform traditional bi-directional long short-term memory networks by a large margin. However, for production use, inference computation cost and latency remain serious concerns in real scenarios. In this paper, we study a novel non-autoregressive transformer structure for speech recognition, which was originally introduced in machine translation. During training, input tokens fed to the decoder are randomly replaced by a special …
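The abstract is cut off before it names the special replacement token. As a rough illustration of the training scheme it describes, the minimal PyTorch sketch below assumes the special token is a mask token, as in CMLM-style non-autoregressive training; MASK_ID, PAD_ID, mask_prob, and mask_decoder_inputs are hypothetical names, not the paper's code.

# Sketch of the masking step described in the abstract: during training, a
# random subset of the decoder input tokens is replaced by a special token
# (assumed here to be a [MASK] token). Token IDs and the mask ratio are
# illustrative assumptions.
import torch

MASK_ID = 3   # assumed ID of the special replacement token
PAD_ID = 0    # assumed padding ID, never masked

def mask_decoder_inputs(targets: torch.Tensor, mask_prob: float = 0.3):
    """Randomly replace target tokens with MASK_ID; return (decoder_in, labels).

    Only replaced positions contribute to the loss (labels = -100 elsewhere),
    mirroring a "predict the masked tokens" objective.
    """
    decoder_in = targets.clone()
    labels = torch.full_like(targets, -100)              # ignored by cross-entropy
    candidates = targets.ne(PAD_ID)                      # keep padding untouched
    replace = candidates & (torch.rand_like(targets, dtype=torch.float) < mask_prob)
    decoder_in[replace] = MASK_ID
    labels[replace] = targets[replace]
    return decoder_in, labels

# Example: a batch of two token sequences
tgt = torch.tensor([[5, 8, 9, 2, 0], [7, 4, 6, 6, 2]])
dec_in, lbl = mask_decoder_inputs(tgt)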

Cited by 72 publications (56 citation statements)
References 19 publications (23 reference statements)

“…The performance gains are higher when both ASR and MT models use target units of the same granularity (BPE in our case). As all the models are transformers, they could be replaced with Non-Autoregressive models [19], [20], [21], which could reduce decoding latency. Along with the input symbols, a mechanism to convey the confidence of each input symbol could help the MT models better translate the ASR hypothesis.…”
Section: Conclusion and Future Scope (mentioning)
confidence: 99%
“…Transformers rely on a simple but powerful mechanism called attention, which focuses on certain parts of the input to obtain better results. They are currently considered state-of-the-art models for sequential data, in particular for natural language processing (NLP) tasks such as machine translation [36], language modeling [37], and speech recognition [38]. The architecture of the Transformer developed by Vaswani et al. [39] is based on the encoder-decoder model, which transforms a given sequence of elements into another sequence.…”
Section: Introduction (mentioning)
confidence: 99%
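For concreteness, here is a minimal PyTorch sketch, with illustrative names and shapes, of the scaled dot-product attention the quoted passage refers to: each output position is a weighted sum of value vectors, with weights computed from query-key similarity.

# Scaled dot-product attention (Vaswani et al.): weights come from the
# similarity of queries to keys; the output is a weighted sum of values.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_model); mask: optional bool (batch, seq_len, seq_len)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # query-key similarity
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # attention distribution
    return weights @ v                                   # weighted sum of values

q = k = v = torch.randn(2, 4, 8)
out = scaled_dot_product_attention(q, k, v)              # shape (2, 4, 8)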
“…To speed up the inference of speech recognition, non-autoregressive transformers have been proposed [5,6,7,8], which generate all target tokens simultaneously [5,6] or iteratively [7,8]. We notice that the encoder of AR transformers and that of NAR transformers are the same, thus the dif-…”
Section: Introduction (mentioning)
confidence: 99%
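The distinction the quote draws between autoregressive (AR) and non-autoregressive (NAR) generation can be sketched as follows; decoder here is a hypothetical callable returning per-position logits, not the API of any particular toolkit.

# AR decoding emits one token per decoder call; NAR decoding fills every
# position with MASK and predicts all tokens in a single forward pass.
import torch

def decode_ar(decoder, enc_out, bos_id, eos_id, max_len=50):
    tokens = [bos_id]
    for _ in range(max_len):                         # one forward pass per token
        logits = decoder(torch.tensor([tokens]), enc_out)
        next_id = int(logits[0, -1].argmax())
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]

def decode_nar(decoder, enc_out, mask_id, length):
    dec_in = torch.full((1, length), mask_id)        # all positions are MASK
    logits = decoder(dec_in, enc_out)                # single forward pass
    return logits.argmax(dim=-1)[0].tolist()         # all tokens predicted at once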
“…For example, the NAR decoders in listen attentively and spell once (LASO) [5] and listen and fill in missing letter (LFML) [8] take a fixed-length sequence filled with MASK tokens as input to predict the target sequence:…”
Section: Introduction (mentioning)
confidence: 99%
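The quoted sentence introduces an equation that the snippet cuts off. As an illustrative reconstruction only, not necessarily the exact formula in the cited LASO/LFML papers, a fixed-length MASK-input decoder can be written as predicting every position in parallel from an all-[MASK] input of maximum length L, conditioned on the encoder output Enc(x):

% Illustrative sketch (not the cited papers' exact equation)
\[
  \hat{y}_i \;=\; \operatorname*{arg\,max}_{y_i}\;
  P\!\left(y_i \,\middle|\, \underbrace{[\mathrm{MASK}],\dots,[\mathrm{MASK}]}_{L},\ \mathrm{Enc}(x)\right),
  \qquad i = 1,\dots,L .
\]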