ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746316

Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models

Cited by 13 publications (5 citation statements). References 22 publications.
“…In addition, transformers offer the advantage of parallelising computations, enabling faster training of deeper models on larger datasets. Recently, language models have shown their power in capturing high-level, long-term patterns across different data types, including text [21,96], images [157,158], and speech [159-161]. This has also opened avenues for developing large language models in the speech and audio domain.…”
Section: Automatic Speech Recognition (ASR) [mentioning, confidence: 99%]
“…Because of this success, previous studies have investigated pre-trained language models to enhance the performance of ASR. On the one hand, several studies directly leverage a pre-trained language model as a portion of the ASR model [13,14,15,16,17,18,19]. Although such designs are straightforward, they can obtain satisfactory performance.…”
Section: Related Work [mentioning, confidence: 99%]
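The passage above describes folding a pre-trained language model into an ASR system. As an illustration only (not necessarily the method of the cited papers), a common lightweight variant is to rescore an ASR first pass's N-best hypotheses with a pre-trained causal LM; in this minimal sketch the "gpt2" checkpoint, the example hypotheses, and the interpolation weight are all assumptions.

```python
# Illustrative sketch: rescoring ASR N-best hypotheses with a pre-trained LM.
# Assumes the HuggingFace `transformers` library; "gpt2" is an arbitrary choice.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_log_prob(text: str) -> float:
    """Approximate total log-probability of `text` under the pre-trained LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, `loss` is the mean per-token NLL.
        loss = model(ids, labels=ids).loss
    return -loss.item() * ids.size(1)

# Hypothetical N-best list from an ASR first pass, with acoustic scores.
nbest = [("i saw the cat", -12.3), ("eye saw the cat", -12.1)]
lm_weight = 0.5  # interpolation weight, an assumption
best = max(nbest, key=lambda h: h[1] + lm_weight * lm_log_prob(h[0]))
```

The LM score is simply interpolated with the acoustic score; heavier integrations (using the LM as part of the decoder itself) follow the same idea but share parameters end to end.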
“…The most straightforward method is to employ them as an acoustic feature encoder and then stack a simple neural-network layer on top of the encoder to do speech recognition [9]. After that, some studies presented various cascade methods that concatenate pre-trained language and speech representation learning models for ASR [14,15,17,18]. Although these methods have proven their capabilities and effectiveness on benchmark corpora, their complicated model architectures and/or large-scale model parameters usually make them hard to use in practice.…”
Section: Related Work [mentioning, confidence: 99%]
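The "encoder plus one simple layer" recipe quoted above is easy to make concrete. Below is a minimal sketch using HuggingFace's Wav2Vec2Model as the pre-trained acoustic encoder with a single linear CTC head; the checkpoint name and vocabulary size are assumptions, and training (CTC loss, fine-tuning schedule) is omitted.

```python
# Minimal sketch of "pre-trained encoder + one simple layer" ASR.
# Assumes HuggingFace `transformers`; checkpoint and vocab size are arbitrary.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class CTCHead(nn.Module):
    def __init__(self, vocab_size: int = 32):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.proj = nn.Linear(self.encoder.config.hidden_size, vocab_size)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> (batch, frames, hidden) -> (batch, frames, vocab)
        hidden = self.encoder(waveform).last_hidden_state
        return self.proj(hidden).log_softmax(dim=-1)

model = CTCHead()
log_probs = model(torch.randn(1, 16000))  # one second of 16 kHz audio
```

The cascade methods the quote mentions replace the single linear layer with a pre-trained language model stacked on the acoustic encoder, which is exactly where the parameter-count and complexity concerns arise.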
“…Non-autoregressive speech processing was first used in [18]. After that, many more non-autoregressive methods have been proposed [19-25]. Among these methods, two are appropriate for achieving non-autoregressive spelling correction.…”
Section: Introduction [mentioning, confidence: 99%]
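The quote contrasts non-autoregressive decoding with the usual token-by-token generation. As one concrete illustration (CTC greedy decoding is a standard non-autoregressive method, not necessarily the one meant by the cited works), the sketch below emits all output tokens in a single parallel pass: argmax per frame, collapse repeats, drop blanks; the vocabulary size and random posteriors are placeholders.

```python
# Non-autoregressive decoding illustrated with CTC greedy search: every frame
# is decided in one parallel pass, with no loop conditioned on prior outputs.
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list[int]:
    """log_probs: (frames, vocab) frame-level log-probabilities."""
    frame_ids = log_probs.argmax(dim=-1).tolist()  # one shot, all frames
    out, prev = [], blank
    for i in frame_ids:
        if i != prev and i != blank:  # collapse repeats, then drop blanks
            out.append(i)
        prev = i
    return out

# Hypothetical posteriors over a 5-symbol vocabulary (index 0 = blank).
ids = ctc_greedy_decode(torch.randn(50, 5).log_softmax(dim=-1))
```

Because every frame is classified independently, decoding cost is one forward pass regardless of output length, which is the speed advantage that motivates non-autoregressive ASR and spelling correction.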