ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414575

History Utterance Embedding Transformer LM for Speech Recognition

Abstract: History utterances contain rich contextual information; however, extracting this information more effectively and using it to improve the language model (LM) remains challenging. In this paper, we propose the history utterance embedding Transformer LM (HTLM), which consists of an embedding generation network that extracts the contextual information contained in the history utterances and a main Transformer LM that makes the prediction for the current utterance. In addition, two-stage attention (TSA) is proposed to encode riche…
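The abstract is truncated before it specifies how the embedding generation network or the two-stage attention works, so the following is only a hypothetical sketch of the general idea of turning history utterances into one context vector for the current prediction. It assumes each history utterance is pooled into a single vector by averaging its token embeddings, and that a query from the current utterance then attends over those utterance vectors with scaled dot-product attention. The names `mean_pool`, `attend`, and `history_context` are illustrative and are not HTLM's actual networks or its TSA.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_embs: List[Vector]) -> Vector:
    """Average token embeddings into one utterance-level vector (hypothetical pooling choice)."""
    dim = len(token_embs[0])
    return [sum(tok[i] for tok in token_embs) / len(token_embs) for i in range(dim)]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of a single query over key/value vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax weights, sum to 1
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def history_context(query: Vector, history: List[List[Vector]]) -> Vector:
    """Pool each history utterance, then attend over the pooled utterance vectors."""
    utt_vecs = [mean_pool(utt) for utt in history]
    return attend(query, utt_vecs, utt_vecs)
```

For example, `history_context([1.0, 0.0], [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]])` weights the first history utterance more heavily because its pooled vector is closer to the query. In a real model the query, keys, and values would come from learned projections rather than being used directly.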

Cited by 5 publications (3 citation statements); references 22 publications.
“…This method is also used in GPT2 model compression. Language models are widely used in ASR task [46]. Combining LM with an end-to-end ASR model is common through shallow fusion [47] or cold fusion [48].…”
Section: Related Work; mentioning (confidence: 99%)
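The citing passages mention shallow fusion for combining an LM with an end-to-end ASR model. As a hedged sketch (not taken from this paper or the citing works), shallow fusion is commonly written as a log-linear combination, score(y) = log P_ASR(y|x) + λ log P_LM(y), usually applied token by token inside beam search. For brevity the sketch applies it to complete hypotheses, as in n-best rescoring. The function names, the toy probabilities, and the default weight of 0.3 are hypothetical.

```python
import math
from typing import Dict, Tuple


def shallow_fusion_score(asr_logprob: float, lm_logprob: float, lm_weight: float) -> float:
    """Log-linear interpolation: log P_ASR(y|x) + lm_weight * log P_LM(y)."""
    return asr_logprob + lm_weight * lm_logprob


def best_hypothesis(hyps: Dict[str, Tuple[float, float]], lm_weight: float = 0.3) -> str:
    """Return the hypothesis with the highest fused score.

    hyps maps each hypothesis string to (ASR log-prob, LM log-prob).
    """
    return max(hyps, key=lambda y: shallow_fusion_score(*hyps[y], lm_weight))


# Toy n-best list: the acoustic model slightly prefers the second hypothesis.
hyps = {
    "recognize speech": (math.log(0.4), math.log(0.2)),
    "wreck a nice beach": (math.log(0.5), math.log(0.01)),
}
```

With `lm_weight=0.0` the decision is purely acoustic and "wreck a nice beach" wins; with `lm_weight=0.5` the LM penalty on the unlikely word sequence flips the choice to "recognize speech". In practice λ is tuned on held-out data.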
“…LM plays an important part in ASR [23]. Previous works like shallow fusion [24] and cold fusion [25] aim to combine an auto-regressive LM with a S2S ASR model, which is randomly initialized.…”
Section: Related Work; mentioning (confidence: 99%)
“…To the best of our knowledge, for Mandarin Chinese, there is no public-available dialog speech dataset adequate for the current requirement of high quality. With the boom in popularity of voice-driven interfaces to devices recently, some works [22,23] concerned with communication scenes have been conducted. However, exploring speech processing techniques in dialog scenarios is still challenging.…”
Section: Introduction; mentioning (confidence: 99%)