2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003857
Generalized Large-Context Language Models Based on Forward-Backward Hierarchical Recurrent Encoder-Decoder Models

Cited by 5 publications (5 citation statements) · References 29 publications
“…This section details our proposed large-context knowledge distillation method as an effective training method of large-context E2E-ASR models. Our key idea is to mimic the behavior of a large-context language model [9][10][11][12][13] pre-trained from the same training datasets. A large-context language model defines the generation probability of a sequence of utterance-level texts W = {W_1, ..., W_T} as…”
Section: Large-context Knowledge Distillation (mentioning)
confidence: 99%
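The quotation above is truncated before the defining equation. As a rough illustration only, the sketch below shows one common way a large-context language model can factorize P(W) over utterance-level texts W = {W_1, ..., W_T}: each utterance is generated conditioned on a recurrent summary of all preceding utterances. The GRU-based architecture, module names, and layer sizes are assumptions for illustration, not the exact model of the cited work.

```python
import torch
import torch.nn as nn

class LargeContextLM(nn.Module):
    """Toy hierarchical recurrent LM: P(W) = prod_t P(W_t | W_1, ..., W_{t-1})."""

    def __init__(self, vocab_size, emb_dim=256, utt_dim=512, ctx_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.utt_encoder = nn.GRU(emb_dim, utt_dim, batch_first=True)   # word-level encoder
        self.ctx_encoder = nn.GRU(utt_dim, ctx_dim, batch_first=True)   # utterance-level encoder
        self.decoder = nn.GRU(emb_dim + ctx_dim, utt_dim, batch_first=True)
        self.out = nn.Linear(utt_dim, vocab_size)

    def forward(self, utterances):
        """utterances: list of T LongTensors, each of shape (1, len_t)."""
        log_prob = 0.0
        utt_summaries = []                      # encodings of W_1 .. W_{t-1}
        for W_t in utterances:
            emb = self.embed(W_t)               # (1, len_t, emb_dim)
            if utt_summaries:                   # context vector over preceding utterances
                ctx_seq = torch.stack(utt_summaries, dim=1)
                _, h_ctx = self.ctx_encoder(ctx_seq)
                ctx = h_ctx[-1]                 # (1, ctx_dim)
            else:
                ctx = emb.new_zeros(1, self.ctx_encoder.hidden_size)
            # predict each word of W_t from its previous words plus the document context
            dec_in = torch.cat(
                [emb, ctx.unsqueeze(1).expand(-1, emb.size(1), -1)], dim=-1)
            dec_out, _ = self.decoder(dec_in)
            logits = self.out(dec_out[:, :-1])  # predict tokens 2..len_t
            log_p = torch.log_softmax(logits, dim=-1)
            log_prob = log_prob + log_p.gather(-1, W_t[:, 1:].unsqueeze(-1)).sum()
            _, h_utt = self.utt_encoder(emb)    # summarize W_t for later contexts
            utt_summaries.append(h_utt[-1])
        return log_prob                         # log P(W) under this toy model
```

In the knowledge-distillation setting described in the quotation, such a model would be pre-trained on the same training transcripts and its output distributions used as soft targets for the large-context E2E-ASR model.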
“…In the decoder, both the continuous representations produced by the hierarchical transformer and the input speech contexts are simultaneously taken into consideration using two multi-head source-target attention layers. Moreover, since it is difficult to effectively exploit large contexts beyond utterance boundaries, we also propose a large-context knowledge distillation method using a large-context language model [9][10][11][12][13]. This method enables our large-context E2E-ASR model to use large contexts beyond utterance boundaries by mimicking the behavior of the pre-trained large-context language model.…”
Section: Introduction (mentioning)
confidence: 99%
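To make the two source-target attention blocks mentioned in the quotation concrete, here is a minimal, hedged sketch of a decoder layer that attends separately to the speech-encoder outputs and to the hierarchical text-context representations. The attention ordering, normalization placement, and layer sizes are assumptions for illustration, not the cited architecture.

```python
import torch.nn as nn

class DualAttentionDecoderLayer(nn.Module):
    """Decoder layer with two source-target attention blocks (illustrative)."""

    def __init__(self, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout, batch_first=True)
        self.speech_attn = nn.MultiheadAttention(d_model, n_heads, dropout, batch_first=True)
        self.context_attn = nn.MultiheadAttention(d_model, n_heads, dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, tgt, speech_memory, context_memory, tgt_mask=None):
        # masked self-attention over previously emitted tokens
        x = self.norms[0](tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # source-target attention #1: input speech contexts (speech-encoder outputs)
        x = self.norms[1](x + self.speech_attn(x, speech_memory, speech_memory)[0])
        # source-target attention #2: hierarchical-transformer text representations
        x = self.norms[2](x + self.context_attn(x, context_memory, context_memory)[0])
        return self.norms[3](x + self.ffn(x))
```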
“…Other studies have shown that large-context end-to-end methods offer a superior performance to utterance-level or sentence-level end-to-end methods in automatic speech recognition [25][26][27], machine translation [28][29][30], and response generation for dialogue systems [31,32]. Furthermore, large-context language models that can consider not only past but also future contexts have been presented [19]. In this paper, we utilize large-context language models for self-supervised learning specialized to conversational documents.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our concept is to estimate an utterance by using all the surrounding utterances. To this end, we introduce a novel large-context language model, which is an extended model of the forward-backward hierarchical recurrent encoder-decoder [19], so that we can estimate not only linguistic information but also speaker information. After performing the self-supervised learning, we utilize the pre-trained network for building state-of-the-art utterance-level sequential labeling based on hierarchical bidirectional long short-term memory recurrent neural network conditional random fields (H-BLSTM-CRF) [6,7].…”
Section: Introduction (mentioning)
confidence: 99%
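As a rough illustration of the forward-backward idea in the quotation above (predicting a target utterance from both past and future utterances, with an added speaker estimate), here is a hedged sketch. The bag-of-words word head, GRU encoders, and all names and sizes are simplifying assumptions, not the extended forward-backward hierarchical recurrent encoder-decoder of the cited work.

```python
import torch
import torch.nn as nn

class ForwardBackwardUtterancePredictor(nn.Module):
    """Predict a target utterance and its speaker from surrounding utterances (illustrative)."""

    def __init__(self, vocab_size, n_speakers, emb_dim=256, utt_dim=512, ctx_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.utt_encoder = nn.GRU(emb_dim, utt_dim, batch_first=True)
        # separate utterance-level RNNs over past (forward) and future (backward) context
        self.fwd_ctx = nn.GRU(utt_dim, ctx_dim, batch_first=True)
        self.bwd_ctx = nn.GRU(utt_dim, ctx_dim, batch_first=True)
        self.word_head = nn.Linear(2 * ctx_dim, vocab_size)      # bag-of-words stand-in for a decoder
        self.speaker_head = nn.Linear(2 * ctx_dim, n_speakers)   # added speaker estimation

    def encode_utt(self, W):
        _, h = self.utt_encoder(self.embed(W))    # (1, len) -> (1, utt_dim)
        return h[-1]

    def forward(self, past_utts, future_utts):
        """past_utts / future_utts: non-empty lists of (1, len) LongTensors around the target."""
        fwd_in = torch.stack([self.encode_utt(W) for W in past_utts], dim=1)
        bwd_in = torch.stack([self.encode_utt(W) for W in reversed(future_utts)], dim=1)
        _, h_f = self.fwd_ctx(fwd_in)
        _, h_b = self.bwd_ctx(bwd_in)
        ctx = torch.cat([h_f[-1], h_b[-1]], dim=-1)   # (1, 2*ctx_dim)
        return self.word_head(ctx), self.speaker_head(ctx)
```

After such self-supervised pre-training, the quotation states that the pre-trained network is reused to build utterance-level sequential labeling with an H-BLSTM-CRF; that downstream model is not sketched here.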
“…Representative methods are used to distill knowledge from an external language model to improve the capturing of linguistic contexts [24,25]. Our proposed large-context knowledge distillation method is regarded as an extension of the latter methods to enable the capturing of all preceding linguistic contexts beyond utterance boundaries using large-context language models [9][10][11][12][13].…”
Section: Related Work (mentioning)
confidence: 99%