Interspeech 2019
DOI: 10.21437/Interspeech.2019-2798
Real-Time One-Pass Decoder for Speech Recognition Using LSTM Language Models

Abstract: Recurrent Neural Networks, in particular Long Short-Term Memory (LSTM) networks, are widely used in Automatic Speech Recognition for language modelling during decoding, usually as a mechanism for rescoring hypotheses. This paper proposes a new architecture to perform real-time one-pass decoding using LSTM language models. To make decoding efficient, the estimation of look-ahead scores was accelerated by precomputing static look-ahead tables. These static tables were precomputed from a pruned n-gram model, redu…
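The core of the proposed architecture, as far as the abstract describes it, is that look-ahead scores are read from static tables precomputed from a pruned n-gram model instead of being queried from the neural LM inside the search. Below is a minimal Python sketch of that precomputation step; the data layout (a dict from (context, word) pairs to log-probabilities) and all names are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

def build_lookahead_table(ngram_logprobs):
    """Precompute a static look-ahead table from a pruned n-gram model.

    For every LM context and every word prefix (i.e., node of a lexicon
    prefix tree), store the best log-probability over all words that
    complete the prefix. The decoder can then bound the LM score of a
    partial word with a single table lookup instead of an LM query.

    ngram_logprobs: dict mapping (context, word) -> log-probability,
                    taken from the *pruned* n-gram model (hypothetical
                    layout, for illustration only).
    """
    table = defaultdict(lambda: float("-inf"))
    for (context, word), logp in ngram_logprobs.items():
        for i in range(1, len(word) + 1):    # every prefix of `word`
            key = (context, word[:i])
            if logp > table[key]:            # keep the max over completions
                table[key] = logp
    return dict(table)

# Toy usage: under context ("the",), the prefix "ca" inherits the best
# score among its completions "cat" and "car".
ngram = {(("the",), "cat"): -1.2, (("the",), "car"): -2.3}
table = build_lookahead_table(ngram)
assert table[(("the",), "ca")] == -1.2
```

Because the table is keyed on word prefixes, pruning the n-gram model first directly shrinks the table, which is presumably what the truncated final sentence of the abstract refers to.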

Cited by 21 publications (22 citation statements) · References 24 publications
“…In this section, we describe the annotation tagset used, the criteria applied and the annotation process carried out, including the Inter-Annotator Agreement tests conducted for the annotation of the VivesDebate corpus. The annotation task consists of three main subtasks: first, the annotators review and correct the transcriptions automatically obtained by the MLLP transcription system (https://ttp.mllp.upv.es/, accessed on 2 August 2021) [24], the award-winning transcription system of the IberSpeech-RTVE 2020 TV Speech-to-Text Challenge, developed by the Machine Learning and Language Processing (MLLP) research group of VRAIN. Then, the Argumentative Discourse Units (ADUs) of each debate, which are the minimal units of analysis containing argumentative information, are identified and segmented.…”
Section: Annotation Methodology (mentioning)
confidence: 99%
“…The disagreements found are basically of two types: (a) the inclusion or omission of words at the beginning or at the end of the ADU, (22) vs. (23); and (b) the segmentation of the same text into two ADUs or a single ADU, (24) vs. (25), the latter being a stronger disagreement than the former. For instance, one of the annotators considered 'is broken in a miserable way' to be a different ADU (24), whereas the other annotator considered this segment part of the same ADU. Finally, we agreed that it should be annotated as a single ADU (25), because 'is broken' is the main verb of the sentence, and the argument is that what is broken is the bond between the mother and the baby.…”
(mentioning)
confidence: 99%
“…In order to further speed up the decoding process, specific LM pruning parameters had to be incorporated into the one-pass decoder to reduce the search space or the number of queries in the computation of neural LM probabilities [26]. One of these parameters is the Language Model History Recombination (LMHR), which defines the number of words to be considered before performing hypothesis recombination during decoding.…”
Section: LM Pruning Parameters (mentioning)
confidence: 99%
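As a rough illustration of the LMHR parameter quoted above, the sketch below keeps only the best-scoring hypothesis per truncated LM history. The function name and the (words, score) hypothesis representation are hypothetical, not taken from the cited decoder:

```python
def recombine(hypotheses, lmhr):
    """Hypothesis recombination with a limited LM history (LMHR).

    Hypotheses whose last `lmhr` words coincide are treated as
    equivalent for future LM queries, so only the best-scoring one per
    truncated history survives. A smaller `lmhr` merges more hypotheses
    and shrinks the search space, at the price of approximating the
    (in principle unbounded) LSTM LM history.

    hypotheses: list of (word_sequence, score) pairs, higher is better.
    """
    best = {}
    for words, score in hypotheses:
        key = tuple(words[-lmhr:]) if lmhr > 0 else ()
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    return list(best.values())

# With lmhr=2, hypotheses ending in the same bigram are merged even if
# their earlier words differ.
hyps = [(["a", "b", "c"], -3.0), (["x", "b", "c"], -2.5), (["a", "b", "d"], -4.0)]
assert len(recombine(hyps, lmhr=2)) == 2
```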
“…This work takes as a starting point a novel architecture for real-time one-pass decoding with LSTM-RNN LMs proposed in [26]. In that work, one-pass decoding was accelerated by estimating look-ahead scores using precomputed static look-ahead tables.…”
Section: Introduction (mentioning)
confidence: 99%
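Tying the two quoted ideas together, here is a hedged sketch of how a one-pass decoder might combine the static look-ahead with the LSTM LM: the precomputed table supplies a cheap optimistic score while the search is inside a word, and the neural LM is queried only once per completed word. All names are illustrative assumptions, not the decoder's actual API:

```python
import math

def score_in_word(hyp_score, context, prefix, lookahead_table):
    """Mid-word: add the precomputed static look-ahead for this prefix."""
    return hyp_score + lookahead_table.get((context, prefix), -math.inf)

def score_word_end(hyp_score, context, word, lstm_logprob):
    """Word boundary: charge the true LSTM LM log-probability instead.

    `lstm_logprob(context, word)` stands in for a query to the neural
    LM; in a real decoder the previously added look-ahead would be
    removed here and replaced by this exact score.
    """
    return hyp_score + lstm_logprob(context, word)
```

Confining the expensive LSTM queries to word boundaries, while beam pruning inside words relies on the static table, is what makes the one-pass search feasible in real time.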
“…Following a cascade approach, a streaming ST setup can be achieved with individual streaming ASR and MT components. Advances in neural streaming ASR (Zeyer et al., 2016; Jorge et al., 2019) allow the training of streaming models whose performance is very similar to that of offline ones. Recent advances in simultaneous MT show promise (Arivazhagan et al., 2019), but current models have additional modelling and training complexity, and are not ready for translation of long streams of input text.…”
Section: Introduction (mentioning)
confidence: 99%