Interspeech 2018
DOI: 10.21437/interspeech.2018-1403

Output-Gate Projected Gated Recurrent Unit for Speech Recognition

Abstract: In this paper, we describe our work on accelerating decoding speed while improving decoding accuracy. Firstly, we propose an architecture which we call the Projected Gated Recurrent Unit (PGRU) for automatic speech recognition (ASR) tasks, and show that the PGRU consistently outperforms the standard GRU. Secondly, in order to improve the PGRU's generalization, especially for large-scale ASR tasks, the Output-gate PGRU (OPGRU) is proposed. Finally, time delay neural network (TDNN) and normalization skills a…
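As a rough illustration of the PGRU/OPGRU idea sketched in the abstract, the following is a minimal numpy sketch of a GRU-style recurrent step with an output gate and a low-rank projection applied to the state before it is fed back. The parameter names and the exact gate equations are assumptions made for illustration; the paper's own formulation may differ.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def opgru_step(x_t, h_prev, y_prev, params):
    # One recurrent step of an output-gate projected GRU-style cell (illustration only).
    # x_t: input at time t; h_prev: previous unprojected state;
    # y_prev: previous projected output, fed back as the recurrent input;
    # params: dict of weight matrices (hypothetical names).
    z = sigmoid(params["Wz"] @ x_t + params["Uz"] @ y_prev)        # update gate
    o = sigmoid(params["Wo"] @ x_t + params["Uo"] @ y_prev)        # output gate
    h_cand = np.tanh(params["Wh"] @ x_t + params["Uh"] @ y_prev)   # candidate state
    h = (1.0 - z) * h_prev + z * h_cand                            # GRU-style interpolation
    y = params["Wp"] @ (o * h)                                     # gated state through a low-rank projection
    return h, y

Feeding back the projected output y rather than the full state h keeps the recurrent matrices small, which is the usual motivation for projected recurrent cells and is consistent with the decoding-speed goal stated in the abstract.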

Cited by 17 publications (12 citation statements)
References 17 publications
“…The details of the baseline models in the experiments are shown in Table 4. The network architectures of the Time Delay Neural Network-Long Short-Term Memory with Projection (TDNN-LSTMP), the Bidirectional Long Short-Term Memory with Projection (BLSTMP), and the Time Delay Neural Network-Output gate Projected Gated Recurrent Unit (TDNN-OPGRU) are described in [20], [21], and [22], respectively. As [24] does not give the details of the Time Delay Neural Network-minimal Gated Recurrent Unit (TDNN-mGRU), when we set up the architecture of TDNN-mGRU, the TDNN parameters follow the context-module parameters of mGRUIP-Ctx [32], and the cell dimension is consistent with the LSTMP setting in TDNN-LSTMP [20].…”
Section: Comparison Between Norm-pmgru and Baseline Models
confidence: 99%
“…Thus, considering its ability to model a dynamic window over the full sequence history rather than the fixed contextual window over the input sequence used by an FFNN, the RNN is more suitable for sequence modeling. However, since vanishing and exploding gradients frequently occur when training vanilla RNNs, the Long Short-Term Memory (LSTM) [18], the Gated Recurrent Unit (GRU) [19], and their variants, the Long Short-Term Memory with Projection (LSTMP) [20], Bidirectional Long Short-Term Memory with Projection (BLSTMP) [21], Output-Gate Projected Gated Recurrent Unit (OPGRU) [22], minimal Gated Recurrent Unit (mGRU, also named LiGRU) [23] [24], and mGRUIP with Context module (mGRUIP-Ctx) [25], were proposed to address these problems and are widely used in the speech recognition field. In [21], the author compared the performance of TDNN-LSTMP and BLSTMP.…”
Section: Introduction
confidence: 99%
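To make the gating idea in the statement above concrete, a generic GRU step can be written as below; the update gate interpolates between the previous state and a candidate state, which preserves a short path for gradients through time. This is a standard GRU sketch with hypothetical parameter names, not code taken from any of the cited works.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    # One step of a standard GRU (generic sketch, hypothetical parameter names).
    z = sigmoid(params["Wz"] @ x_t + params["Uz"] @ h_prev)              # update gate
    r = sigmoid(params["Wr"] @ x_t + params["Ur"] @ h_prev)              # reset gate
    h_cand = np.tanh(params["Wh"] @ x_t + params["Uh"] @ (r * h_prev))   # candidate state
    # The (1 - z) * h_prev term lets state (and gradients) pass through largely
    # unchanged, which mitigates the vanishing-gradient problem of vanilla RNNs.
    return (1.0 - z) * h_prev + z * h_cand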
“…4. Various network architectures were used through combinations of different types of layers, including TDNN, LSTM, the Output-gate Projected Gated Recurrent Unit (BOPGRU) [17], CNN, TDNNF [18], and the Residual Bidirectional LSTM (RBiLSTM) [19], as well as other techniques such as the self-attention mechanism [20] and backstitch [21]. We evaluated the different model architectures on the development set.…”
Section: AM Construction
confidence: 99%
“…CNN-TDNN-RBiLSTM (Fig. 2c): this architecture was proposed in [17], in which a backward (b-)LSTM is applied on top of the forward (f-)LSTM and the outputs of the f-LSTM and b-LSTM are directly appended (Fig. 2b).…”
Section: AM Construction
confidence: 99%
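The wiring described in the statement above, a backward LSTM running over the outputs of a forward LSTM with the two output streams appended, can be sketched as follows. This is an assumed PyTorch illustration of that stacking only (class and argument names are hypothetical), not the actual RBiLSTM model of the cited work.

import torch
import torch.nn as nn

class FwdThenBwdLSTM(nn.Module):
    # Backward LSTM applied on top of a forward LSTM, with the two output
    # streams concatenated (sketch of the stacking described above only).
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.f_lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.b_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim)
        f_out, _ = self.f_lstm(x)                             # forward pass over time
        b_out, _ = self.b_lstm(torch.flip(f_out, dims=[1]))   # backward LSTM over reversed f-LSTM outputs
        b_out = torch.flip(b_out, dims=[1])                   # restore the original time order
        return torch.cat([f_out, b_out], dim=-1)              # append f-LSTM and b-LSTM outputs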
“…LF-MMI and LF-bMMI were adopted in [13] and [14], respectively, for English conversation transcription and multi-talker speech recognition tasks, but in both works cross-entropy pre-training is still required. Later on, various model architectures [15,16,17] for LF-MMI have been explored. LF-sMBR was proposed and compared against lattice-based sMBR in [18] when initialized from an LF-MMI-trained model.…”
Section: Introduction
confidence: 99%