2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003822
Monotonic Recurrent Neural Network Transducer and Decoding Strategies


Cited by 46 publications (42 citation statements)
References 4 publications
“…Decoding: All results are reported after decoding models using the breadth-first search decoding algorithm [15]. The limited context models are always evaluated using the path-merging process proposed in Section 3.1.…”
Section: Methods (mentioning, confidence: 99%)
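
The breadth-first search these citations refer to advances frame by frame, expanding every hypothesis on the beam once per frame before pruning back to the beam width. The sketch below only illustrates that idea under a monotonic (at most one label per frame) assumption; the joint, predict, and init_state hooks and the blank index are hypothetical placeholders, not the interface used in the paper.

```python
import numpy as np

BLANK = 0  # assumed blank index

def breadth_first_decode(enc, joint, predict, init_state, beam=4):
    """Illustrative frame-synchronous, breadth-first RNN-T beam search.

    enc:      (T, D) array of encoder frames.
    joint:    hypothetical hook mapping (encoder frame, prediction state)
              to a (V,) vector of log-probabilities including blank.
    predict:  hypothetical hook advancing the prediction-network state by
              one emitted label.
    Each hypothesis is (labels, prediction state, log score).
    """
    hyps = [((), init_state, 0.0)]
    for t in range(enc.shape[0]):
        expanded = []
        for labels, state, score in hyps:
            log_probs = joint(enc[t], state)
            # blank: keep the label sequence, only advance in time
            expanded.append((labels, state, score + float(log_probs[BLANK])))
            # non-blank: emit at most one label on this frame (monotonic case)
            for k in np.argsort(log_probs)[::-1][:beam]:
                if k == BLANK:
                    continue
                expanded.append((labels + (int(k),), predict(state, int(k)),
                                 score + float(log_probs[k])))
        # breadth-first step for frame t done: prune back to the beam width
        expanded.sort(key=lambda h: h[2], reverse=True)
        hyps = expanded[:beam]
    return max(hyps, key=lambda h: h[2])[0]
```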
“…Traditional decoding algorithms for RNN-T [1,15] only produce trees that are rooted at the sos label since distinct label sequences result in unique model states (i.e., the state of the prediction network, since the encoder state is not conditioned on the label sequence). In a limited context model, however, model states are identical if two paths on the beam share the same local label history.…”
Section: Decoding With Path Merging To Create Lattices (mentioning, confidence: 99%)
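
The path-merging idea in this excerpt can be sketched directly: when the prediction network conditions only on the last k labels, two hypotheses whose last k labels agree share the same model state and can be collapsed into a single beam entry, which turns the search tree into a lattice. The (labels, score) hypothesis representation and the log-sum-exp score combination below are assumptions made for illustration; state bookkeeping is omitted.

```python
import numpy as np

def merge_paths(hyps, context_size):
    """Collapse hypotheses that share the same local label history.

    hyps:          list of (labels, log score) pairs, labels being a tuple.
    context_size:  number of trailing labels the limited context model
                   actually conditions on.
    """
    merged = {}
    for labels, score in hyps:
        key = labels[-context_size:]  # local label history decides the model state
        if key in merged:
            kept_labels, kept_score = merged[key]
            # the merged entry accumulates the probability mass of both paths
            merged[key] = (kept_labels, float(np.logaddexp(kept_score, score)))
        else:
            merged[key] = (labels, score)
    return list(merged.values())

# e.g. with context_size=2 these two paths share the history ('e', 'l') and merge:
# merge_paths([(('h', 'e', 'l'), -1.2), (('x', 'e', 'l'), -2.5)], context_size=2)
```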
“…We break the transducer lattice rule a little bit in decoding. One frame only outputs one phone label or blank [25].…”
Section: Phone Synchronous Decoding With Blank Skipping (mentioning, confidence: 99%)
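
The blank-skipping idea quoted here can be sketched as a pre-filter over frames: frames whose blank posterior is above a threshold are skipped, so label expansion only happens on the few frames likely to emit a phone. The joint hook, the single fixed prediction state, and the 0.95 threshold below are illustrative assumptions rather than details taken from the cited paper.

```python
import numpy as np

BLANK = 0  # assumed blank index

def phone_emitting_frames(enc, joint, state, blank_threshold=0.95):
    """Return indices of frames that are worth expanding with phone labels.

    enc:    (T, D) array of encoder frames.
    joint:  hypothetical hook mapping (encoder frame, prediction state) to
            a (V,) vector of log-probabilities including blank.
    state:  a single prediction-network state; using one fixed state for the
            whole utterance is a simplification made for this sketch.
    """
    keep = []
    for t in range(enc.shape[0]):
        log_probs = joint(enc[t], state)
        if np.exp(log_probs[BLANK]) < blank_threshold:
            keep.append(t)  # likely to emit a phone, do not skip
    return keep
```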
“…This led to a rapidly evolving research landscape in end-to-end modeling for ASR with Recurrent Neural Network Transducers (RNN-T) [1] and attention-based models [2,3] being the most prominent examples. Attention-based models are excellent at handling non-monotonic alignment problems such as translation [4], whereas RNN-Ts are an ideal match for the left-to-right nature of speech [5-17].…”
Section: Introduction (mentioning, confidence: 99%)