Interspeech 2020
DOI: 10.21437/interspeech.2020-2526

Relative Positional Encoding for Speech Recognition and Direct Translation

Cited by 24 publications (30 citation statements) · References: 0 publications
“…Modeling: The main architecture is the deep Transformer (Vaswani et al., 2017) with stochastic layers (Pham et al., 2019b). The encoder self-attention layer uses bidirectional relative attention (Pham et al., 2020a), which models the relative distance between one position and the other positions in the sequence. This modeling is bidirectional because the distance is distinguished for each direction from the perspective of a particular position.…”
Section: End-to-end Model (mentioning)
confidence: 99%
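
As a rough NumPy sketch of that idea only, and not the exact formulation in Pham et al. (2020a): the offset between a query position and each key position keeps its sign, so left and right context are looked up separately; the function name, clipping distance, and embedding size below are arbitrary choices for the example.

    import numpy as np

    def signed_relative_indices(seq_len, max_dist):
        # Offset j - i for every (query i, key j) pair: negative for keys to the left
        # of the query, positive for keys to the right, so direction is preserved.
        pos = np.arange(seq_len)
        dist = pos[None, :] - pos[:, None]
        # Clip so all offsets beyond max_dist share one embedding per direction, then
        # shift to non-negative indices into a (2 * max_dist + 1)-row embedding table.
        return np.clip(dist, -max_dist, max_dist) + max_dist

    indices = signed_relative_indices(seq_len=5, max_dist=3)   # shape (5, 5)
    rel_table = np.random.randn(2 * 3 + 1, 8)                  # one 8-dim embedding per clipped offset
    rel_inputs = rel_table[indices]                            # (5, 5, 8), fed into the attention scores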
“…Due to the non-sequential modeling of the original self-attention modules, the vanilla Transformer employs a position embedding given by a deterministic sinusoidal function to indicate the absolute position of each input element (Vaswani et al., 2017). However, this scheme is far from ideal for acoustic modeling (Pham et al., 2020). The latest work (Pham et al., 2020; Gulati et al., 2020) points out that relative position encoding enables the model to generalize better to unseen sequence lengths. It yields a significant improvement on acoustic modeling tasks.…”
Section: Relative Position Encoding (mentioning)
confidence: 99%
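
The generalization claim can be illustrated with a small, hypothetical check (the training length, test length, and clipping distance below are invented for the example): absolute position indices beyond the training length are new to the model, while the set of clipped relative offsets is the same at any sequence length.

    import numpy as np

    train_len, test_len, max_dist = 100, 400, 16

    # Absolute positions: indices 100..399 never occur when training on length-100 inputs.
    novel_positions = np.setdiff1d(np.arange(test_len), np.arange(train_len))
    print(novel_positions.size)                                # 300 unseen position indices

    # Clipped relative offsets: the set of offsets the model sees is identical at both lengths.
    def offset_set(n):
        dist = np.subtract.outer(np.arange(n), np.arange(n))   # i - j for all position pairs
        return np.unique(np.clip(dist, -max_dist, max_dist))

    print(np.array_equal(offset_set(train_len), offset_set(test_len)))   # True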
“…Relative positional encoding [12,13] is an extension of an absolute positional encoding technique that allows self-attention to handle relative positional information. The absolute positional encoding is defined as follows:…”
Section: Positional Encoding (mentioning)
confidence: 99%
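
The excerpt above is cut off before the definition it introduces; for reference, the sinusoidal absolute positional encoding of Vaswani et al. (2017) that the statement refers to is

    PE_{(pos,\,2i)}   = \sin\!\left( pos / 10000^{2i/d_{\text{model}}} \right)
    PE_{(pos,\,2i+1)} = \cos\!\left( pos / 10000^{2i/d_{\text{model}}} \right)

where pos is the absolute position in the sequence, i indexes the embedding dimension, and d_model is the model dimension, so each position receives a fixed, deterministic vector.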
“…To solve this problem, several approaches have been proposed. Masking [11] limits the range of self-attention by using a Gaussian window, whereas relative positional encoding [12, 13] uses relative embeddings in the self-attention architecture to eliminate the effect of the length mismatch. However, masking does not take into account the correlation between input features and relative distance.…”
Section: Introduction (mentioning)
confidence: 99%
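
For concreteness, here is a minimal single-head sketch of that mechanism (NumPy, no learned projections and no masking term), reusing the signed-offset indexing from the earlier sketch; it illustrates the general idea of adding a relative-position term to the attention scores and is not the exact formulation of [11]-[13].

    import numpy as np

    def self_attention_with_relative_bias(Q, K, V, rel_table, max_dist):
        # Q, K, V: (seq_len, d); rel_table: (2 * max_dist + 1, d) learned embeddings,
        # one vector per clipped signed offset between query and key positions.
        seq_len, d = Q.shape
        dist = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]   # key pos - query pos
        idx = np.clip(dist, -max_dist, max_dist) + max_dist                # (seq_len, seq_len)
        content_scores = Q @ K.T                                           # content-content term
        position_scores = np.einsum('id,ijd->ij', Q, rel_table[idx])       # content-position term
        scores = (content_scores + position_scores) / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)                     # row-wise softmax
        return weights @ V

    # The scores depend on positions only through the clipped offset, so the same
    # parameters apply unchanged to sequences longer than those seen in training.
    Q = K = V = np.random.randn(6, 8)
    out = self_attention_with_relative_bias(Q, K, V, np.random.randn(2 * 4 + 1, 8), max_dist=4)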