2021
DOI: 10.48550/arxiv.2102.11090
Preprint

Position Information in Transformers: An Overview

Abstract: Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition, a Transformer is invariant with respect to reorderings of the input. However, language is inherently sequential, and word order is essential to the semantics and syntax of an utterance. In this paper, we provide an overview of common methods to incorporate position information into Transformer models. The objectives of this survey are to i) showcase that position information in Transformer is a vibrant and…
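As a concrete illustration of how order information can be injected into an otherwise permutation-invariant model, below is a minimal NumPy sketch of the absolute sinusoidal position encoding of Vaswani et al. (2017), one of the canonical methods covered by such surveys; the sequence length and model dimension are arbitrary illustrative choices, and d_model is assumed to be even.

```python
import numpy as np

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Absolute sinusoidal position encodings (Vaswani et al., 2017).

    Returns an array of shape (seq_len, d_model) that is added to the token
    embeddings so that self-attention can distinguish positions.
    d_model is assumed to be even.
    """
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model/2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# Adding the encoding to (hypothetical) token embeddings breaks permutation
# invariance: shuffling the rows of `token_embeddings` now changes the input.
token_embeddings = np.random.randn(10, 64)          # 10 tokens, d_model = 64
model_input = token_embeddings + sinusoidal_position_encoding(10, 64)
```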

Cited by 11 publications (16 citation statements)
References 20 publications
“…However, a number of works demonstrate that such permutation has little to no impact during the pre-training and fine-tuning stages (Pham et al., 2020; Sinha et al., 2020, 2021; O'Connor and Andreas, 2021; Hessel and Schofield, 2021; Gupta et al., 2021). These findings contradict the common understanding of how hierarchical and structural information is encoded in LMs (Rogers et al., 2020), and may even question whether word order is modeled with the position embeddings (Dufter et al., 2021).…”
Section: Introduction
confidence: 90%
“…Various PEs have been proposed to utilize the information about word order in Transformer-based LMs (Dufter et al., 2021). Surprisingly, little is known about what PEs capture and how well they learn the meaning of positions.…”
Section: Positional Encoding
confidence: 99%
“…To impose spatial biases, we found that conventional positional embeddings do not form meaningful biases, and we use a relative position bias [9, 24] instead. The bias is a matrix B ∈ R^{(2r+1)×(2r+1)}, added to the computed attention, where r is the radius specifying the local range of the bias.…”
Section: Semantic Smoothing Transformer
confidence: 99%
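The following is a minimal NumPy sketch of such a relative position bias for an H×W grid of tokens. It assumes that position pairs whose row or column offset exceeds the radius r simply receive no bias; the cited work may handle out-of-range offsets differently, and all names here are illustrative.

```python
import numpy as np

def add_relative_position_bias(attn: np.ndarray, B: np.ndarray,
                               H: int, W: int, r: int) -> np.ndarray:
    """Add a relative position bias B of shape (2r+1, 2r+1) to an attention
    map over an H*W grid of tokens.

    attn has shape (H*W, H*W); pairs of positions whose row/column offset
    exceeds the radius r receive no bias (one possible convention).
    """
    biased = attn.copy()
    coords = [(i, j) for i in range(H) for j in range(W)]
    for q, (qi, qj) in enumerate(coords):          # query position
        for k, (ki, kj) in enumerate(coords):      # key position
            di, dj = ki - qi, kj - qj              # relative offset
            if abs(di) <= r and abs(dj) <= r:
                biased[q, k] += B[di + r, dj + r]
    return biased

# Toy usage: a 4x4 grid of tokens with radius r = 2.
H, W, r = 4, 4, 2
attn = np.random.randn(H * W, H * W)          # pre-softmax attention logits
B = np.random.randn(2 * r + 1, 2 * r + 1)     # learnable parameter in practice
attn_with_bias = add_relative_position_bias(attn, B, H, W, r)
```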
“…These architectures integrate structural and positional attributes of data when building abstract feature representations. For instance, ConvNets intrinsically consider the regular spatial structure of pixel positions, RNNs build on the sequential structure of word positions, and Transformers employ positional encodings of words (see Dufter et al. (2021) for a review). For GNNs, the position of nodes is more challenging because there is no canonical positioning of nodes in arbitrary graphs.…”
Section: B2 Graph Positional Encoding
confidence: 99%
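One widely used answer to the lack of canonical node positions, not claimed here as the cited work's specific method, is to derive node "positions" from the spectrum of the graph Laplacian. Below is a minimal NumPy sketch of Laplacian eigenvector positional encoding, assuming an undirected graph given as a dense adjacency matrix; the helper name and toy graph are illustrative.

```python
import numpy as np

def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Laplacian eigenvector positional encoding for a graph.

    Uses the k eigenvectors of the symmetric normalized Laplacian with the
    smallest non-zero eigenvalues as node coordinates. The sign of each
    eigenvector is arbitrary, so random sign flipping is often applied
    during training.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)       # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]             # drop the trivial first eigenvector

# Toy usage: a 4-node cycle graph, 2-dimensional positional features per node.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
pos = laplacian_positional_encoding(adj, k=2)   # shape (4, 2)
```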