2021
DOI: 10.48550/arxiv.2103.03457
Preprint

IOT: Instance-wise Layer Reordering for Transformer Structures

Jinhua Zhu, Lijun Wu, Yingce Xia et al.

Abstract: With sequentially stacked self-attention, (optional) encoder-decoder attention, and feed-forward layers, the Transformer has achieved great success in natural language processing (NLP), and many variants have been proposed. Currently, almost all of these models assume that the layer order is fixed and kept the same across data samples. We observe that different data samples actually favor different orders of the layers. Based on this observation, in this work, we break the assumption of a fixed layer order in the Transformer…
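The abstract only sketches the idea, so the following is a minimal, hypothetical PyTorch sketch of what instance-wise sub-layer reordering could look like for a single encoder layer: a lightweight predictor scores the candidate sub-layer orders for each input instance, and each instance is then routed through the self-attention and feed-forward sub-layers in its chosen order. All names here (InstanceWiseReorderedEncoderLayer, order_predictor, the hard argmax choice) are illustrative assumptions, not the paper's actual implementation.

```python
import itertools
import torch
import torch.nn as nn

class InstanceWiseReorderedEncoderLayer(nn.Module):
    """Toy encoder layer whose sub-layer order (self-attention vs. feed-forward)
    is chosen per input instance by a lightweight predictor.
    Names and design details are illustrative, not taken from the paper."""

    def __init__(self, d_model=64, nhead=4, dim_ff=128):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Candidate sub-layer orders: (attention, FFN) or (FFN, attention).
        self.orders = list(itertools.permutations(range(2)))
        # Predictor scores each candidate order from the mean-pooled input.
        self.order_predictor = nn.Linear(d_model, len(self.orders))

    def _sublayer(self, idx, x):
        if idx == 0:  # self-attention with residual connection and layer norm
            attn_out, _ = self.self_attn(x, x, x)
            return self.norm1(x + attn_out)
        else:         # feed-forward with residual connection and layer norm
            return self.norm2(x + self.ffn(x))

    def forward(self, x):
        # x: (batch, seq_len, d_model). Pick one sub-layer order per instance.
        # Hard argmax here for simplicity; a real training setup would need a
        # differentiable or learned routing choice, which is not replicated.
        scores = self.order_predictor(x.mean(dim=1))  # (batch, num_orders)
        choice = scores.argmax(dim=-1)                # (batch,)
        outputs = []
        for b in range(x.size(0)):
            h = x[b:b + 1]
            for idx in self.orders[choice[b].item()]:
                h = self._sublayer(idx, h)
            outputs.append(h)
        return torch.cat(outputs, dim=0)

# Usage: each sample in the batch may traverse the sub-layers in a different order.
x = torch.randn(3, 10, 64)
layer = InstanceWiseReorderedEncoderLayer()
print(layer(x).shape)  # torch.Size([3, 10, 64])
```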

Cited by 0 publications
References 22 publications