2023
DOI: 10.48550/arxiv.2302.06218
Preprint

A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies

Abstract: Nomenclature
Σ: Covariance matrix
G: Gram/kernel matrix
k(•): Kernel function
P(•): Probability density
P(•): Token mixing process
Re(•): Function that extracts the real component of a complex number
a i: Element at ith position of column vector a
A * :j: Column vector in jth row of A
A i,j: Element in ith row, jth column of … matrix of the embedding dimension
F s (L×L): Vandermonde matrix of the sequence dimension
W: Weight matrix learned with element-wise non-linearity (e.g., ReLU, GELU)
W C (L×L): Weight matrix of a single convolution ker…

Cited by 0 publications
References 35 publications