Energy Transformer
2023 | Preprint
DOI: 10.48550/arxiv.2302.07253

Abstract: Transformers have become the de facto models of choice in machine learning, typically leading to impressive performance on many applications. At the same time, the architectural development in the transformer world is mostly driven by empirical findings, and the theoretical understanding of their architectural building blocks is rather limited. In contrast, Dense Associative Memory models or Modern Hopfield Networks have a well-established theoretical foundation, but have not yet demonstrated truly impressive …

Cited by 1 publication (1 citation statement)
References 36 publications
“…The connection to statistical physics models was furthered with the development of fully energy-based transformers (23). The new energy function for the softmax form is …”
Section: Connecting Hopfield Network To Transformers (mentioning, confidence: 99%)
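
The energy function itself is cut off in the citation snippet. As a point of reference only, the attention energy proposed in the Energy Transformer preprint takes a log-sum-exp (softmax) form; the sketch below follows our reading of that preprint, with Q and K the query and key tensors, beta an inverse temperature, H heads, and N tokens, and should be checked against the source:

    % Attention energy of the Energy Transformer (sketch, per our reading of
    % arXiv:2302.07253; notation should be verified against the preprint)
    E^{\mathrm{ATT}} = -\frac{1}{\beta}
        \sum_{h=1}^{H} \sum_{C=1}^{N}
        \log\!\Bigg( \sum_{B \neq C}
            \exp\!\Big( \beta \sum_{\alpha} K_{\alpha h B}\, Q_{\alpha h C} \Big)
        \Bigg)

Roughly speaking, gradient descent on this energy with respect to the token representations produces softmax-attention-like updates, which is the sense in which the citing work describes the architecture as fully energy-based.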