2021
DOI: 10.48550/arxiv.2110.03303
Preprint

Universal Approximation Under Constraints is Possible with Transformers

Abstract: Many practical problems need the output of a machine learning model to satisfy a set of constraints, K. There are, however, no known guarantees that classical neural networks can exactly encode constraints while simultaneously achieving universality. We provide a quantitative constrained universal approximation theorem which guarantees that for any convex or non-convex compact set K and any continuous function f : R^n → K, there is a probabilistic transformer F whose randomized outputs all lie in K and whose e…
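
Read informally, the quantitative guarantee described in the abstract has roughly the shape sketched below. This is a paraphrase based on the abstract alone; the symbol dist stands in for the paper's actual output metric, and the ambient output dimension m is notation introduced here for illustration, not taken from the paper.

```latex
% Informal restatement, paraphrased from the abstract only; the precise
% approximation metric and the probabilistic-transformer parameterization
% are those defined in the paper itself.
\[
\forall\, K \subset \mathbb{R}^m \text{ compact (convex or not)},\;
\forall\, f \in C(\mathbb{R}^n, K),\;
\forall\, X \subset \mathbb{R}^n \text{ compact},\;
\forall\, \varepsilon > 0:\;
\exists\, \hat{F} \text{ (a probabilistic transformer)}
\]
\[
\text{such that } \hat{F}(x) \in K \text{ for every } x \in X
\text{ and every realization of } \hat{F},
\qquad
\sup_{x \in X} \operatorname{dist}\bigl(\hat{F}(x), f(x)\bigr) \le \varepsilon .
\]
```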

Cited by 3 publications (4 citation statements) | References 30 publications
“…Recently, an increasing number of researchers have begun to explore the representation power of the transformer in 3D (Liu et al, 2019a; Fuchs et al, 2020; Misra et al, 2021; Mao et al, 2021; Sander et al, 2022). Another important line of work seeks to theoretically demonstrate the representation power of the transformer by showing universal approximation of continuous sequence-to-sequence functions (Yun et al, 2019; Zaheer et al, 2020; Shi et al, 2021; Kratsios et al, 2021). To be specific, Yun et al (2019) demonstrated the universal approximation property of the transformer; Yun et al (2020) and Zaheer et al (2020) demonstrated that the transformer with a sparse attention matrix remains a universal approximator; Shi et al (2021) claimed that the transformer without diag-attention is still a universal approximator.…”
Section: Related Work
confidence: 99%
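
To make the objects these results reason about concrete, here is a minimal single-head self-attention sketch in NumPy, with an optional binary mask that disallows some token pairs. The banded mask in the usage example is a hypothetical illustration of a sparse attention pattern, not the specific constructions of Yun et al (2020) or Zaheer et al (2020).

```python
# Minimal scaled dot-product self-attention (single head), illustrating the
# "attention matrix" that the cited universality results reason about.
# The sparse `mask` below is a hypothetical example, not the constructions
# studied in the cited works.
import numpy as np

def self_attention(X, Wq, Wk, Wv, mask=None):
    """X: (tokens, d_model); Wq/Wk/Wv: (d_model, d_head); mask: (tokens, tokens) of {0,1}."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (tokens, tokens) attention logits
    if mask is not None:                               # sparse attention: disallowed pairs get -inf
        scores = np.where(mask.astype(bool), scores, -np.inf)
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)              # row-stochastic attention matrix
    return A @ V                                       # attended values

# Toy usage: 4 tokens, banded (sparse) attention pattern
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
band = np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)    # each token attends to its neighbours
out = self_attention(X, Wq, Wk, Wv, mask=band)
```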
“…To be specific, Yun et al (2019) demonstrated the universal approximation property of the transformer; Yun et al (2020) and Zaheer et al (2020) demonstrated that the transformer with a sparse attention matrix remains a universal approximator; Shi et al (2021) claimed that the transformer without diag-attention is still a universal approximator. Kratsios et al (2021) showed that universal approximation under constraints is possible for the transformer.…”
Section: Related Work
confidence: 99%
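
As a rough intuition for why attention-style output layers lend themselves to exactly encoding a constraint set, the sketch below forces every output to be a convex combination of anchor points that lie in K, so the output stays in K whenever K is convex. This is a conceptual illustration only, under the assumption that K is convex; it is not the construction of Kratsios et al (2021), whose probabilistic transformer handles non-convex K by randomizing its outputs over points of K.

```python
# Conceptual sketch (not the authors' construction): emit softmax weights over
# a fixed dictionary of "anchor" points known to lie in K and return their
# convex combination.  For convex K the output is guaranteed to remain in K.
import numpy as np

def constrained_output(features, W, anchors):
    """features: (d,); W: (num_anchors, d); anchors: (num_anchors, m), each row a point of K."""
    logits = W @ features
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()        # weights on the probability simplex
    return weights @ anchors                 # convex combination of points of K

# Toy usage: K = closed unit disk in R^2, anchors sampled inside K
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=16)
r = np.sqrt(rng.uniform(0, 1, size=16))
anchors = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)  # points in the disk
W = rng.normal(size=(16, 8))
y = constrained_output(rng.normal(size=8), W, anchors)              # guaranteed to lie in the disk
```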
“…To date, there are few, if any, Jackson- or Bernstein-type results for sequence modelling using the Transformer. We mention a related series of works on static function approximation with a variant of the Transformer architecture [1, 48, 49]. Here, the targets are continuous functions H : [0, 1]^τ → K, where K ⊂ R^n is a compact set.…”
Section: Attention-based Architectures
confidence: 99%
“…Self-attention-based models such as transformers have also been studied theoretically from several perspectives. Many works have focused on their approximation capability [16, 18, 27, 56]. Studies have also been done on the Turing completeness [34, 53], in-context learning [9, 55], and inductive bias [11] of these models.…”
Section: Self-attention
confidence: 99%