2021
DOI: 10.48550/arxiv.2110.10090
Preprint
Inductive Biases and Variable Creation in Self-Attention Mechanisms

Abstract: Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond. This work provides a theoretical analysis of the inductive biases of self-attention modules, where our focus is to rigorously establish which functions and long-range dependencies self-attention blocks prefer to represent. Our main result shows that bounded-norm Transformer layers create sparse variables: they can represent spar…
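For reference, here is a minimal sketch of the kind of module the abstract refers to: a single scaled dot-product self-attention head, in which every output position is a softmax-weighted mixture of value vectors over the whole context. This is an illustrative standard construction, not code from the paper; the dimensions and random weights below are assumptions made for the example.

```python
# Minimal single-head self-attention sketch (standard scaled dot-product
# attention). Shapes and weights are illustrative, not from the paper.
import numpy as np

def self_attention_head(X, W_q, W_k, W_v):
    """X: (T, d) sequence of T token embeddings of dimension d."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # pairwise similarities, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over the context
    return weights @ V                            # each output mixes the full context

rng = np.random.default_rng(0)
T, d, d_head = 8, 16, 4                           # context length, embedding dim, head dim
X = rng.standard_normal((T, d))
W_q, W_k, W_v = (rng.standard_normal((d, d_head)) for _ in range(3))
print(self_attention_head(X, W_q, W_k, W_v).shape)  # (8, 4)
```

Because the attention weights couple every position to every other position, the head can pick out a small subset of relevant tokens regardless of how far apart they are, which is the long-range, sparse-selection behavior the paper's analysis formalizes.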

Cited by 1 publication (3 citation statements)
References 26 publications

“…Our work focuses on the analysis of Maximum Likelihood Estimate (MLE) with the transformer function class, which is not covered by previous works. Our bounds are sharper than those of Edelman et al. (2021) in the channel-number dependency.…”
mentioning
confidence: 57%
“…Following this line, Liao et al. (2020), Ledent et al. (2021) and Lin and Zhang (2019) built generalization bounds for graph neural networks and convolutional neural networks. These results respected the underlying graph structure and the translation invariance in the networks. Edelman et al. (2021) established a generalization bound for the transformer, but this result did not reflect the permutation invariance, still depending on the channel number.…”
mentioning
confidence: 84%