We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can also be used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and to investigate optimal attention-kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence, and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel-prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing the effectiveness of the novel attention-learning paradigm leveraged by Performers.
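For readers who want the mechanism behind the claim: FAVOR+ replaces the softmax kernel exp(q·k) with an inner product of strictly positive random features, which keeps the estimator unbiased and well behaved where the true softmax values are small. Below is a minimal NumPy sketch under our own naming (orthogonal_gaussian and positive_features are illustrative names, not the authors' API):

```python
import numpy as np

def orthogonal_gaussian(m, d, rng):
    """Draw m x d projections whose rows are exactly orthogonal within
    each d-row block, rescaled so row lengths match i.i.d. Gaussians."""
    blocks = []
    for _ in range(int(np.ceil(m / d))):
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        blocks.append(q)
    w = np.concatenate(blocks, axis=0)[:m]
    row_norms = np.sqrt(rng.chisquare(d, size=(m, 1)))
    return w * row_norms

def positive_features(x, w):
    """phi(x) = exp(x @ w.T - |x|^2 / 2) / sqrt(m): strictly positive
    features with E[phi(q) . phi(k)] = exp(q . k), the softmax kernel."""
    m = w.shape[0]
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - sq_norm) / np.sqrt(m)
```

Rescaling queries and keys by d^(-1/4) before applying the feature map folds the usual 1/sqrt(d) softmax temperature into the kernel, so phi_q @ phi_k.T estimates the familiar exp(Q K^T / sqrt(d)) entrywise.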
Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolutionary strategies. Our method significantly improves adaptation to changes in dynamics in high noise settings, which are common in robotics applications. We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics. We observe that our method significantly outperforms prior gradient-based approaches, enabling the robot to adapt its policy to changes based on less than 3 minutes of real data.
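The abstract names the ingredients (an evolutionary-strategies meta-learner plus a noise-tolerant Batch Hill-Climbing adaptation operator) without pseudocode; the sketch below is our guess at the shape of such an operator, not the paper's exact algorithm. episode_return, the batch size, and the re-scored incumbent are all illustrative assumptions:

```python
import numpy as np

def batch_hill_climb(theta, episode_return, rounds=20, batch=16, sigma=0.05, seed=0):
    """Each round, score a batch of Gaussian perturbations of the current
    policy parameters together with the incumbent (re-evaluated, so one
    lucky noisy score cannot lock in a bad policy) and keep the best."""
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        candidates = [theta] + [theta + sigma * rng.standard_normal(theta.shape)
                                for _ in range(batch)]
        scores = [episode_return(c) for c in candidates]  # noisy rollout returns
        theta = candidates[int(np.argmax(scores))]
    return theta
```

Because only the argmax of each batch matters, a single unlucky noisy return perturbs one candidate rather than an averaged gradient estimate, which is one plausible reading of the claimed noise tolerance over second-order gradient estimation.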
Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequences, such as biological sequence analysis, may fall short of meeting these assumptions, precluding exploration of these models. To address this challenge, we present a new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR). Our mechanism scales linearly rather than quadratically in the number of tokens in the sequence, is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors. Furthermore, it provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence. It is also backwards-compatible with pre-trained regular Transformers. We demonstrate its effectiveness on the challenging task of protein sequence modeling and provide detailed theoretical analysis.
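The linear scaling follows from the kernel view: once attention weights are (approximately) inner products of feature maps, associativity of matrix multiplication lets keys and values be aggregated before queries are applied, so the n x n attention matrix is never materialized. A minimal NumPy sketch for the bidirectional (non-causal) case, assuming feature-mapped queries and keys phi_q, phi_k such as those produced by the map sketched above:

```python
import numpy as np

def linear_attention(phi_q, phi_k, v):
    """phi_q: (n, m), phi_k: (n, m), v: (n, d_v). Computes
    D^{-1} (phi_q phi_k^T) v without forming the (n, n) matrix."""
    kv = phi_k.T @ v                          # (m, d_v): aggregate keys/values once
    normalizer = phi_q @ phi_k.sum(axis=0)    # (n,): implicit attention row sums
    return (phi_q @ kv) / normalizer[:, None]
```

Causal attention needs a running prefix sum of the per-position outer products phi(k_j) v_j^T in place of the single kv summary, which preserves the linear cost.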
The reaction of [Cp′₂YMe]₂ (Cp′ = C₅H₅, C₅H₄SiMe₃) with B(C₆F₅)₃ affords the complexes Cp′₂Y{MeB(C₆F₅)₃}. The anion is coordinated in a chelating fashion via one ortho-fluorine atom and agostic interactions to two of the methyl hydrogens; the complexes are highly fluxional in solution. They act as initiators for the carbocationic polymerization of isobutene.
Antimony(III) and bismuth(III) complexes of sterically demanding arenechalcogenolato ligands, M(EC₆H₂R′₃-2,4,6)₃ (E = S or Se; M = Sb or Bi; R′ = Me, Prⁱ or Buᵗ), have been prepared either by protolysis of the amides M[N(SiMe₃)₂]₃ with arenechalcogenols, or from MCl₃ by halide exchange (M = Bi or Sb). The complexes are monomeric in the solid state and sublime readily. The crystal structure of Sb(SC₆H₂Prⁱ₃-2,4,6)₃ has been determined by X-ray diffraction. The compound possesses a trigonal-pyramidal geometry, with Sb–S distances of 2.418(2)–2.438(2) Å and S–Sb–S angles of 94.69(7)–98.29(8)°. Preliminary X-ray results on Bi(SeC₆H₂Prⁱ₃-2,4,6)₃ showed that the compounds of Sb and Bi are isostructural. Thermolytic decomposition of some of the compounds has been carried out in the solid state. Compounds with R′ = Me or Prⁱ undergo reductive elimination to give elemental bismuth or antimony, whereas the bulky selenolates M(SeC₆H₂Buᵗ₃-2,4,6)₃ afford M₂Se₃.