2021
DOI: 10.48550/arxiv.2106.02584
Preprint

Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Abstract: We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms…
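
To make the core idea concrete, here is a minimal sketch (in PyTorch, not the authors' implementation) of self-attention applied across the datapoint axis: a batch of N rows is treated as one sequence, so each row's representation can condition on every other row. The class name, layer sizes, and usage below are illustrative assumptions.

# Minimal sketch of self-attention *between datapoints*: the N rows of a
# tabular batch form one sequence, so each datapoint attends to all others.
# Illustrative only; not the paper's reference implementation.
import torch
import torch.nn as nn

class AttentionBetweenDatapoints(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (1, N, d_model) -- a single "sequence" whose tokens are the N datapoints
        h, _ = self.attn(x, x, x)   # every datapoint attends to every other datapoint
        return self.norm(x + h)     # residual connection + layer norm, Transformer-style

# Usage: embed a dataset of N rows with d_feat features, then mix information across rows.
N, d_feat, d_model = 128, 10, 64
rows = torch.randn(N, d_feat)
embedded = nn.Linear(d_feat, d_model)(rows).unsqueeze(0)   # (1, N, d_model)
mixed = AttentionBetweenDatapoints(d_model)(embedded)      # (1, N, d_model)

In the full architecture this between-datapoint step is interleaved with attention between the attributes of each datapoint; the sketch shows only the former.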

Cited by 4 publications (10 citation statements)
References 47 publications (65 reference statements)

“…Deep learning for tabular data. As described by Borisov et al. [2021] in their review of the field, there have been various attempts to make deep learning work on tabular data: data encoding techniques to make tabular data better suited for deep learning [Hancock and Khoshgoftaar, 2020, Yoon et al., 2020], "hybrid methods" to benefit from the flexibility of NNs while keeping the inductive biases of other algorithms like tree-based models [Lay et al., 2018, Popov et al., 2020, Abutbul et al., 2020, Hehn et al., 2019, Tanno et al., 2019, Chen, 2020, Kontschieder et al., 2015, Rodriguez et al., 2019] or Factorization Machines [Guo et al., 2017], tabular-specific transformer architectures [Somepalli et al., 2021, Kossen et al., 2021, Arik and Pfister, 2019, Huang et al., 2020], and various regularization techniques to adapt classical architectures to tabular data [Lounici et al., 2021, Shavitt and Segal, 2018, Kadra et al., 2021a, Fiedler, 2021]. In this paper, we focus on architectures directly inspired by classic deep learning models, in particular Transformers and Multi-Layer Perceptrons (MLPs).…”
Section: Related Work
Mentioning, confidence: 99%
“…Deep learning has enabled tremendous progress for learning on image, language, or even audio datasets. On tabular data, however, the picture is muddier, and ensemble models based on decision trees like XGBoost remain the go-to tool for most practitioners [Sta] and data science competitions [Kossen et al., 2021]. Indeed, deep learning architectures have been crafted to create inductive biases matching invariances and spatial dependencies of the data.…”
Section: Introduction
Mentioning, confidence: 99%
“…We also augment these baselines with zero-shot predictions obtained with the same model used to extract the protein sequence embeddings. Lastly, we include ProteinNPT [Notin et al., 2023], a semi-supervised pseudo-generative architecture which jointly models sequences and labels by performing axial attention [Ho et al., 2019b, Kossen et al., 2022…”
Section: Supervised Benchmarks
Mentioning, confidence: 99%
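
As a rough, hedged illustration of the axial attention mentioned in the statement above (not ProteinNPT's or the cited papers' actual code): attention is applied alternately along the two axes of a (sequences × positions) grid, first within each sequence and then across sequences at each position. All names and sizes are assumptions.

# Hedged sketch of axial attention over a batch of N sequences of length L:
# attend along positions within each sequence, then across sequences at
# each position. Illustrative assumptions only.
import torch
import torch.nn as nn

class AxialAttentionSketch(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.pos_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.seq_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (N, L, d_model) -- N sequences in the batch, each of length L
        h, _ = self.pos_attn(x, x, x)       # attention along the position axis
        x = x + h
        xt = x.transpose(0, 1)              # (L, N, d_model)
        h, _ = self.seq_attn(xt, xt, xt)    # attention across sequences, per position
        return (xt + h).transpose(0, 1)     # back to (N, L, d_model)

# Usage on a toy batch of embedded sequences (labels could be appended as extra tokens).
out = AxialAttentionSketch()(torch.randn(8, 50, 64))   # (8, 50, 64)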
“…ProteinNPT [Notin et al., 2023] is a semi-supervised non-parametric transformer [Kossen et al., 2022] which learns a joint representation of full batches of labeled sequences. It is trained with a hybrid objective consisting of fitness prediction and masked amino acid reconstruction.…”
Section: Appendix A
Mentioning, confidence: 99%
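
The hybrid objective described above can be illustrated with a short, hedged sketch: a weighted sum of a fitness-regression term and a masked amino-acid reconstruction term. The function name, shapes, and the weight alpha are assumptions, not ProteinNPT's actual implementation.

# Hedged sketch of a hybrid objective: fitness prediction plus masked
# amino-acid reconstruction. Shapes, names, and `alpha` are assumptions.
import torch
import torch.nn.functional as F

def hybrid_loss(fitness_pred, fitness_true, token_logits, token_targets, mask, alpha=0.5):
    # fitness_pred, fitness_true: (B,)  predicted vs. measured fitness per sequence
    # token_logits: (B, L, V)           reconstruction logits over the amino-acid vocabulary
    # token_targets: (B, L) long        original residues; mask: (B, L) bool, True where masked
    fitness_loss = F.mse_loss(fitness_pred, fitness_true)
    recon_loss = F.cross_entropy(token_logits[mask], token_targets[mask])
    return alpha * fitness_loss + (1.0 - alpha) * recon_loss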
“…While this works reasonably well in practice, critical information may be lost in the pooling operation and, since not all residues may be relevant to a given task, we may want to be selective about which ones to consider. In this work, we introduce ProteinNPT (§3), a non-parametric transformer [Kossen et al., 2022] variant which is ideally suited to label-scarce settings through an additional regularizing denoising objective, straightforwardly extends to multi-task optimization settings, and addresses all aforementioned issues. In order to quantify the ability of different models to extrapolate to unseen sequence positions, we devise several cross-validation schemes (§4.1) which we apply to all Deep Mutational Scanning (DMS) assays in the ProteinGym benchmarks [Notin et al., 2022a].…”
Section: Introduction
Mentioning, confidence: 99%