2020
DOI: 10.48550/arxiv.2007.14062
Preprint

Big Bird: Transformers for Longer Sequences

Abstract: Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model.
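The abstract describes the sparse pattern only at a high level, so a minimal sketch may help. The snippet below is not the authors' implementation; it simply builds the kind of attention mask the abstract describes (a few global tokens, a sliding window, and a few random keys per query), with all sizes (`window`, `n_global`, `n_random`) chosen purely for illustration.

```python
# Minimal sketch of a BigBird-style sparse attention pattern (illustrative only).
# The dense boolean mask below is for clarity; real implementations keep it in
# a block-sparse layout rather than materializing an n x n array.
import numpy as np

def bigbird_mask(seq_len, window=3, n_global=2, n_random=3, seed=0):
    """Return mask[i, j] == True iff query i may attend to key j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Global tokens: the first n_global positions attend everywhere and are
    # attended to by every query (e.g. a [CLS]-like token).
    mask[:n_global, :] = True
    mask[:, :n_global] = True

    for i in range(seq_len):
        # Sliding window: local neighbourhood around position i.
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
        # Random connections: a few keys sampled uniformly for this query.
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True
    return mask

# Apart from the O(1) global rows, each query attends to a constant number of
# keys, so the total number of attended pairs grows linearly with seq_len
# instead of quadratically.
for n in (128, 256, 512):
    m = bigbird_mask(n)
    print(n, m.sum() / n)   # average keys attended per query stays ~constant
```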

Cited by 97 publications (161 citation statements)
References 71 publications (137 reference statements)

Citation statements (ordered by relevance):
“…Yet, as dimension increases, sequence lengths will reach the practical limits of quadratic attention mechanisms. Experimenting with transformers with linear or log-linear attention (Zaheer et al., 2021; Wang et al., 2020a; Vyas et al., 2020) is a natural extension of our work. In terms of asymptotic complexity, matrix inversion (and the other non-linear tasks) is usually handled by O(n^3) algorithms (although O(n^2.37) methods are known).…”
Section: Out-of-domain Generalization and Retraining (mentioning)
confidence: 85%
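For readers skimming the complexity claims in this excerpt, the comparison it makes can be restated compactly (n is the sequence length for attention and the matrix dimension for inversion; this only summarizes the statement above, it adds nothing new):

```latex
\begin{align*}
\text{full self-attention (per layer):} &\quad O(n^2) \\
\text{linear / log-linear attention:}   &\quad O(n) \;\text{or}\; O(n \log n) \\
\text{matrix inversion:}                &\quad O(n^3) \text{ in practice, } O(n^{2.37}) \text{ asymptotically}
\end{align*}
```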
“…Moreover, HiRID introduces a novel high-resolution aspect in ICU data that needs to be correctly taken into account. Thus, as for other sequence data, one possible explanation could be that, when trained with extremely long sequences, models cannot use the extracted features in the most effective way [46]. In the case of Transformers, various kinds of improvements could be made to force the model to learn and extract useful patterns [40].…”
Section: Discussion (mentioning)
confidence: 99%
“…For that reason, Google's BigBird model is selected in this study, as it is one of the most successful long-sequence transformers and supports a sequence length of 4000 tokens. To deal with the limitations that other models face, BigBird uses a sparse attention mechanism that reduces the quadratic dependency to linear [55]. This means it can handle sequences up to 8x longer than what was previously possible using similar hardware.…”
Section: -Way Text Entailment (mentioning)
confidence: 99%
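The "8x longer on similar hardware" claim quoted above follows from keeping the per-query key budget constant. The arithmetic below is a back-of-the-envelope sketch, not taken from either paper; the 512-token baseline and the 64 keys per query are assumed, illustrative values.

```python
# Back-of-the-envelope check of the "8x longer on similar hardware" idea.
# The numbers below (512-token baseline, 64 keys per query) are illustrative
# assumptions, not values reported in the cited papers.
full_len = 512                                # typical full-attention context length
keys_per_query = 64                           # fixed per-query budget under sparse attention

full_scores = full_len ** 2                   # O(n^2) attention score entries
sparse_len = 8 * full_len                     # 4096 tokens
sparse_scores = sparse_len * keys_per_query   # O(n) attention score entries

print(f"full attention,   n={full_len}:  {full_scores:,} score entries")
print(f"sparse attention, n={sparse_len}: {sparse_scores:,} score entries")
# 262,144 entries in both cases: with a fixed per-query budget, an 8x longer
# input needs roughly the same attention memory as the quadratic baseline.
```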