Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), 2019
DOI: 10.18653/v1/n19-4009

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

Abstract: FAIRSEQ is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found here: https://www.youtube.com/watch?v=OtgDdWtHvto.
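The translation use case mentioned in the abstract is easiest to see through fairseq's pretrained-model interface. Below is a minimal sketch following the torch.hub entry points documented in the fairseq repository; the specific model name and checkpoint availability are assumptions, and the snippet requires fairseq (plus its tokenizer dependencies) to be installed.

```python
# Minimal sketch: translating with a pretrained fairseq model via torch.hub.
# Assumes fairseq, sacremoses, and subword-nmt are installed and that the
# pretrained checkpoint can be downloaded.
import torch

# Load a Transformer trained on WMT'16 En-De; the model name follows the
# fairseq README, but treat it as an assumption rather than a stable API.
en2de = torch.hub.load(
    'pytorch/fairseq',
    'transformer.wmt16.en-de',
    tokenizer='moses',
    bpe='subword_nmt',
)
en2de.eval()

# Beam-search translation; the beam width is a tunable decoding parameter.
print(en2de.translate('Hello world!', beam=5))
```

Training custom models goes through the fairseq-train command-line tool, where flags such as --fp16 enable the mixed-precision mode the abstract refers to.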

Cited by 1,726 publications (1,069 citation statements)
References 26 publications
“…There are many libraries that provide high-level APIs to specific applications, such as Facebook's Torchvision [32], Detectron [33], and Fairseq [34]. However, each library has a different API, input representation, and requires different assumptions about training details, all of which a user must learn from scratch each time.…”
Footnote 1: https://github.com/pytorch/examples/blob/master/mnist/main.py
Section: Consistency Across Domains
confidence: 99%
“…In the NLP community, there are several well-designed frameworks for research and commercial purposes, including toolkits for providing conventional layered linguistic annotations (Manning et al., 2014) and platforms for developing novel neural models and systems for neural machine translation (Ott et al., 2019). However, it is hard to find an existing tool that supports all features in the new paradigm and can be easily customized for new tasks.…”
Section: Introduction
confidence: 99%
“…To reduce the vocabulary size of varied web document content, we apply byte-pair encoding (Sennrich et al., 2016) to generate 40K codes for each dataset. We implement our models in fairseq-py (Ott et al., 2019) using the Transformer Big architecture and training schedule described in (Vaswani et al., 2017). Detailed parameters are listed in the Appendix.…”
Section: Training and Generation
confidence: 99%
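The preprocessing step this excerpt describes (learning 40K byte-pair codes per dataset) can be reproduced with the subword-nmt package from Sennrich et al. (2016). The sketch below is illustrative rather than the cited authors' exact pipeline; the file names are hypothetical placeholders.

```python
# Illustrative sketch of the BPE step described above, using subword-nmt
# (Sennrich et al., 2016). File names are hypothetical placeholders.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn 40K merge operations from the training corpus.
with open('train.txt', encoding='utf-8') as corpus, \
        open('codes.bpe', 'w', encoding='utf-8') as codes_out:
    learn_bpe(corpus, codes_out, num_symbols=40000)

# Apply the learned codes; fairseq-preprocess would then binarize the output
# before training with fairseq-train.
with open('codes.bpe', encoding='utf-8') as codes_in:
    bpe = BPE(codes_in)
print(bpe.process_line('sequence modeling toolkit'))
```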