Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), 2019
DOI: 10.18653/v1/n19-4009

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

Abstract: FAIRSEQ is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found here: https://www.youtube.com/watch?v=OtgDdWtHvto.
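The translation use case mentioned in the abstract is easiest to see through fairseq's pretrained-model interface. Below is a minimal sketch following the torch.hub entry points documented in the fairseq repository; the specific model name and checkpoint availability are assumptions, and the snippet requires fairseq (plus its tokenizer dependencies) to be installed.

```python
# Minimal sketch: translating with a pretrained fairseq model via torch.hub.
# Assumes fairseq, sacremoses, and subword-nmt are installed and that the
# pretrained checkpoint can be downloaded.
import torch

# Load a Transformer trained on WMT'16 En-De; the model name follows the
# fairseq README, but treat it as an assumption rather than a stable API.
en2de = torch.hub.load(
    'pytorch/fairseq',
    'transformer.wmt16.en-de',
    tokenizer='moses',
    bpe='subword_nmt',
)
en2de.eval()

# Beam-search translation; the beam width is a tunable decoding parameter.
print(en2de.translate('Hello world!', beam=5))
```

Training custom models goes through the fairseq-train command-line tool, where flags such as --fp16 enable the mixed-precision mode the abstract refers to.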

Cited by 1,726 publications (1,069 citation statements)
References 26 publications
“…There are many libraries that provide high-level APIs to specific applications, such as Facebook's Torchvision [32], Detectron [33], and Fairseq [34]. However, each library has a different API, input representation, and requires different assumptions about training details, all of which a user must learn from scratch each time.…”
Footnote 1: https://github.com/pytorch/examples/blob/master/mnist/main.py
Section: Consistency Across Domains
confidence: 99%
“…In the NLP community, there are several well-designed frameworks for research and commercial purposes, including toolkits for providing conventional layered linguistic annotations (Manning et al., 2014) and platforms for developing novel neural models and systems for neural machine translation (Ott et al., 2019). However, it is hard to find an existing tool that supports all features in the new paradigm and can be easily customized for new tasks.…”
Section: Introduction
confidence: 99%
“…To reduce the vocabulary size of varied web document content, we apply byte-pair encoding (Sennrich et al., 2016) to generate 40K codes for each dataset. We implement our models in fairseq-py (Ott et al., 2019) using the Transformer Big architecture and training schedule described in (Vaswani et al., 2017). Detailed parameters are listed in the Appendix.…”
Section: Training and Generation
confidence: 99%
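The preprocessing step this excerpt describes (learning 40K byte-pair codes per dataset) can be reproduced with the subword-nmt package from Sennrich et al. (2016). The sketch below is illustrative rather than the cited authors' exact pipeline; the file names are hypothetical placeholders.

```python
# Illustrative sketch of the BPE step described above, using subword-nmt
# (Sennrich et al., 2016). File names are hypothetical placeholders.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn 40K merge operations from the training corpus.
with open('train.txt', encoding='utf-8') as corpus, \
        open('codes.bpe', 'w', encoding='utf-8') as codes_out:
    learn_bpe(corpus, codes_out, num_symbols=40000)

# Apply the learned codes; fairseq-preprocess would then binarize the output
# before training with fairseq-train.
with open('codes.bpe', encoding='utf-8') as codes_in:
    bpe = BPE(codes_in)
print(bpe.process_line('sequence modeling toolkit'))
```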