2021
DOI: 10.48550/arxiv.2112.04426
Preprint

Improving language models by retrieving from trillions of tokens

Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training.
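The abstract compresses the retrieval pipeline into one sentence. Below is a minimal sketch of the chunk-level lookup it describes, assuming a hypothetical embed_chunk stand-in for the frozen BERT retriever and a brute-force dot-product search in place of the approximate index the paper uses over its 2-trillion-token database; nothing here is the authors' code.

```python
import numpy as np

CHUNK_LEN = 64  # RETRO retrieves at chunk granularity (64 tokens in the paper)

def embed_chunk(token_ids, dim=768):
    """Hypothetical stand-in for the frozen BERT retriever. Because the
    retriever is frozen, database keys are computed once and never
    re-indexed during language-model training."""
    rng = np.random.default_rng(abs(hash(tuple(token_ids))) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine sim

def retrieve_neighbors(sequence, db_keys, db_chunks, k=2):
    """For each chunk of the input sequence, fetch its k nearest database
    chunks by similarity of the frozen embeddings (brute force here; at
    2T-token scale the paper relies on approximate nearest-neighbor search)."""
    neighbors = []
    for start in range(0, len(sequence), CHUNK_LEN):
        query = embed_chunk(sequence[start:start + CHUNK_LEN])
        scores = db_keys @ query           # cosine similarity against all keys
        top = np.argsort(-scores)[:k]      # indices of the k closest chunks
        neighbors.append([db_chunks[i] for i in top])
    return neighbors
```

The retrieved neighbor chunks would then be consumed by the encoder and chunked cross-attention layers mentioned in the abstract; that part of the architecture is not sketched here.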

Cited by 50 publications (87 citation statements) | References 17 publications
“…Recent work in this direction has expanded and elaborated on neural models' ability to retrieve and rank passages [40]. The RETRO architecture demonstrates that language models can be primed with results retrieved from a database as large as two trillion tokens [41]. At a broad level, our approach is also comparable to that of Byrne et al. [42], which fine-tunes the model to use external APIs for movie-ticketing dialog.…”
Section: Related Work
confidence: 90%
“…Granularity of Retrieval: While Khandelwal et al. (2020) and Yogatama et al. (2021) retrieve a single token per time step, other work retrieved a sentence (Hashimoto et al., 2018; Zhang et al., 2018; Gu et al., 2018), a prototype (Guu et al., 2018; He et al., 2020), or a chunk (Guu et al., 2020; Borgeaud et al., 2021). RETOMATON implicitly generalizes these approaches by dynamically constructing the retrieved sequence, essentially being able to retrieve individual tokens as well as to construct search-free longer passages.…”
Section: Retrieval and Neuro-symbolic Methods
confidence: 99%
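Since this statement contrasts token-level and chunk-level retrieval, a minimal sketch of the token-granularity end of the spectrum (kNN-LM-style interpolation, in the spirit of Khandelwal et al., 2020) may help; the function and parameter names are illustrative, not taken from any cited implementation.

```python
import numpy as np

def knn_lm_next_token_probs(p_lm, context_vec, keys, values, vocab_size,
                            k=8, lam=0.25, temperature=1.0):
    """Token-granularity retrieval: each datastore key is a context
    embedding and each value is the single token that followed it.
    Chunk-granularity systems such as RETRO instead store and retrieve
    multi-token passages, and RETOMATON interpolates between the two by
    following pointers through the datastore."""
    dists = np.linalg.norm(keys - context_vec, axis=1)  # L2 distance to all keys
    top = np.argsort(dists)[:k]                         # k nearest contexts
    weights = np.exp(-dists[top] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, top):
        p_knn[values[idx]] += w                         # mass on retrieved tokens
    return lam * p_knn + (1 - lam) * p_lm               # interpolate with the LM
```

A pure kNN-LM performs this datastore search at every generation step, which is exactly the cost that the saved-searches metric discussed in the next statement tries to quantify.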
“…In our experiments in Section 5, we reported perplexity against FoSS (the fraction of saved searches). The alternative of measuring wall-clock time is difficult to reproduce: it is brittle to transient hardware and system load, and it depends on the specific kNN retrieval library, such as FAISS as used in Khandelwal et al. (2020), ScaNN (Guo et al., 2020) as used in Borgeaud et al. (2021), or SPTAG (Chen et al., 2018). Further, it depends on factors that are orthogonal to our contribution, such as whether the RAM is large enough to hold the datastore and the random-access read latency of the hard drive.…”
Section: Fraction of Saved Searches (FoSS) vs. Wall-Clock Saved Time
confidence: 99%
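FoSS is named but not defined in this excerpt. Assuming it is literally the fraction of kNN lookups avoided relative to a baseline that searches the datastore at every generation step (consistent with its expansion above, though the cited paper's exact formula may differ), a tiny sketch:

```python
def fraction_of_saved_searches(baseline_searches: int, actual_searches: int) -> float:
    """FoSS under the stated assumption: the share of kNN lookups avoided
    relative to a search-every-step baseline. Hardware-independent, unlike
    wall-clock timings that vary with the ANN library and system load."""
    if baseline_searches <= 0:
        raise ValueError("baseline must perform at least one search")
    return 1.0 - actual_searches / baseline_searches

# e.g. a method that reuses previous retrieval results and only falls back
# to the index on 300 of 1000 steps saves 70% of searches:
assert abs(fraction_of_saved_searches(1000, 300) - 0.7) < 1e-9
```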