2021
DOI: 10.48550/arxiv.2112.04426
Preprint

Improving language models by retrieving from trillions of tokens

Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training.
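The abstract compresses the retrieval pipeline into one sentence. Below is a minimal sketch of the chunk-level lookup it describes, assuming a hypothetical embed_chunk stand-in for the frozen BERT retriever and a brute-force dot-product search in place of the approximate index the paper uses over its 2-trillion-token database; nothing here is the authors' code.

```python
import numpy as np

CHUNK_LEN = 64  # RETRO retrieves at chunk granularity (64 tokens in the paper)

def embed_chunk(token_ids, dim=768):
    """Hypothetical stand-in for the frozen BERT retriever. Because the
    retriever is frozen, database keys are computed once and never
    re-indexed during language-model training."""
    rng = np.random.default_rng(abs(hash(tuple(token_ids))) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine sim

def retrieve_neighbors(sequence, db_keys, db_chunks, k=2):
    """For each chunk of the input sequence, fetch its k nearest database
    chunks by similarity of the frozen embeddings (brute force here; at
    2T-token scale the paper relies on approximate nearest-neighbor search)."""
    neighbors = []
    for start in range(0, len(sequence), CHUNK_LEN):
        query = embed_chunk(sequence[start:start + CHUNK_LEN])
        scores = db_keys @ query           # cosine similarity against all keys
        top = np.argsort(-scores)[:k]      # indices of the k closest chunks
        neighbors.append([db_chunks[i] for i in top])
    return neighbors
```

The retrieved neighbor chunks would then be consumed by the encoder and chunked cross-attention layers mentioned in the abstract; that part of the architecture is not sketched here.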

Cited by 50 publications (87 citation statements) | References 17 publications
“…Recent work in this direction has expanded and elaborated on neural models' ability to retrieve and rank passages [40]. The RETRO architecture demonstrates that language models can be primed with results retrieved from a database as large as two trillion tokens [41]. At a broad level, our approach is also comparable to that of Byrne et al. [42], which fine-tunes the model to use external APIs for movie-ticketing dialog.…”
Section: Related Work
confidence: 90%
“…Granularity of Retrieval: While Khandelwal et al. (2020) and Yogatama et al. (2021) retrieve a single token per time step, other work retrieved a sentence (Hashimoto et al., 2018; Zhang et al., 2018; Gu et al., 2018), a prototype (Guu et al., 2018; He et al., 2020), or a chunk (Guu et al., 2020; Borgeaud et al., 2021). RETOMATON implicitly generalizes these approaches by dynamically constructing the retrieved sequence, essentially being able to retrieve individual tokens as well as to construct search-free longer passages.…”
Section: Retrieval and Neuro-symbolic Methods
confidence: 99%
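Since this statement contrasts token-level and chunk-level retrieval, a minimal sketch of the token-granularity end of the spectrum (kNN-LM-style interpolation, in the spirit of Khandelwal et al., 2020) may help; the function and parameter names are illustrative, not taken from any cited implementation.

```python
import numpy as np

def knn_lm_next_token_probs(p_lm, context_vec, keys, values, vocab_size,
                            k=8, lam=0.25, temperature=1.0):
    """Token-granularity retrieval: each datastore key is a context
    embedding and each value is the single token that followed it.
    Chunk-granularity systems such as RETRO instead store and retrieve
    multi-token passages, and RETOMATON interpolates between the two by
    following pointers through the datastore."""
    dists = np.linalg.norm(keys - context_vec, axis=1)  # L2 distance to all keys
    top = np.argsort(dists)[:k]                         # k nearest contexts
    weights = np.exp(-dists[top] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, top):
        p_knn[values[idx]] += w                         # mass on retrieved tokens
    return lam * p_knn + (1 - lam) * p_lm               # interpolate with the LM
```

A pure kNN-LM performs this datastore search at every generation step, which is exactly the cost that the saved-searches metric discussed in the next statement tries to quantify.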
“…In our experiments in Section 5, we reported perplexity against FoSS (the fraction of saved searches). The alternative of measuring wall-clock time is difficult to reproduce: it is brittle to transient hardware and system load, and it depends on the specific kNN retrieval library, such as FAISS as used in Khandelwal et al. (2020), ScaNN (Guo et al., 2020) as used in Borgeaud et al. (2021), or SPTAG (Chen et al., 2018). Further, it depends on factors that are orthogonal to our contribution, such as whether the RAM is large enough to hold the datastore and the random-access read latency of the hard drive.…”
Section: Fraction of Saved Searches (FoSS) vs. Wall-Clock Saved Time
confidence: 99%
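FoSS is named but not defined in this excerpt. Assuming it is literally the fraction of kNN lookups avoided relative to a baseline that searches the datastore at every generation step (consistent with its expansion above, though the cited paper's exact formula may differ), a tiny sketch:

```python
def fraction_of_saved_searches(baseline_searches: int, actual_searches: int) -> float:
    """FoSS under the stated assumption: the share of kNN lookups avoided
    relative to a search-every-step baseline. Hardware-independent, unlike
    wall-clock timings that vary with the ANN library and system load."""
    if baseline_searches <= 0:
        raise ValueError("baseline must perform at least one search")
    return 1.0 - actual_searches / baseline_searches

# e.g. a method that reuses previous retrieval results and only falls back
# to the index on 300 of 1000 steps saves 70% of searches:
assert abs(fraction_of_saved_searches(1000, 300) - 0.7) < 1e-9
```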