2022
DOI: 10.48550/arxiv.2208.03299
Preprint

Atlas: Few-shot Learning with Retrieval Augmented Language Models

Abstract: Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples.

Cited by 22 publications (32 citation statements)
References 32 publications
“…For example, besides masked language modeling (MLM) and next sentence prediction (NSP), Lauscher et al (2020) …

Semi-parametric language models: Most of the existing works on semi-parametric language models (Khandelwal et al, 2019; Zhong et al, 2022; Grave et al, 2017; Merity et al, 2017; de Masson d'Autume et al, 2019; Guu et al, 2020; Lewis et al, 2020) mainly focus on improving the language modeling capability (e.g., improving perplexities) or a particular category of downstream task (e.g., open-domain question answering). Some recent works (Izacard et al, 2022; Petroni et al, 2021) seek to improve diverse downstream tasks with an external memory. All these works augment the parametric language model with memories of plain texts.…”
Section: Related Work (mentioning)
confidence: 99%
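The semi-parametric line cited in this statement (e.g., Khandelwal et al, 2019) augments a parametric LM with a token-level datastore and interpolates the two next-token distributions. A minimal sketch of that kNN-LM idea; the function name, the distance-to-weight scheme, and the fixed interpolation weight are illustrative assumptions, not the cited implementation:

```python
import numpy as np

def knn_lm_next_token(p_lm, query, datastore_keys, datastore_tokens,
                      vocab_size, k=8, lam=0.25, temperature=1.0):
    """Interpolate a parametric LM distribution with a kNN distribution.

    p_lm            : (vocab_size,) softmax output of the parametric LM
    query           : (d,) embedding of the current context
    datastore_keys  : (n, d) stored context embeddings
    datastore_tokens: (n,) token id that followed each stored context
    """
    # L2 distances from the query context to every stored context.
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbours

    # Turn the neighbours' (negative) distances into a distribution
    # over the tokens that followed them in the datastore.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, datastore_tokens[nn], weights)

    # Final next-token distribution: lam * kNN + (1 - lam) * parametric LM.
    return lam * p_knn + (1.0 - lam) * p_lm
```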
“…In addition, large parametric language models are hard to adapt to the evolving world knowledge without expensive model re-training. To overcome these challenges, there has been an increasing interest in developing semi-parametric language models, where a parametric language model is augmented with an external memory containing a large amount of text chunks (Izacard et al, 2022; Khandelwal et al, 2019; Zhong et al, 2022). Although these semi-parametric approaches are shown to be more effective than their much larger parametric counterparts, there remain several challenges.…”
Section: Introduction (mentioning)
confidence: 99%
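The external memory this passage describes is typically queried by nearest-neighbour search over pre-embedded text chunks. A hedged sketch under that assumption; the encoder producing `query_vec` and `chunk_vecs` is assumed to exist elsewhere (e.g., a dual encoder), and the names and cosine scoring are illustrative, not any cited system's API:

```python
import numpy as np

def retrieve_chunks(query_vec, chunk_vecs, chunks, k=5):
    """Return the k text chunks whose embeddings are most similar
    (by cosine similarity) to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                     # cosine similarity to every chunk
    top = np.argsort(-scores)[:k]      # indices of the k best-scoring chunks
    return [chunks[i] for i in top]
```

At scale this exact search would be replaced by an approximate nearest-neighbour index, but the interface stays the same: query in, top-k chunks out.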
“…For example, in open-domain question answering (Chen et al, 2017), demonstrated by only a few examples of question-answer pairs, LLMs are able to answer arbitrary factoid questions (Joshi et al, 2017; Yang et al, 2018; Kwiatkowski et al, 2019). Recent research (Guu et al, 2020; Lewis et al, 2020; Izacard et al, 2022) shows that retrieval-augmentation can further improve LLMs' performance on knowledge-intensive tasks by conditioning the LLMs on retrieved relevant passages from an external corpus.…”
Section: Introduction (mentioning)
confidence: 99%
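The conditioning step this quote refers to can be pictured as prompt construction: retrieved passages are prepended to the question before generation. A minimal sketch assuming hypothetical `retriever` and `lm_generate` callables; neither is the API of any system cited above:

```python
def answer_with_retrieval(question, retriever, lm_generate, k=5):
    """Condition an LM on retrieved passages before generating an answer.

    retriever(question, k) -> list of passage strings   (assumed interface)
    lm_generate(prompt)    -> completion string          (assumed interface)
    """
    passages = retriever(question, k)
    # Prepend the retrieved evidence to the question so the LM can
    # ground its answer in the external corpus rather than its parameters.
    context = "\n\n".join(f"Passage {i + 1}: {p}"
                          for i, p in enumerate(passages))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return lm_generate(prompt)
```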
“…Increasingly, a middle ground combining the two paradigms and retaining the best of both worlds is becoming popular across various domains, ranging from natural language [Das et al, 2021, Wang et al, 2022, Izacard et al, 2022], to vision [Liu et al, 2015, Iscen et al, 2022, Long et al, 2022], to reinforcement learning [Blundell et al, 2016, Pritzel et al, 2017, Ritter et al, 2020], to even protein structure predictions [Cramer, 2021]. In such approaches, given a test input, one first retrieves relevant entries from a data index and then processes the retrieved entries along with the test input to make the final predictions using a machine learning model.…”
Section: Introduction (mentioning)
confidence: 99%
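The generic retrieve-then-predict recipe in that last sentence, sketched with a nearest-neighbour majority vote standing in for the "machine learning model"; the index layout and function names are assumptions for illustration only:

```python
import numpy as np
from collections import Counter

def retrieve_and_predict(x, index_vecs, index_labels, k=5):
    """Given a test input x, retrieve the k nearest entries in the data
    index and combine them (here, by majority vote) into a prediction."""
    dists = np.linalg.norm(index_vecs - x, axis=1)
    nn = np.argsort(dists)[:k]                 # retrieve: k nearest entries
    votes = Counter(index_labels[i] for i in nn)
    return votes.most_common(1)[0][0]          # process: aggregate to a label
```

In the retrieval-augmented LMs discussed above, the same two-stage shape holds, with the majority vote replaced by a model that reads the retrieved entries together with the input.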