2022
DOI: 10.48550/arxiv.2208.03299
Preprint

Atlas: Few-shot Learning with Retrieval Augmented Language Models

Abstract: Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples.

Cited by 22 publications (32 citation statements)
References 32 publications
“…For example, besides masked language modeling (MLM) and next sentence prediction (NSP), Lauscher et al (2020) …

Semi-parametric language models: Most of the existing works on semi-parametric language models (Khandelwal et al, 2019; Zhong et al, 2022; Grave et al, 2017; Merity et al, 2017; de Masson d'Autume et al, 2019; Guu et al, 2020; Lewis et al, 2020) mainly focus on improving the language modeling capability (e.g., improving perplexities) or a particular category of downstream task (e.g., open-domain question answering). Some recent works (Izacard et al, 2022; Petroni et al, 2021) seek to improve diverse downstream tasks with an external memory. All these works augment the parametric language model with memories of plain texts.…”
Section: Related Work (mentioning)
confidence: 99%
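The semi-parametric line cited in this statement (e.g., Khandelwal et al, 2019) augments a parametric LM with a token-level datastore and interpolates the two next-token distributions. A minimal sketch of that kNN-LM idea; the function name, the distance-to-weight scheme, and the fixed interpolation weight are illustrative assumptions, not the cited implementation:

```python
import numpy as np

def knn_lm_next_token(p_lm, query, datastore_keys, datastore_tokens,
                      vocab_size, k=8, lam=0.25, temperature=1.0):
    """Interpolate a parametric LM distribution with a kNN distribution.

    p_lm            : (vocab_size,) softmax output of the parametric LM
    query           : (d,) embedding of the current context
    datastore_keys  : (n, d) stored context embeddings
    datastore_tokens: (n,) token id that followed each stored context
    """
    # L2 distances from the query context to every stored context.
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbours

    # Turn the neighbours' (negative) distances into a distribution
    # over the tokens that followed them in the datastore.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, datastore_tokens[nn], weights)

    # Final next-token distribution: lam * kNN + (1 - lam) * parametric LM.
    return lam * p_knn + (1.0 - lam) * p_lm
```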
“…In addition, large parametric language models are hard to adapt to the evolving world knowledge without expensive model re-training. To overcome these challenges, there has been an increasing interest in developing semi-parametric language models, where a parametric language model is augmented with an external memory containing a large amount of text chunks (Izacard et al, 2022; Khandelwal et al, 2019; Zhong et al, 2022). Although these semi-parametric approaches are shown to be more effective than their much larger parametric counterparts, there remain several challenges.…”
Section: Introduction (mentioning)
confidence: 99%
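The external memory this passage describes is typically queried by nearest-neighbour search over pre-embedded text chunks. A hedged sketch under that assumption; the encoder producing `query_vec` and `chunk_vecs` is assumed to exist elsewhere (e.g., a dual encoder), and the names and cosine scoring are illustrative, not any cited system's API:

```python
import numpy as np

def retrieve_chunks(query_vec, chunk_vecs, chunks, k=5):
    """Return the k text chunks whose embeddings are most similar
    (by cosine similarity) to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                     # cosine similarity to every chunk
    top = np.argsort(-scores)[:k]      # indices of the k best-scoring chunks
    return [chunks[i] for i in top]
```

At scale this exact search would be replaced by an approximate nearest-neighbour index, but the interface stays the same: query in, top-k chunks out.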
“…For example, in open-domain question answering (Chen et al, 2017), demonstrated by only a few examples of question-answer pairs, LLMs are able to answer arbitrary factoid questions (Joshi et al, 2017; Yang et al, 2018; Kwiatkowski et al, 2019). Recent research (Guu et al, 2020; Lewis et al, 2020; Izacard et al, 2022) shows that retrieval-augmentation can further improve LLMs' performance on knowledge-intensive tasks by conditioning the LLMs on retrieved relevant passages from an external corpus.…”
Section: Introduction (mentioning)
confidence: 99%
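The conditioning step this quote refers to can be pictured as prompt construction: retrieved passages are prepended to the question before generation. A minimal sketch assuming hypothetical `retriever` and `lm_generate` callables; neither is the API of any system cited above:

```python
def answer_with_retrieval(question, retriever, lm_generate, k=5):
    """Condition an LM on retrieved passages before generating an answer.

    retriever(question, k) -> list of passage strings   (assumed interface)
    lm_generate(prompt)    -> completion string          (assumed interface)
    """
    passages = retriever(question, k)
    # Prepend the retrieved evidence to the question so the LM can
    # ground its answer in the external corpus rather than its parameters.
    context = "\n\n".join(f"Passage {i + 1}: {p}"
                          for i, p in enumerate(passages))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return lm_generate(prompt)
```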
“…Increasingly, a middle ground combining the two paradigms and retaining the best of both worlds is becoming popular across various domains, ranging from natural language [Das et al, 2021, Wang et al, 2022, Izacard et al, 2022], to vision [Liu et al, 2015, Iscen et al, 2022, Long et al, 2022], to reinforcement learning [Blundell et al, 2016, Pritzel et al, 2017, Ritter et al, 2020], to even protein structure predictions [Cramer, 2021]. In such approaches, given a test input, one first retrieves relevant entries from a data index and then processes the retrieved entries along with the test input to make the final predictions using a machine learning model.…”
Section: Introduction (mentioning)
confidence: 99%
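The generic retrieve-then-predict recipe in that last sentence, sketched with a nearest-neighbour majority vote standing in for the "machine learning model"; the index layout and function names are assumptions for illustration only:

```python
import numpy as np
from collections import Counter

def retrieve_and_predict(x, index_vecs, index_labels, k=5):
    """Given a test input x, retrieve the k nearest entries in the data
    index and combine them (here, by majority vote) into a prediction."""
    dists = np.linalg.norm(index_vecs - x, axis=1)
    nn = np.argsort(dists)[:k]                 # retrieve: k nearest entries
    votes = Counter(index_labels[i] for i in nn)
    return votes.most_common(1)[0][0]          # process: aggregate to a label
```

In the retrieval-augmented LMs discussed above, the same two-stage shape holds, with the majority vote replaced by a model that reads the retrieved entries together with the input.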