2021
DOI: 10.48550/arxiv.2102.07033
Preprint

PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Abstract: Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered…

Cited by 12 publications (21 citation statements)
References 33 publications
“…
Dataset               Images  Videos  Texts
ImageNet-21k [15]     14.2M   0       21K
Kinetics-700 [27]     0       542K    700
Moments in Time [49]  0       792K    339
Books&Wiki [79]       0       0       101M
PAQ [35]              0       0       65M
CC3M [60]             3.0M    0       3.0M
CC12M [8]             11.1M   0       11.1M
COCO Caption [11]     113K    0       567K
Visual Genome [30]    108K    0       5.41M
SBU [50]              830K    0       830K
YFCC* [26]            14.8M   0       14.8M

… English Wikipedia (Books&Wiki) and PAQ [35]. For language modeling with image clues and image-text retrieval, we use a combination of COCO Caption [12], SBU Captions (SBU) [50], Visual Genome [30], CC3M [60], CC12M [8] and YFCC [26].…”
Section: Datasets
mentioning, confidence: 99%
“…We also add <SPE> tokens at the beginning of the sequences x and y, whose output features are used to compute the joint probability. For retrieval tasks like image-text retrieval, we use train…

Table 10. Ingredients and hyper-parameters for our pre-training.
Dataset              Batch size  Sampling weight
CC12M [8]            128         0.02778
CC3M [60]            128         0.01389
Visual Genome [30]   128         0.01389
COCO Caption [11]    128         0.01389
SBU [50]             128         0.01389
PAQ [35]             512         0.0222
…”
Section: Formulation Of Novel Tasks
mentioning, confidence: 99%
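The sampling weights in the Table 10 snippet above imply that each pre-training batch is drawn from one dataset of the mixture. As a rough illustration of such weighted mixing (not the cited paper's actual implementation), the minimal Python sketch below picks the source dataset for each step proportionally to its listed weight; the PRETRAIN_MIX mapping and pick_dataset helper are hypothetical names.

```python
import random

# Per-dataset batch size and sampling weight, transcribed from the
# Table 10 snippet above. How the weights are consumed is an assumption.
PRETRAIN_MIX = {
    "CC12M":         {"batch_size": 128, "weight": 0.02778},
    "CC3M":          {"batch_size": 128, "weight": 0.01389},
    "VisualGenome":  {"batch_size": 128, "weight": 0.01389},
    "COCOCaption":   {"batch_size": 128, "weight": 0.01389},
    "SBU":           {"batch_size": 128, "weight": 0.01389},
    "PAQ":           {"batch_size": 512, "weight": 0.0222},
}

def pick_dataset(rng: random.Random) -> str:
    """Choose which dataset the next pre-training batch comes from,
    with probability proportional to its sampling weight."""
    names = list(PRETRAIN_MIX)
    weights = [PRETRAIN_MIX[n]["weight"] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
for step in range(5):
    name = pick_dataset(rng)
    print(step, name, PRETRAIN_MIX[name]["batch_size"])
```

Note that PAQ, the only text-only corpus in the mix, gets both the largest batch size and one of the largest weights in the snippet.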
“…For example, the IMDB and SST-2 models, which are tasked with classifying the sentiment of movie reviews, are paired with a corpus of unlabeled Amazon product reviews (Ni et al, 2019). TREC, a question classification task, is paired with PAQ (Lewis et al, 2021), a collection of 65 million questions. AGNews, a news classification task, is paired with the CC-News corpus (Nagel, 2016).…”
Section: Implementation Details
mentioning, confidence: 99%
“…More standard data augmentation techniques, where the synthetic data bears no instance-level relation to the original data, have shown only weak improvements in robustness and out-of-domain generalization (Bartolo et al, 2021; Lewis et al, 2021). In this work, we analyze the effectiveness of CDA against such augmentation techniques.…”
Section: Data Augmentation
mentioning, confidence: 99%
“…For instance, there are a significant number of articles about sports teams, books, songs, etc. To ensure that the random sampling of Wikipedia paragraphs has a similar distribution, we employ the learned passage selection model from Lewis et al (2021), which is the basis of closely related work on data augmentation (non-counterfactual) for the SQuAD reading comprehension dataset (Bartolo et al, 2021).…”
Section: Baselines
mentioning, confidence: 99%
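The passage selection model referenced above scores Wikipedia passages by how likely they are to yield probably-asked questions, and citing work reuses it to bias paragraph sampling toward that distribution. The sketch below shows one plausible way to sample passages proportionally to such a score; score_fn is a hypothetical stand-in for the learned model, not its real interface.

```python
import math
import random
from typing import Callable, List

def sample_passages(passages: List[str],
                    score_fn: Callable[[str], float],
                    k: int,
                    rng: random.Random) -> List[str]:
    """Sample k passages with probability proportional to exp(score),
    so higher-scoring passages are drawn more often. Using exp keeps
    weights positive even when the scorer outputs raw logits."""
    weights = [math.exp(score_fn(p)) for p in passages]
    return rng.choices(passages, weights=weights, k=k)

# Toy usage with a dummy scorer; a learned selector would instead
# assign high scores to answer-rich passages.
passages = ["The Eiffel Tower is in Paris.", "List of 1993 sports events."]
print(sample_passages(passages, lambda p: float(len(p)), k=1,
                      rng=random.Random(0)))
```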