2021
DOI: 10.1162/tacl_a_00415
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Abstract: Open-domain Question Answering models that directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared with conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models fall short of the accuracy of retrieve-and-read systems, as substantially less knowled…

Cited by 74 publications (91 citation statements)
References 31 publications
“…• Dataset Augmentation Prior work on QA has performed data augmentation by creating either template-based or machine-generated questions, e.g., for visual QA (Kafle et al., 2017) and textual QA (Lewis et al., 2021), which generally lack rich linguistic variations. On the other hand, large-scale language models like T5 (Raffel et al., 2020), which are trained on very large data from various web sources, can learn general linguistic properties and variations (Brown et al., 2020).…”
Section: Data Annotation
confidence: 99%
“…Guu et al. (2020) proposed adding a latent knowledge retriever to the pre-training process, which extends the context with additional knowledge derived from a textual corpus. The latter pre-training procedure is also commonly used to improve the performance of closed-book question-answering (CBQA) models (Roberts, Raffel, and Shazeer 2020; Lewis et al. 2021). CBQA is highly related to the probing considered in this article: both settings require the model to produce the correct answer directly from their parametric memory, without access to outside sources.…”
Section: Learning and Forgetting
confidence: 99%
“…By considering multiple probe sets (also called the LAMA probes), they consequently showed that a reasonable amount of knowledge is captured in BERT. As a consequence, factual knowledge stored in the parametric memory of BERT models can be used for knowledge-intensive tasks like question answering and fact checking without the need of additional context (Roberts, Raffel, and Shazeer 2020; Lewis et al. 2021).…”
Section: Introduction
confidence: 99%
“…We compare two DPR passage encoders: one based on NQ and the other on the PAQ dataset (Lewis et al., 2021b). We expect the question encoder trained on PAQ is more robust because (a) 10M passages are sampled in PAQ, which is arguably more varied than NQ, and (b) all the plausible answer spans are identified using automatic tools.…”
Section: Data Augmentation
confidence: 99%
“…PAQ dataset sampling Lewis et al. (2021b) introduce Probably Asked Questions (PAQ), a large question repository constructed using a question generation model on Wikipedia passages. We group all of the questions asked about a particular passage and filter out any passages that have fewer than 3 generated questions.…”
Section: B Experimental Details
confidence: 99%
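The grouping-and-filtering step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the record layout and field names (`passage_id`, `question`) are hypothetical stand-ins for however the PAQ questions are keyed to their source passages.

```python
from collections import defaultdict

# Hypothetical PAQ records: each generated question tagged with its source passage ID.
paq = [
    {"passage_id": "p1", "question": "who wrote hamlet?"},
    {"passage_id": "p1", "question": "when was hamlet first performed?"},
    {"passage_id": "p1", "question": "where is hamlet set?"},
    {"passage_id": "p2", "question": "what is the capital of france?"},
]

# Group all generated questions by their source passage.
by_passage = defaultdict(list)
for record in paq:
    by_passage[record["passage_id"]].append(record["question"])

# Keep only passages with at least 3 generated questions, discarding the rest.
filtered = {pid: qs for pid, qs in by_passage.items() if len(qs) >= 3}
```

Here passage `p1` survives the filter (3 questions) while `p2` is dropped (1 question), matching the threshold the citing paper describes.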