PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Lewis, Patrick; Wu, Yuxiang; Liu, Linqing; Minervini, Pasquale; Küttler, Heinrich; Piktus, Aleksandra; Stenetorp, Pontus; Riedel, Sebastian

doi:10.1162/tacl_a_00415

Cited by 74 publications

(91 citation statements)

References 31 publications

“…• Dataset Augmentation Prior work on QA has performed data augmentation by either creating template-based or machine generated questions, e.g., for visual QA (Kafle et al, 2017) and textual QA (Lewis et al, 2021) […] generally lack rich linguistic variations. On the other hand, large-scale language models like T5 (Raffel et al, 2020) which are trained on very large data from various web sources can learn general linguistic properties and variations (Brown et al, 2020).…”

Section: Data Annotation (mentioning)

confidence: 99%

ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning

Masry, Long, Tan, et al. 2022

Preprint

Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in their questions. However, most existing datasets do not focus on such complex reasoning questions as their questions are template-based and answers come from a fixed-vocabulary. In this work, we present a large-scale benchmark covering 9.6K human-written questions as well as 23.1K questions generated from human-written chart summaries. To address the unique challenges in our benchmark involving visual and logical reasoning over charts, we present two transformer-based models that combine visual features and the data table of the chart in a unified way to answer questions. While our models achieve the state-of-the-art results on the previous datasets as well as on our benchmark, the evaluation also reveals several challenges in answering complex reasoning questions.

“…Guu et al (2020) proposed adding a latent knowledge retriever to the pre-training process, which will extend the context with additional knowledge derived from a textual corpus. The latter pre-training procedure is also commonly used to improve the performance of closed-book question-answering (CBQA) models (Roberts, Raffel, and Shazeer 2020; Lewis et al 2021). CBQA is highly related to the probing considered in this article: both settings require the model to produce the correct answer directly from their parametric memory, without access to outside sources.…”

Section: Learning and Forgetting (mentioning)

confidence: 99%

“…By considering multiple probe sets (also called as the LAMA probes), they consequently showed that a reasonable amount of knowledge is captured in BERT. As a consequence, factual knowledge stored in the parametric memory of BERT models can be used for knowledge-intensive tasks like question answering and fact checking without the need of additional context (Roberts, Raffel, and Shazeer 2020; Lewis et al 2021).…”

Section: Introduction (mentioning)

confidence: 99%

BERTnesia: Investigating the capture and forgetting of knowledge in BERT

Wallat, Singh, Anand 2021

Preprint

Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this article, we probe BERT specifically to understand and measure the relational knowledge it captures in its parametric memory. While probing for linguistic understanding is commonly applied to all layers of BERT as well as finetuned models, this has not been done for factual knowledge. We utilize existing knowledge base completion tasks (LAMA) to probe every layer of pre-trained as well as fine-tuned BERT models (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten. The extent of forgetting is impacted by the fine-tuning objective and the training data. We found that ranking models forget the least and retain more knowledge in their final layer compared to masked language modeling and question-answering. However, masked language modeling performed the best at acquiring new knowledge from the training data. When it comes to learning facts, we found that capacity and fact density are key factors. We hope this initial work will spur further research into understanding the parametric memory of language models and the effect of training objectives on factual knowledge. The code to repeat the experiments is publicly available on GitHub.

“…We compare two DPR passage encoders: one based on NQ and the other on the PAQ dataset (Lewis et al, 2021b). We expect the question encoder trained on PAQ is more robust because (a) 10M passages are sampled in PAQ, which is arguably more varied than NQ, and (b) all the plausible answer spans are identified using automatic tools.…”

Section: Data Augmentation (mentioning)

confidence: 99%

“…PAQ dataset sampling Lewis et al (2021b) introduce Probably Asked Questions (PAQ), a large question repository constructed using a question generation model on Wikipedia passages. We group all of the questions asked about a particular passage and filter out any passages that have less than 3 generated questions.…”

Section: B Experimental Details (mentioning)

confidence: 99%
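
The sampling step quoted above (group the generated questions by passage, then drop passages with fewer than 3 questions) could be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the record layout and the field names `passage_id` and `question` are assumptions made for the example, and only the threshold of 3 comes from the quoted statement.

```python
from collections import defaultdict


def group_and_filter(qa_records, min_questions=3):
    """Group generated questions by passage and drop sparse passages.

    `qa_records` is assumed to be an iterable of dicts with the
    hypothetical keys "passage_id" and "question".
    """
    by_passage = defaultdict(list)
    for record in qa_records:
        by_passage[record["passage_id"]].append(record["question"])

    # Keep only passages with at least `min_questions` generated questions.
    return {
        pid: questions
        for pid, questions in by_passage.items()
        if len(questions) >= min_questions
    }


# Toy records: passage "p1" has three questions, "p2" has only two.
records = [
    {"passage_id": "p1", "question": "q1"},
    {"passage_id": "p1", "question": "q2"},
    {"passage_id": "p1", "question": "q3"},
    {"passage_id": "p2", "question": "q4"},
    {"passage_id": "p2", "question": "q5"},
]
kept = group_and_filter(records)
```

With the toy records above, only passage `p1` would survive the filter, since `p2` has fewer than 3 questions.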

Simple Entity-Centric Questions Challenge Dense Retrievers

Sciavolino, Zhong, Lee, et al. 2021

Preprint

Open-domain question answering has exploded in popularity recently due to the success of dense retrieval models, which have surpassed sparse models using only a few supervised training examples. However, in this paper, we demonstrate current dense models are not yet the holy grail of retrieval. We first construct EntityQuestions, a set of simple, entity-rich questions based on facts from Wikidata (e.g., "Where was Arve Furset born?"), and observe that dense retrievers drastically underperform sparse methods. We investigate this issue and uncover that dense retrievers can only generalize to common entities unless the question pattern is explicitly observed during training. We discuss two simple solutions towards addressing this critical problem. First, we demonstrate that data augmentation is unable to fix the generalization problem. Second, we argue a more robust passage encoder helps facilitate better question adaptation using specialized question encoders. We hope our work can shed light on the challenges in creating a robust, universal dense retriever that works well across different input distributions. Our dataset and code are publicly available at https://github.com/princeton-nlp/EntityQuestions. (The first two authors contributed equally.)
