2021
DOI: 10.48550/arxiv.2102.07033
Preprint

PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Abstract: Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered…

Cited by 12 publications (21 citation statements)
References 33 publications
“…
Dataset               Images  Videos  Texts
ImageNet-21k [15]     14.2M   0       21K
Kinetics-700 [27]     0       542K    700
Moments in Time [49]  0       792K    339
Books&Wiki [79]       0       0       101M
PAQ [35]              0       0       65M
CC3M [60]             3.0M    0       3.0M
CC12M [8]             11.1M   0       11.1M
COCO Caption [11]     113K    0       567K
Visual Genome [30]    108K    0       5.41M
SBU [50]              830K    0       830K
YFCC* [26]            14.8M   0       14.8M

… English Wikipedia (Books&Wiki) and PAQ [35]. For language modeling with image clues and image-text retrieval, we use a combination of COCO Caption [12], SBU Captions (SBU) [50], Visual Genome [30], CC3M [60], CC12M [8] and YFCC [26].…”
Section: Datasets
mentioning, confidence: 99%
“…We also add <SPE> tokens at the beginning of the sequences x and y, whose output features are used to compute the joint probability. For retrieval tasks like image-text retrieval, we use train…

Table 10. Ingredients and hyper-parameters for our pre-training.
Dataset              Batch size  Sampling weight
CC12M [8]            128         0.02778
CC3M [60]            128         0.01389
Visual Genome [30]   128         0.01389
COCO Caption [11]    128         0.01389
SBU [50]             128         0.01389
PAQ [35]             512         0.0222
…”
Section: Formulation Of Novel Tasks
mentioning, confidence: 99%
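The sampling weights in the Table 10 snippet above imply that each pre-training batch is drawn from one dataset of the mixture. As a rough illustration of such weighted mixing (not the cited paper's actual implementation), the minimal Python sketch below picks the source dataset for each step proportionally to its listed weight; the PRETRAIN_MIX mapping and pick_dataset helper are hypothetical names.

```python
import random

# Per-dataset batch size and sampling weight, transcribed from the
# Table 10 snippet above. How the weights are consumed is an assumption.
PRETRAIN_MIX = {
    "CC12M":         {"batch_size": 128, "weight": 0.02778},
    "CC3M":          {"batch_size": 128, "weight": 0.01389},
    "VisualGenome":  {"batch_size": 128, "weight": 0.01389},
    "COCOCaption":   {"batch_size": 128, "weight": 0.01389},
    "SBU":           {"batch_size": 128, "weight": 0.01389},
    "PAQ":           {"batch_size": 512, "weight": 0.0222},
}

def pick_dataset(rng: random.Random) -> str:
    """Choose which dataset the next pre-training batch comes from,
    with probability proportional to its sampling weight."""
    names = list(PRETRAIN_MIX)
    weights = [PRETRAIN_MIX[n]["weight"] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
for step in range(5):
    name = pick_dataset(rng)
    print(step, name, PRETRAIN_MIX[name]["batch_size"])
```

Note that PAQ, the only text-only corpus in the mix, gets both the largest batch size and one of the largest weights in the snippet.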
“…For example, the IMDB and SST-2 models, which are tasked with classifying the sentiment of movie reviews, are paired with a corpus of unlabeled Amazon product reviews (Ni et al, 2019). TREC, a question classification task, is paired with PAQ (Lewis et al, 2021), a collection of 65 million questions. AGNews, a news classification task, is paired with the CC-News corpus (Nagel, 2016).…”
Section: Implementation Details
mentioning, confidence: 99%
“…More standard data augmentation techniques, where the synthetic data bears no instance-level relation to the original data, have shown only weak improvements in robustness and out-of-domain generalization (Bartolo et al, 2021; Lewis et al, 2021). In this work, we analyze the effectiveness of CDA against such augmentation techniques.…”
Section: Data Augmentation
mentioning, confidence: 99%
“…For instance, there are a significant number of articles about sports teams, books, songs, etc. To ensure that the random sampling of Wikipedia paragraphs has a similar distribution, we employ the learned passage selection model from Lewis et al (2021), which is the basis of closely related work on data augmentation (non-counterfactual) for the SQuAD reading comprehension dataset (Bartolo et al, 2021).…”
Section: Baselines
mentioning, confidence: 99%
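The passage selection model referenced above scores Wikipedia passages by how likely they are to yield probably-asked questions, and citing work reuses it to bias paragraph sampling toward that distribution. The sketch below shows one plausible way to sample passages proportionally to such a score; score_fn is a hypothetical stand-in for the learned model, not its real interface.

```python
import math
import random
from typing import Callable, List

def sample_passages(passages: List[str],
                    score_fn: Callable[[str], float],
                    k: int,
                    rng: random.Random) -> List[str]:
    """Sample k passages with probability proportional to exp(score),
    so higher-scoring passages are drawn more often. Using exp keeps
    weights positive even when the scorer outputs raw logits."""
    weights = [math.exp(score_fn(p)) for p in passages]
    return rng.choices(passages, weights=weights, k=k)

# Toy usage with a dummy scorer; a learned selector would instead
# assign high scores to answer-rich passages.
passages = ["The Eiffel Tower is in Paris.", "List of 1993 sports events."]
print(sample_passages(passages, lambda p: float(len(p)), k=1,
                      rng=random.Random(0)))
```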