Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
DOI: 10.18653/v1/2020.emnlp-main.38
Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Abstract: Self-supervised pre-training of transformer models has revolutionized NLP applications. Such pre-training with language modeling objectives provides a useful initial point for parameters that generalize well to new tasks with fine-tuning. However, fine-tuning is still data inefficient -when there are few labeled examples, accuracy can be low. Data efficiency can be improved by optimizing pre-training directly for future fine-tuning with few examples; this can be treated as a meta-learning problem. However, sta… Show more

Cited by 64 publications (84 citation statements)
References 28 publications
“…Informed output layer initialization in Proto(FO)MAML is therefore important for effective learning in such scenarios. A similar problem with FOMAML is also pointed out by Bansal et al. (2019), who design a differentiable parameter generator for the output layer.…”
Section: Discussion
confidence: 81%
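For context on what "informed output layer initialization" means here, below is a minimal sketch in the spirit of ProtoMAML: the softmax layer for a new task is initialized from class prototypes of the encoded support examples rather than randomly. It assumes a PyTorch setup; names such as `encoder`, `support_x`, and `support_y` are illustrative, not the cited papers' APIs.

```python
# Sketch: ProtoMAML-style informed initialization of the output layer.
# The linear softmax weights/biases are derived from class prototypes
# computed on the support set (all names are illustrative assumptions).
import torch


def proto_init_output_layer(encoder, support_x, support_y, num_classes):
    """Return (weight, bias) for a linear softmax layer initialized from
    class prototypes of the encoded support examples."""
    with torch.no_grad():
        feats = encoder(support_x)                      # [N, hidden]
    prototypes = torch.stack(
        [feats[support_y == c].mean(dim=0) for c in range(num_classes)]
    )                                                   # [C, hidden]
    # ProtoMAML-style mapping: logits_c = 2 * <proto_c, x> - ||proto_c||^2
    weight = 2.0 * prototypes
    bias = -(prototypes ** 2).sum(dim=1)
    return weight, bias
```

The returned tensors can be copied into a fresh linear layer before inner-loop adaptation, which avoids the random-head problem the excerpt describes.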
“…Dou et al. (2019) perform meta-training on certain high-resource tasks from the GLUE benchmark and meta-test on certain low-resource tasks from the same benchmark. Bansal et al. (2019) propose a softmax parameter generator component that can enable a varying number of classes in the meta-training tasks. They choose the tasks in GLUE along with SNLI (Bowman et al., 2015) for meta-training, and use entity typing, relation classification, sentiment classification, text categorization, and scientific NLI as the test tasks.…”
Section: Meta-Learning in NLP
confidence: 99%
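A hypothetical sketch of such a softmax parameter generator follows, under the assumption that an MLP maps the pooled representation of each class's support examples to that class's softmax weight vector and bias, which is what allows the number of classes to vary across meta-training tasks. Module and variable names are invented for illustration, not taken from Bansal et al. (2019).

```python
# Hypothetical softmax parameter generator: per-class weights and biases are
# produced from pooled support-set representations, so tasks with different
# numbers of classes share the same generator parameters.
import torch
import torch.nn as nn


class SoftmaxParamGenerator(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        # Maps each class's pooled support features to (weight vector, bias).
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size + 1),
        )

    def forward(self, support_feats, support_labels, num_classes):
        # support_feats: [N, hidden]; support_labels: [N]
        per_class = torch.stack(
            [support_feats[support_labels == c].mean(dim=0) for c in range(num_classes)]
        )                                               # [C, hidden]
        params = self.mlp(per_class)                    # [C, hidden + 1]
        weight, bias = params[:, :-1], params[:, -1]
        return weight, bias                             # logits = x @ weight.T + bias
```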
“…The problem of zero-shot and few-shot learning has lately been proposed in the context of NLP (Geng et al., 2019) using meta-learning. Model-agnostic meta-learning (MAML) has been explored to tackle tasks with disjoint label spaces (Bansal et al., 2019). However, these models are not capable of making zero-shot predictions.…”
Section: Related Work
confidence: 99%
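As background for the MAML reference above, here is a rough first-order MAML (FOMAML) sketch for episodic text classification: the inner loop adapts a copy of the parameters on each task's support set, and the gradient taken at the adapted parameters is applied back to the shared initialization. The episode format, optimizer choices, and hyperparameters are assumptions, not the cited work's implementation.

```python
# Rough first-order MAML (FOMAML) meta-update over a batch of episodes.
# Each task is a (support_x, support_y, query_x, query_y) tuple; all names
# and hyperparameters are illustrative placeholders.
import copy
import torch
import torch.nn.functional as F


def fomaml_step(model, meta_optimizer, tasks, inner_lr=1e-3, inner_steps=5):
    meta_optimizer.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        learner = copy.deepcopy(model)                          # task-specific copy
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                            # adapt on the support set
            inner_opt.zero_grad()
            F.cross_entropy(learner(support_x), support_y).backward()
            inner_opt.step()
        learner.zero_grad()
        F.cross_entropy(learner(query_x), query_y).backward()   # grads at adapted params
        # First-order approximation: accumulate the adapted-parameter gradients
        # directly onto the shared initialization.
        for p, p_adapted in zip(model.parameters(), learner.parameters()):
            if p_adapted.grad is None:
                continue
            p.grad = p_adapted.grad.clone() if p.grad is None else p.grad + p_adapted.grad
    meta_optimizer.step()
```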
“…Few-shot transfer learning. Real-world text classification scenarios are often characterized by a lack of annotated corpora and rapidly changing information needs (Chiticariu et al., 2013), motivating research into methods that allow us to train text classifiers for new classes with only a handful of training examples (Bansal et al., 2019; Yogatama et al., 2019). In such cases, a standard approach is to transfer knowledge from an existing model for classification task X to initialize the weights for a model for the new classification task Y.…”
Section: Introduction
confidence: 99%
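A minimal sketch of that standard transfer recipe, assuming a PyTorch model with an encoder plus a `classifier` head (the attribute name, checkpoint path, and sizes are placeholders): reuse the encoder weights trained on task X and re-initialize only the output layer for task Y's label space.

```python
# Sketch: initialize a model for new task Y from a checkpoint trained on task X,
# keeping the encoder weights and replacing only the task-specific head.
import torch
import torch.nn as nn


def init_from_pretrained(model, checkpoint_path, num_new_classes, hidden_size):
    state = torch.load(checkpoint_path, map_location="cpu")
    # Keep encoder weights from task X; drop the old task-specific head.
    state = {k: v for k, v in state.items() if not k.startswith("classifier.")}
    model.load_state_dict(state, strict=False)
    # Fresh output layer for task Y's label space, fine-tuned on the few examples.
    model.classifier = nn.Linear(hidden_size, num_new_classes)
    return model
```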