2019
DOI: 10.1609/aaai.v33i01.33016407
Hybrid Attention-Based Prototypical Networks for Noisy Few-Shot Relation Classification

Abstract: Existing methods for relation classification (RC) rely primarily on distant supervision (DS) because large-scale supervised training datasets are not readily available. Although DS automatically annotates adequate amounts of data for model training, the coverage of this data is still quite limited, and many long-tail relations suffer from data sparsity. Intuitively, people can grasp new knowledge from only a few instances. We thus provide a different view of RC by formalizing it as a few-shot learning problem. […]
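Since the abstract is truncated, the following is a minimal NumPy sketch of the two attention mechanisms the title refers to, as we read them: instance-level attention down-weights noisy support instances when forming a class prototype, and feature-level attention reweights embedding dimensions in the distance computation. The shapes, the softmax weighting, and the variance-based feature weights are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of hybrid attention in a prototypical network:
# instance-level attention (noisy support instances contribute less to the
# prototype) plus feature-level attention (unreliable embedding dimensions
# count less in the distance). All weighting choices here are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention_scores(support, query):
    """support: (n_way, k_shot, dim) embeddings; query: (dim,) embedding.
    Returns one score per class (higher = closer)."""
    n_way, k_shot, dim = support.shape
    scores = np.empty(n_way)
    for c in range(n_way):
        inst = support[c]                      # (k_shot, dim)
        # Instance-level attention: support instances similar to the
        # query get larger weights, down-weighting noisy instances.
        alpha = softmax(inst @ query)          # (k_shot,)
        prototype = alpha @ inst               # (dim,)
        # Feature-level attention (illustrative heuristic): dimensions
        # with low variance across the support set are more reliable.
        z = 1.0 / (inst.var(axis=0) + 1e-6)    # (dim,)
        z = z / z.sum() * dim
        # Score = negative feature-weighted squared distance.
        scores[c] = -np.sum(z * (query - prototype) ** 2)
    return scores

# Toy usage: 3-way 5-shot with 16-dim sentence embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(3, 5, 16))
query = support[1].mean(axis=0) + 0.1 * rng.normal(size=16)
print(np.argmax(hybrid_attention_scores(support, query)))  # expect class 1
```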

Cited by 295 publications (269 citation statements)
References 13 publications
“…There have been emerging research studies that apply the above meta-learning algorithms to NLP tasks, including language modelling (Vinyals et al, 2016), text classification, machine translation (Gu et al, 2018), and relation learning (Xiong et al, 2018; Gao et al, 2019). In this paper, we propose to formulate OOV word representation learning as a few-shot regression problem.…”
Section: Related Work
confidence: 99%
“…Oreshkin et al [26] also learn a task-dependent metric, but condition on the mean of class prototypes, which can reduce the inter-class variation available to their task-conditioning network, and they require an auxiliary co-training loss that our method does not need to realize performance gains. Gao et al [9] applied masks to features in a prototypical network for a few-shot NLP sentence classification task, but base their masks only on examples within each class, not between classes as our method does.…”
Section: Related Work
confidence: 99%
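To make the within-class versus between-class distinction in the statement above concrete, here is a hypothetical sketch of the two masking strategies; the variance-ratio heuristic and the function names are assumptions for illustration, not the formulations of either cited paper.

```python
# Two ways to build a feature mask for a prototypical network.
import numpy as np

def within_class_mask(support_c):
    """Mask from one class's own support set: (k_shot, dim) -> (dim,).
    Low within-class variance is taken to mean a reliable feature."""
    return 1.0 / (support_c.var(axis=0) + 1e-6)

def between_class_mask(support):
    """Mask using all classes: (n_way, k_shot, dim) -> (dim,).
    Favors features where class means are spread apart (between-class
    variance) relative to the average within-class variance."""
    means = support.mean(axis=1)                  # (n_way, dim)
    between = means.var(axis=0)                   # spread of class means
    within = support.var(axis=1).mean(axis=0)     # avg within-class variance
    return between / (within + 1e-6)

rng = np.random.default_rng(1)
support = rng.normal(size=(3, 5, 16))             # 3-way, 5-shot, 16-dim
print(within_class_mask(support[0]).shape)        # (16,)
print(between_class_mask(support).shape)          # (16,)
```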
“…Afterward, the encoder compares the new sample with the prototypes and classifies it into the class with the closest prototype [28]. Previous studies [8, 28] demonstrate that the choice of distance function significantly affects the capacity of prototypical networks, so model performance is vulnerable to the quality of instance representations. However, due to the paucity of instances in FSL, key information may be lost in the noise introduced by the diversity of event mentions.…”
Section: Label Not Seen In Training
confidence: 99%
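As a small illustration of why the distance function matters in the statement above, the sketch below classifies the same query against the same prototypes under Euclidean and cosine distance and gets different answers; the example geometry and function name are ours, not from the cited papers.

```python
# Same prototypes, same query, different predictions under two metrics.
import numpy as np

def classify(prototypes, query, metric="euclidean"):
    """prototypes: (n_way, dim); query: (dim,). Returns predicted class."""
    if metric == "euclidean":
        d = np.linalg.norm(prototypes - query, axis=1)
    elif metric == "cosine":
        # Cosine distance compares direction only, ignoring magnitude.
        p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
        q = query / np.linalg.norm(query)
        d = 1.0 - p @ q
    else:
        raise ValueError(metric)
    return int(np.argmin(d))

protos = np.array([[1.0, 0.0], [4.0, 4.0]])
q = np.array([2.0, 2.0])
print(classify(protos, q, "euclidean"))  # 0: closer in Euclidean distance
print(classify(protos, q, "cosine"))     # 1: perfectly aligned in direction
```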