2019
DOI: 10.48550/arxiv.1906.03158
Preprint

Matching the Blanks: Distributional Similarity for Relation Learning

Abstract: General purpose relation extractors, which can model arbitrary relations, are a core aspiration in information extraction. Efforts have been made to build general purpose extractors that represent relations with their surface forms, or which jointly embed surface forms with relations from an existing knowledge graph. However, both of these approaches are limited in their ability to generalize. In this paper, we build on extensions of Harris' distributional hypothesis to relations, as well as recent advances in…

Cited by 24 publications (33 citation statements) | References 10 publications (15 reference statements)
“…This vocabulary size is two orders of magnitude larger than in previous work that applies a Transformer model with full softmax loss [8,23,18]. Other works, such as [24] and [17], train a Transformer model with a large number of entities using sampled softmax, with either in-batch or in-example negative sampling. But as we shall show, sampled softmax, even with a large number of 128K negative samples, results in much worse quality.…”
Section: Wikipedia Entity Prediction
confidence: 99%
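The in-batch negative sampling contrasted above can be sketched in a few lines. The sketch below is a generic illustration of the technique, not the cited papers' actual training code; the function name, batch size, and embedding dimension are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

def in_batch_sampled_softmax_loss(mention_emb: torch.Tensor,
                                  entity_emb: torch.Tensor) -> torch.Tensor:
    """Sampled-softmax loss with in-batch negatives (illustrative sketch).

    mention_emb: (B, d) contextual embeddings of entity mentions.
    entity_emb:  (B, d) embeddings of each mention's gold entity.
    Every other entity in the batch acts as a negative, so each row
    becomes a B-way classification with the gold entity on the diagonal.
    """
    logits = mention_emb @ entity_emb.t()                    # (B, B) similarity scores
    targets = torch.arange(logits.size(0), device=logits.device)  # gold = diagonal
    return F.cross_entropy(logits, targets)

# Usage with random stand-in embeddings (batch of 32, dimension 256).
loss = in_batch_sampled_softmax_loss(torch.randn(32, 256), torch.randn(32, 256))
```

The quoted comparison hinges on scale: with in-batch negatives the number of negatives is capped by the batch size, whereas a full softmax scores every entity in the vocabulary, which is what the passage argues yields better quality at very large vocabulary sizes.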
“…For an input sentence containing a relation mention, two entities Entity 1 and Entity 2 are marked in advance. We follow the labeling mechanism adopted by Soares et al. (2019) and Zhang and Wang (2015) to enhance the position information of entities. For each sentence X = [x_1, ..., x_T], four reserved tokens…”
Section: Relation Classification Network
confidence: 99%
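The four reserved tokens referenced in the snippet are the entity markers of Soares et al. (2019), which wrap each entity span ([E1], [/E1], [E2], [/E2]) before the sentence is encoded. A minimal sketch of that marking step follows, assuming whitespace-tokenized input and non-overlapping spans; the helper name and example sentence are hypothetical.

```python
E1_START, E1_END = "[E1]", "[/E1]"
E2_START, E2_END = "[E2]", "[/E2]"

def mark_entities(tokens, span1, span2):
    """Insert reserved marker tokens around two entity spans.

    tokens: list of tokens [x_1, ..., x_T].
    span1, span2: (start, end) token indices, end exclusive.
    Spans are assumed not to overlap.
    """
    # Process the later span first so insertions don't shift earlier offsets.
    spans = sorted(
        [(span1, E1_START, E1_END), (span2, E2_START, E2_END)],
        key=lambda s: s[0][0],
        reverse=True,
    )
    out = list(tokens)
    for (start, end), open_tok, close_tok in spans:
        out.insert(end, close_tok)
        out.insert(start, open_tok)
    return out

tokens = "Alice was hired by Acme in 1998".split()
print(mark_entities(tokens, (0, 1), (4, 5)))
# ['[E1]', 'Alice', '[/E1]', 'was', 'hired', 'by', '[E2]', 'Acme', '[/E2]', 'in', '1998']
```

Injecting the markers as input tokens lets the encoder's self-attention carry explicit entity position information, which is the point of the labeling mechanism the snippet describes.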
“…FewRel 2.0 (Gao et al., 2019b) extends the dataset with few-shot domain adaptation and few-shot none-of-the-above detection. Many works on the FewRel datasets focus on improving methods, including modeling the distance distribution (Gao et al., 2019a; Ding et al., 2021), utilizing external knowledge such as knowledge graphs (Qu et al., 2020), learning different levels of features (Sun et al., 2019; Ye and Ling, 2019), and using pre-trained language models (Soares et al., 2019). Apart from method innovation on the standard setting of consistent few-shot RC, the investigation of inconsistent few-shot RC is still in demand.…”
Section: Related Work
confidence: 99%