Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019) 2019
DOI: 10.18653/v1/d19-6104

Bad Form: Comparing Context-Based and Form-Based Few-Shot Learning in Distributional Semantic Models

Abstract: Word embeddings are an essential component in a wide range of natural language processing applications. However, distributional semantic models are known to struggle when only a small number of context sentences are available. Several methods have been proposed to obtain higher-quality vectors for these words, leveraging both this context information and sometimes the word forms themselves through a hybrid approach. We show that the current tasks do not suffice to evaluate models that use word-form information…
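As a rough illustration of the kind of hybrid approach the abstract refers to, the sketch below combines a context-based estimate (the mean of observed context word vectors) with a form-based estimate (the mean of character n-gram vectors, in the spirit of fastText). The function names, dictionary inputs, and the mixing weight `alpha` are assumptions made for illustration, not the paper's actual method.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a padded word, fastText-style."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def hybrid_few_shot_vector(word, context_words, word_vecs, ngram_vecs, alpha=0.5):
    """Mix a context-based and a form-based estimate for a rare word.

    `word_vecs` and `ngram_vecs` map strings to numpy vectors of equal
    dimension; `alpha` balances the two sources (0.5 is arbitrary here).
    """
    dim = len(next(iter(word_vecs.values())))

    # Context-based estimate: average the embeddings of observed context words.
    ctx = [word_vecs[w] for w in context_words if w in word_vecs]
    ctx_vec = np.mean(ctx, axis=0) if ctx else np.zeros(dim)

    # Form-based estimate: average the embeddings of the word's character n-grams.
    grams = [ngram_vecs[g] for g in char_ngrams(word) if g in ngram_vecs]
    form_vec = np.mean(grams, axis=0) if grams else np.zeros(dim)

    return alpha * ctx_vec + (1.0 - alpha) * form_vec
```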

Cited by 3 publications (6 citation statements). References 13 publications.
“…The latter noticed that not including the stop-words greatly improves the performance on the evaluation tasks. To optimise the performance of the additive model, Van Hautte et al. (2019) proposed weighting the context words according to distance and frequency, as well as subtracting a "negative sampling" vector. These modifications take hyperparameters that are important for Skip-Gram's strong performance, such as the number of negative samples k and window size n (Levy et al., 2015), and apply them to the few-shot setting.…”
Section: Background: Dependency-based Word Embeddings
confidence: 99%
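The statement above describes the cited modifications only at a high level. The following is a minimal sketch, assuming pre-trained word vectors and corpus frequency counts, of how distance and frequency weighting plus subtraction of a scaled "negative sampling" vector could be combined in an additive few-shot estimate. All parameter names and the `neg_scale` constant are illustrative guesses, not the implementation of Van Hautte et al. (2019).

```python
import numpy as np

def additive_few_shot_vector(target, sentences, vecs, freqs,
                             window=5, k=5, neg_scale=0.1):
    """Weighted additive embedding for a rare word (illustrative sketch).

    Context words are weighted by distance to the target and down-weighted
    by corpus frequency; a scaled "negative sampling" vector (the expected
    embedding under a smoothed unigram distribution) is then subtracted.
    """
    total = sum(freqs.values())
    dim = len(next(iter(vecs.values())))
    acc, weight_sum = np.zeros(dim), 0.0

    for sent in sentences:
        for pos, w in enumerate(sent):
            if w != target:
                continue
            lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
            for i in range(lo, hi):
                c = sent[i]
                if i == pos or c not in vecs:
                    continue
                dist_w = 1.0 - abs(i - pos) / (window + 1)            # nearer words count more
                freq_w = 1.0 / (1.0 + 1e3 * freqs.get(c, 0) / total)  # frequent words count less
                acc += dist_w * freq_w * vecs[c]
                weight_sum += dist_w * freq_w

    if weight_sum == 0.0:
        return acc
    ctx_vec = acc / weight_sum

    # Expected embedding under the smoothed (unigram^0.75) noise distribution.
    probs = np.array([freqs.get(w, 0) ** 0.75 for w in vecs], dtype=float)
    probs /= probs.sum() if probs.sum() > 0 else 1.0
    neg_vec = np.sum([p * vecs[w] for p, w in zip(probs, vecs)], axis=0)
    return ctx_vec - neg_scale * k * neg_vec
```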
“…Distributional semantics models create word embeddings based on the assumption that the meaning of a word is defined by the contexts it is used in (for an overview, see: Sahlgren, 2008; Lenci, 2018; Boleda, 2020; Emerson, 2020). A fundamental challenge for these approaches is the difficulty of producing high-quality embeddings for rare words, since the models often require vast numbers of training examples (Adams et al., 2017; Van Hautte et al., 2019). To address this problem, various few-shot learning methods have been previously introduced.…”
Section: Introduction
confidence: 99%
“…Mikolov et al., 2013) are known to struggle with rare words, several techniques for improving their representations have been proposed. These approaches exploit either the contexts in which rare words occur (Lazaridou et al., 2017; Herbelot and Baroni, 2017; Khodak et al., 2018; Liu et al., 2019a), their surface-form (Luong et al., 2013; Bojanowski et al., 2017; Pinter et al., 2017), or both (Schick and Schütze, 2019a,b; Hautte et al., 2019). However, all of this prior work is designed for and evaluated on uncontextualized word embeddings.…”
Section: Introduction
confidence: 99%
“…Mikolov et al., 2013) are known to struggle with rare words, several techniques for improving their representations have been proposed. These approaches exploit either the contexts in which rare words occur (Lazaridou et al., 2017; Herbelot and Baroni, 2017; Khodak et al., 2018; Liu et al., 2019a), their surface-form (Luong et al., 2013; Bojanowski et al., 2017; Pinter et al., 2017), or both (Schick and Schütze, 2019b; Hautte et al., 2019). However, all of these approaches are designed for and evaluated on uncontextualized word embeddings.…”
Section: Introduction
confidence: 99%
“…Assessing the effectiveness of methods like BERTRAM in a contextualized setting is challenging: while most previous work on rare words was evaluated on datasets explicitly focusing on rare words (e.g., Luong et al., 2013; Herbelot and Baroni, 2017; Khodak et al., 2018; Liu et al., 2019a; Hautte et al., 2019), all of these datasets are tailored towards context-independent embeddings and thus not suitable for evaluating our model. Furthermore, understanding rare words is of negligible importance for most commonly used downstream task datasets.…”
Section: Introduction
confidence: 99%