Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019) 2019
DOI: 10.18653/v1/d19-6104

Bad Form: Comparing Context-Based and Form-Based Few-Shot Learning in Distributional Semantic Models

Abstract: Word embeddings are an essential component in a wide range of natural language processing applications. However, distributional semantic models are known to struggle when only a small number of context sentences are available. Several methods have been proposed to obtain higher-quality vectors for these words, leveraging both this context information and sometimes the word forms themselves through a hybrid approach. We show that the current tasks do not suffice to evaluate models that use word-form information…
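As a rough illustration of the kind of hybrid approach the abstract refers to, the sketch below combines a context-based estimate (the mean of observed context word vectors) with a form-based estimate (the mean of character n-gram vectors, in the spirit of fastText). The function names, dictionary inputs, and the mixing weight `alpha` are assumptions made for illustration, not the paper's actual method.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a padded word, fastText-style."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def hybrid_few_shot_vector(word, context_words, word_vecs, ngram_vecs, alpha=0.5):
    """Mix a context-based and a form-based estimate for a rare word.

    `word_vecs` and `ngram_vecs` map strings to numpy vectors of equal
    dimension; `alpha` balances the two sources (0.5 is arbitrary here).
    """
    dim = len(next(iter(word_vecs.values())))

    # Context-based estimate: average the embeddings of observed context words.
    ctx = [word_vecs[w] for w in context_words if w in word_vecs]
    ctx_vec = np.mean(ctx, axis=0) if ctx else np.zeros(dim)

    # Form-based estimate: average the embeddings of the word's character n-grams.
    grams = [ngram_vecs[g] for g in char_ngrams(word) if g in ngram_vecs]
    form_vec = np.mean(grams, axis=0) if grams else np.zeros(dim)

    return alpha * ctx_vec + (1.0 - alpha) * form_vec
```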

Cited by 3 publications (6 citation statements). References 13 publications.
“…The latter noticed that not including the stop-words greatly improves the performance on the evaluation tasks. To optimise the performance of the additive model, Van Hautte et al. (2019) proposed weighting the context words according to distance and frequency, as well as subtracting a "negative sampling" vector. These modifications take hyperparameters that are important for Skip-Gram's strong performance, such as the number of negative samples k and window size n (Levy et al., 2015), and apply them to the few-shot setting.…”
Section: Background: Dependency-based Word Embeddings
confidence: 99%
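The statement above describes the cited modifications only at a high level. The following is a minimal sketch, assuming pre-trained word vectors and corpus frequency counts, of how distance and frequency weighting plus subtraction of a scaled "negative sampling" vector could be combined in an additive few-shot estimate. All parameter names and the `neg_scale` constant are illustrative guesses, not the implementation of Van Hautte et al. (2019).

```python
import numpy as np

def additive_few_shot_vector(target, sentences, vecs, freqs,
                             window=5, k=5, neg_scale=0.1):
    """Weighted additive embedding for a rare word (illustrative sketch).

    Context words are weighted by distance to the target and down-weighted
    by corpus frequency; a scaled "negative sampling" vector (the expected
    embedding under a smoothed unigram distribution) is then subtracted.
    """
    total = sum(freqs.values())
    dim = len(next(iter(vecs.values())))
    acc, weight_sum = np.zeros(dim), 0.0

    for sent in sentences:
        for pos, w in enumerate(sent):
            if w != target:
                continue
            lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
            for i in range(lo, hi):
                c = sent[i]
                if i == pos or c not in vecs:
                    continue
                dist_w = 1.0 - abs(i - pos) / (window + 1)            # nearer words count more
                freq_w = 1.0 / (1.0 + 1e3 * freqs.get(c, 0) / total)  # frequent words count less
                acc += dist_w * freq_w * vecs[c]
                weight_sum += dist_w * freq_w

    if weight_sum == 0.0:
        return acc
    ctx_vec = acc / weight_sum

    # Expected embedding under the smoothed (unigram^0.75) noise distribution.
    probs = np.array([freqs.get(w, 0) ** 0.75 for w in vecs], dtype=float)
    probs /= probs.sum() if probs.sum() > 0 else 1.0
    neg_vec = np.sum([p * vecs[w] for p, w in zip(probs, vecs)], axis=0)
    return ctx_vec - neg_scale * k * neg_vec
```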
“…Distributional semantics models create word embeddings based on the assumption that the meaning of a word is defined by the contexts it is used in (for an overview, see: Sahlgren, 2008; Lenci, 2018; Boleda, 2020; Emerson, 2020). A fundamental challenge for these approaches is the difficulty of producing high-quality embeddings for rare words, since the models often require vast numbers of training examples (Adams et al., 2017; Van Hautte et al., 2019). To address this problem, various few-shot learning methods have been previously introduced.…”
Section: Introduction
confidence: 99%
“…Mikolov et al., 2013) are known to struggle with rare words, several techniques for improving their representations have been proposed. These approaches exploit either the contexts in which rare words occur (Lazaridou et al., 2017; Herbelot and Baroni, 2017; Khodak et al., 2018; Liu et al., 2019a), their surface-form (Luong et al., 2013; Bojanowski et al., 2017; Pinter et al., 2017), or both (Schick and Schütze, 2019a,b; Hautte et al., 2019). However, all of this prior work is designed for and evaluated on uncontextualized word embeddings.…”
Section: Introduction
confidence: 99%
“…Mikolov et al., 2013) are known to struggle with rare words, several techniques for improving their representations have been proposed. These approaches exploit either the contexts in which rare words occur (Lazaridou et al., 2017; Herbelot and Baroni, 2017; Khodak et al., 2018; Liu et al., 2019a), their surface-form (Luong et al., 2013; Bojanowski et al., 2017; Pinter et al., 2017), or both (Schick and Schütze, 2019b; Hautte et al., 2019). However, all of these approaches are designed for and evaluated on uncontextualized word embeddings.…”
Section: Introduction
confidence: 99%
“…Assessing the effectiveness of methods like BERTRAM in a contextualized setting is challenging: while most previous work on rare words was evaluated on datasets explicitly focusing on rare words (e.g., Luong et al., 2013; Herbelot and Baroni, 2017; Khodak et al., 2018; Liu et al., 2019a; Hautte et al., 2019), all of these datasets are tailored towards context-independent embeddings and thus not suitable for evaluating our model. Furthermore, understanding rare words is of negligible importance for most commonly used downstream task datasets.…”
Section: Introduction
confidence: 99%