Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP (2019)
DOI: 10.18653/v1/w19-2003

The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

Abstract: The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. We here compare word embedding algorithms on three corpora of different sizes, and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVD-PPMI-type embeddings. This finding seems to explain d…

Cited by 7 publications (7 citation statements)
References 32 publications
“…Such measures' lack of reliability may partly stem from the fact that word embeddings themselves are often unstable, sensitive to choices of, for instance, word embedding algorithms (Wendlandt et al., 2018; Antoniak and Mimno, 2018; Hellrich et al., 2019), hyper-parameters (Levy et al., 2015; Mimno and Thompson, 2017; Hellrich et al., 2019) and even random seeds (Wendlandt et al., 2018; Hellrich and Hahn, 2016; Bloem et al., 2019) during word embedding training.…”
Section: Related Work (mentioning)
confidence: 99%
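
As a concrete illustration of the seed sensitivity mentioned above, here is a minimal sketch (not from any of the cited papers) that trains the same skip-gram model twice on identical data, changing only the random seed; gensim, the toy corpus, and all hyper-parameters are illustrative assumptions.

# Hypothetical demonstration: two training runs differing only in the random seed.
# Assumes gensim >= 4.0; corpus and hyper-parameters are toy choices.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "dog"],
] * 100  # tiny corpus, repeated so the model has something to fit

def train(seed):
    # workers=1 keeps a single run reproducible for a given seed
    return Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1,
                    sg=1, seed=seed, workers=1, epochs=20)

m1, m2 = train(seed=1), train(seed=2)
print(m1.wv.most_similar("cat", topn=3))
print(m2.wv.most_similar("cat", topn=3))  # neighbour lists often differ across seeds

Even on identical data, the two runs typically return somewhat different neighbour lists, which is the kind of instability the quoted passage refers to.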
“…Word Embedding Instability. There have been significant findings in the instability of word embeddings (Hellrich and Hahn, 2016; Hellrich et al., 2019; Antoniak and Mimno, 2018; Burdick et al., 2018; Pierrejean and Tanguy, 2018). Hellrich and Hahn (2016) show that when investigating neighboring words in the embedding space, there is low reliability in which words are surrounding a particular token across multiple runs.…”
Section: Related Work (mentioning)
confidence: 99%
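
To make the notion of nearest-neighbour reliability measurable, the following sketch computes the overlap of a word's top-k cosine neighbours across two embedding runs; the dictionary representation, the value of k, and the function names are assumptions for illustration, not the specific measure used by Hellrich and Hahn (2016).

# Hypothetical overlap measure between two embedding runs trained on the same data.
# emb1, emb2: dicts mapping word -> numpy vector (e.g., built from the runs above).
import numpy as np

def top_k_neighbors(emb, query, k=10):
    # rank all other words by cosine similarity to the query vector
    q = emb[query]
    sims = {w: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for w, v in emb.items() if w != query}
    return {w for w, _ in sorted(sims.items(), key=lambda x: -x[1])[:k]}

def neighbor_overlap(emb1, emb2, query, k=10):
    # 1.0 = the two runs agree on all top-k neighbours, 0.0 = they share none
    n1 = top_k_neighbors(emb1, query, k)
    n2 = top_k_neighbors(emb2, query, k)
    return len(n1 & n2) / k

Low overlap values for frequent, well-attested words are the kind of unreliability the quoted passage describes.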
“…Furthermore, these variations are not only subject to low-frequency words; instability is found to be present in vocabulary words that occur relatively frequently (Hellrich and Hahn, 2016). Instability has been shown in multiple algorithms (e.g., Skip-Gram (Hellrich and Hahn, 2016) and SVD (Hellrich et al., 2019)), as well as in different training corpora (e.g., historical text (Hellrich and Hahn, 2016) and social media (Antoniak and Mimno, 2018)).…”
Section: Related Work (mentioning)
confidence: 99%
“…There have been many recent works studying word embedding instability (Hellrich & Hahn, 2016; Antoniak & Mimno, 2018; Wendlandt et al., 2018; Pierrejean & Tanguy, 2018; Chugh et al., 2018; Hellrich et al., 2019); these works have focused on the intrinsic instability of word embeddings, meaning the stability measured between the embedding matrices without training a downstream model. In the work of Wendlandt et al. (2018), a downstream task (part-of-speech tagging) is considered, but the focus is on how the intrinsic instability impacts the error of words on this task.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this work, we take a first step toward addressing the problem of ML model instability by examining in detail a core building block of most modern natural language processing (NLP) applications: word embeddings (Mikolov et al., 2013a; Pennington et al., 2014; Bojanowski et al., 2017). Several recent works have shown that word embeddings are unstable, with the nearest neighbors to words varying significantly across embeddings trained under different settings (Hellrich & Hahn, 2016; Antoniak & Mimno, 2018; Wendlandt et al., 2018; Pierrejean & Tanguy, 2018; Chugh et al., 2018; Hellrich et al., 2019). These results may cause researchers using embeddings for analysis to reassess the reliability of their conclusions.…”
Section: Introduction (mentioning)
confidence: 99%