Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP (2019)
DOI: 10.18653/v1/w19-2003

The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

Abstract: The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. We here compare word embedding algorithms on three corpora of different sizes, and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVD-PPMI-type embeddings. This finding seems to explain d…

Cited by 7 publications (7 citation statements)
References 32 publications
“…Such measures' lack of reliability may partly stem from the fact that word embeddings themselves are often unstable, sensitive to choices of, for instance, word embedding algorithms (Wendlandt et al., 2018; Antoniak and Mimno, 2018; Hellrich et al., 2019), hyper-parameters (Levy et al., 2015; Mimno and Thompson, 2017; Hellrich et al., 2019) and even random seeds (Wendlandt et al., 2018; Hellrich and Hahn, 2016; Bloem et al., 2019) during word embedding training.…”
Section: Related Work (mentioning)
confidence: 99%
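
As a concrete illustration of the seed sensitivity mentioned above, here is a minimal sketch (not from any of the cited papers) that trains the same skip-gram model twice on identical data, changing only the random seed; gensim, the toy corpus, and all hyper-parameters are illustrative assumptions.

# Hypothetical demonstration: two training runs differing only in the random seed.
# Assumes gensim >= 4.0; corpus and hyper-parameters are toy choices.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "dog"],
] * 100  # tiny corpus, repeated so the model has something to fit

def train(seed):
    # workers=1 keeps a single run reproducible for a given seed
    return Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1,
                    sg=1, seed=seed, workers=1, epochs=20)

m1, m2 = train(seed=1), train(seed=2)
print(m1.wv.most_similar("cat", topn=3))
print(m2.wv.most_similar("cat", topn=3))  # neighbour lists often differ across seeds

Even on identical data, the two runs typically return somewhat different neighbour lists, which is the kind of instability the quoted passage refers to.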
“…Word Embedding Instability. There have been significant findings in the instability of word embeddings (Hellrich and Hahn, 2016; Hellrich et al., 2019; Antoniak and Mimno, 2018; Burdick et al., 2018; Pierrejean and Tanguy, 2018). Hellrich and Hahn (2016) show that when investigating neighboring words in the embedding space, there is low reliability in which words are surrounding a particular token across multiple runs.…”
Section: Related Work (mentioning)
confidence: 99%
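
To make the notion of nearest-neighbour reliability measurable, the following sketch computes the overlap of a word's top-k cosine neighbours across two embedding runs; the dictionary representation, the value of k, and the function names are assumptions for illustration, not the specific measure used by Hellrich and Hahn (2016).

# Hypothetical overlap measure between two embedding runs trained on the same data.
# emb1, emb2: dicts mapping word -> numpy vector (e.g., built from the runs above).
import numpy as np

def top_k_neighbors(emb, query, k=10):
    # rank all other words by cosine similarity to the query vector
    q = emb[query]
    sims = {w: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for w, v in emb.items() if w != query}
    return {w for w, _ in sorted(sims.items(), key=lambda x: -x[1])[:k]}

def neighbor_overlap(emb1, emb2, query, k=10):
    # 1.0 = the two runs agree on all top-k neighbours, 0.0 = they share none
    n1 = top_k_neighbors(emb1, query, k)
    n2 = top_k_neighbors(emb2, query, k)
    return len(n1 & n2) / k

Low overlap values for frequent, well-attested words are the kind of unreliability the quoted passage describes.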
“…Furthermore, these variations are not only subject to low-frequency words; instability is found to be present in vocabulary words that occur relatively frequently (Hellrich and Hahn, 2016). Instability has been shown in multiple algorithms (e.g., Skip-Gram (Hellrich and Hahn, 2016) and SVD (Hellrich et al., 2019)), as well as in different training corpora (e.g., historical text (Hellrich and Hahn, 2016) and social media (Antoniak and Mimno, 2018)).…”
Section: Related Work (mentioning)
confidence: 99%
“…There have been many recent works studying word embedding instability (Hellrich & Hahn, 2016; Antoniak & Mimno, 2018; Wendlandt et al., 2018; Pierrejean & Tanguy, 2018; Chugh et al., 2018; Hellrich et al., 2019); these works have focused on the intrinsic instability of word embeddings, meaning the stability measured between the embedding matrices without training a downstream model. In the work of Wendlandt et al. (2018), a downstream task (part-of-speech tagging) is considered, but the focus is on how the intrinsic instability impacts the error of words on this task.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this work, we take a first step toward addressing the problem of ML model instability by examining in detail a core building block of most modern natural language processing (NLP) applications: word embeddings (Mikolov et al., 2013a; Pennington et al., 2014; Bojanowski et al., 2017). Several recent works have shown that word embeddings are unstable, with the nearest neighbors to words varying significantly across embeddings trained under different settings (Hellrich & Hahn, 2016; Antoniak & Mimno, 2018; Wendlandt et al., 2018; Pierrejean & Tanguy, 2018; Chugh et al., 2018; Hellrich et al., 2019). These results may cause researchers using embeddings for analysis to reassess the reliability of their conclusions.…”
Section: Introduction (mentioning)
confidence: 99%