Word embeddings successfully capture lexical semantic information about words based on co-occurrence patterns extracted from large corpora (Mikolov et al., 2013a; Pennington et al., 2014; Mikolov et al., 2018) or knowledge bases (Bordes et al., 2011), with excellent results on several tasks, including word similarity (Collobert and Weston, 2008; Turian et al., 2010; Socher et al., 2011), Semantic Textual Similarity (Shao, 2017) and, more recently, unsupervised machine translation (Artetxe et al., 2019), inferring representations for rare words (Schick and Schütze, 2020), unsupervised word alignment (Jalili Sabet et al., 2020), and knowledge base probes (Dufter et al., 2021). In these tasks, word embeddings perform on par with or better than transformer-based language models such as BERT (Devlin et al., 2019), while requiring comparatively few resources for training and inference.
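As a minimal illustration of the word similarity setting mentioned above (not taken from any of the cited works), the sketch below scores word pairs by the cosine similarity of their embedding vectors. The vectors here are toy placeholders; in practice they would be loaded from a pre-trained model such as word2vec or GloVe.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings for three words
# (placeholders standing in for pre-trained vectors).
embeddings = {
    "king":  np.array([0.80, 0.10, 0.70, 0.20]),
    "queen": np.array([0.75, 0.15, 0.72, 0.25]),
    "apple": np.array([0.10, 0.90, 0.05, 0.80]),
}

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related words should score higher than unrelated ones.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```

Because similarity reduces to a single dot product over dense vectors, this kind of lookup is far cheaper at inference time than running a full transformer forward pass, which is the resource contrast the paragraph draws against BERT.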