Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.186

Static Embeddings as Efficient Knowledge Bases?

Abstract: Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structured knowledge base (KB) queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. The good performance on this analysis task has been interpreted as PLMs becoming potential repositories of factual knowledge. In experiments across ten linguistically diverse languages, we study knowledge contained in static embeddings. We show that, when restricting the output space to …
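To make the probing setup concrete, here is a minimal Python sketch of a restricted-output-space query over static embeddings, using gensim. The file name, the candidate list, and the plain cosine scoring are illustrative assumptions, not the paper's exact pipeline:

# Minimal sketch of a KB-style probe over static embeddings, with the
# output space restricted to a fixed candidate set.
# NOTE: "embeddings.vec" is a hypothetical path to any word2vec-format
# file; cosine scoring is an illustrative simplification, not the
# paper's exact method.
import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("embeddings.vec")

def probe(subject: str, candidates: list[str]) -> str:
    """Return the candidate whose vector is most similar to the subject."""
    subj = kv[subject] / np.linalg.norm(kv[subject])
    sim = lambda c: float(np.dot(subj, kv[c] / np.linalg.norm(kv[c])))
    return max(candidates, key=sim)

# Probe analogous to "Paris is the capital of [MASK]", restricted to
# a small candidate set of country names:
print(probe("Paris", ["France", "Germany", "Spain", "Italy"]))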

Cited by 10 publications (6 citation statements). References 26 publications.
“…We are aware that, for the GLUE benchmark, (static) word embeddings are outperformed by contextual representations such as those obtained by BERT (Devlin et al., 2019). Thus, word embeddings may be better suited for other tasks such as unsupervised machine translation (Artetxe et al., 2019), inferring high-quality embeddings for rare words (Schick and Schütze, 2020), unsupervised word alignment (Jalili Sabet et al., 2020) or knowledge base queries (Dufter et al., 2021). However, we can use the GLUE benchmark as part of an objective and unified framework to evaluate word embeddings.…”
Section: Extrinsic Evaluation Results
confidence: 99%
“…Word embeddings successfully capture lexical semantic information about words based on cooccurrence patterns extracted from large corpora (Mikolov et al., 2013a; Pennington et al., 2014; Mikolov et al., 2018) or knowledge bases (Bordes et al., 2011), with excellent results on several tasks, including word similarity (Collobert and Weston, 2008; Turian et al., 2010; Socher et al., 2011), Semantic Textual Similarity (Shao, 2017), or more recently, unsupervised machine translation (Artetxe et al., 2019), inferring representations for rare words (Schick and Schütze, 2020), unsupervised word alignment (Jalili Sabet et al., 2020) or knowledge base probes (Dufter et al., 2021). In these tasks, word embeddings perform similarly or better than transformer-based language models such as BERT (Devlin et al., 2019), while requiring a comparatively tiny amount of resources for training and inference.…”
Section: Introduction
confidence: 99%
“…Static embeddings are particularly suited to tasks that restrict predictions to a candidate set, such as word analogies, since embeddings from these smaller models have a defined vocabulary that can be queried in nearest-neighbor search. Dufter et al. (2021) verified this trend for question answering and advocated for use of static embeddings because of their low computational cost: "'green' baselines are often ignored, but should be considered when evaluating resource-hungry deep learning models."…”
Section: Contextual Embedding
confidence: 95%
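The word-analogy case mentioned in the statement above reduces to a nearest-neighbor search over the embedding's defined vocabulary. A minimal gensim sketch (the file name is again a hypothetical placeholder):

# Analogy as nearest-neighbor search over a static embedding's defined
# vocabulary (3CosAdd); "embeddings.vec" is a hypothetical placeholder.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("embeddings.vec")

# king - man + woman: the top neighbors should ideally include "queen".
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))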
“…Poerner et al. (2020) and Kassner & Schütze (2020) argue that LLMs exploit surface form regularities when making predictions. Zhong et al. (2021) and Dufter et al. (2021) make a related observation, concluding that simpler models like randomly initialized LLMs, static embeddings, and even a Naive Bayes model can achieve a precision better than a majority baseline. Cao et al. (2021) argue that prompts that were found to do well in previous work overfit the distribution of objects in the training data rather than enabling knowledge extraction.…”
Section: Related Work
confidence: 98%