Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.71

E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

Abstract: We present a novel way of injecting factual knowledge about entities into the pretrained BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to ERNIE (Zhang et al., 2019) and KnowBert (Peters et al., 2019), but it requires no expensive further pretraining of the BERT encoder…
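The alignment step described in the abstract can be sketched briefly. The snippet below is an illustrative reconstruction from the abstract only, not the authors' released code: it assumes the alignment is an unconstrained linear map fit by ordinary least squares on the vocabulary shared between Wikipedia2Vec's word vectors and BERT's wordpiece input embeddings, and the function names (`fit_alignment`, `align_entity_vectors`) are hypothetical.

```python
import numpy as np

def fit_alignment(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """Fit a linear map W from the Wikipedia2Vec space to BERT's wordpiece space.

    src_vecs: (n, d_src) Wikipedia2Vec word vectors for the shared vocabulary.
    tgt_vecs: (n, d_tgt) BERT input embeddings for the same wordpieces.
    Solves min_W ||src_vecs @ W - tgt_vecs||_F^2 by ordinary least squares.
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W  # shape (d_src, d_tgt)

def align_entity_vectors(entity_vecs: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project Wikipedia2Vec entity vectors into BERT's wordpiece embedding space."""
    return entity_vecs @ W
```

As the abstract states, the aligned entity vectors are then fed to BERT as if they were wordpiece vectors, so no further pretraining of the encoder is needed.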

Cited by 110 publications (116 citation statements)
References 19 publications
“…This argues for proponents of resource-hungry deep learning models to try harder to find cheap "green" baselines or to combine the best of both worlds (cf. Poerner et al., 2020).…”
Section: Model (mentioning)
confidence: 99%
“…Factual Knowledge Retrieval from LMs. Several works have focused on probing factual knowledge solely from pre-trained LMs without access to external knowledge. They do so by either using prompts and letting the LM fill in the blanks, which assumes that the LM is a static knowledge source (Petroni et al., 2019; Jiang et al., 2020; Poerner et al., 2019; Bouraoui et al., 2020), or fine-tuning the LM on a set of question-answer pairs to directly generate answers, which dynamically adapts the LM to this particular task (Roberts et al., 2020). Impressive results demonstrated by these works indicate that large-scale LMs contain a significant amount of knowledge, in some cases even outperforming competitive question answering systems relying on external resources (Roberts et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…Language models (LMs; Church, 1988; Kneser and Ney, 1995; Bengio et al., 2003) learn to model the probability distribution of text, and in doing so capture information about various aspects of the syntax or semantics of the language at hand. Recent works have presented intriguing results demonstrating that modern large-scale LMs also capture a significant amount of factual knowledge (Petroni et al., 2019; Jiang et al., 2020; Poerner et al., 2019). This knowledge is generally probed by having the LM fill in the blanks of cloze-style prompts.
[Figure 1: X-FACTR contains 23 languages, for which the data availability varies dramatically.]…”
Section: Introduction (mentioning)
confidence: 99%
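The cloze-style probing these excerpts describe can be illustrated with a short, self-contained example. This is a generic sketch using the Hugging Face `transformers` fill-mask pipeline with an off-the-shelf `bert-base-cased` checkpoint and a made-up prompt; it is not code from any of the cited papers.

```python
from transformers import pipeline

# Cloze-style factual probing: a masked LM fills in the blank of a prompt.
fill_mask = pipeline("fill-mask", model="bert-base-cased")

# Illustrative prompt (not taken from the cited papers).
prompt = "The capital of France is [MASK]."

for prediction in fill_mask(prompt, top_k=3):
    # Each prediction carries the predicted token and its probability.
    print(prediction["token_str"], round(prediction["score"], 3))
```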
“…However, the introduction of BERT has essentially eliminated the need for static word vectors in standard settings. On the other hand, several authors have shown that it can be beneficial to incorporate entity vectors with BERT, allowing the model to exploit factual or commonsense knowledge from structured sources (Lin et al., 2019; Poerner et al., 2019).…”
Section: Related Work (mentioning)
confidence: 99%