Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) 2019
DOI: 10.18653/v1/k19-1063
Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking

Preprint: arXiv:2003.05473v1 [cs.CL]

Cited by 87 publications (112 citation statements: 3 supporting, 109 mentioning, 0 contrasting). References 20 publications. Citing publications span 2020–2024.
“…Entity linking (EL) is the task of detecting entity spans in a text and linking them to the underlying entity ID. While there are recent advances in fully end-to-end EL (Broscheit, 2019), the task is typically broken down into three steps: (1) detecting spans that are potential entity spans, (2) generating sets of candidate entities for these spans, (3) selecting the correct candidate for each span.…”
Section: Entity Linking
Citation type: mentioning
confidence: 99%
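The three-step decomposition described in the citation above (span detection, candidate generation, candidate selection) can be made concrete with a minimal Python sketch. The detector, candidate generator, and scoring callables below are hypothetical placeholders, not components from Broscheit (2019) or the citing paper.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class LinkedSpan:
    start: int        # token index where the mention starts
    end: int          # token index where the mention ends (exclusive)
    entity_id: str    # ID of the selected knowledge-base entity

def link_entities(tokens: List[str],
                  detect_spans: Callable[[List[str]], List[Tuple[int, int]]],
                  generate_candidates: Callable[[str], List[str]],
                  score_candidate: Callable[[List[str], int, int, str], float]) -> List[LinkedSpan]:
    """(1) detect mention spans, (2) generate candidates, (3) select the best candidate per span."""
    results = []
    for start, end in detect_spans(tokens):                    # (1) span detection
        mention = " ".join(tokens[start:end])
        candidates = generate_candidates(mention)              # (2) candidate generation
        if not candidates:
            continue                                           # no candidates: leave the span unlinked
        best = max(candidates,
                   key=lambda e: score_candidate(tokens, start, end, e))  # (3) candidate selection
        results.append(LinkedSpan(start, end, best))
    return results

A fully end-to-end system in the sense of Broscheit (2019) would instead learn these steps jointly rather than chaining separate components as this sketch does.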
“…To ensure coverage of the necessary entities, we include all gold entities and all generator candidates in the entity vocabulary L_Ent, even if they fall under the Wikipedia2Vec link threshold (see Section 3.3). While this is based on the unrealistic assumption that we know the contents of the test set in advance, it is necessary for comparability with Peters et al. (2019), Kolitsas et al. (2018) and Broscheit (2019), who also design their entity vocabulary around the data. See Appendix for more details on data and preprocessing.…”
Section: Finetuning (We finetune E-BERT-MLM on the training set to minimize …)
Citation type: mentioning
confidence: 99%
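As a rough illustration of the vocabulary construction described in this citation, the Python sketch below unions a threshold-filtered base vocabulary with all gold entities and all generator candidates. The function name, arguments, and the threshold value are assumptions for illustration only, not taken from the cited papers.

from typing import Dict, Iterable, List, Set

def build_entity_vocab(gold_entities: Iterable[str],
                       candidate_lists: Iterable[List[str]],
                       link_counts: Dict[str, int],
                       min_links: int = 10) -> Set[str]:
    """Keep entities above the link threshold, plus every gold entity and generator candidate."""
    vocab = {e for e, n in link_counts.items() if n >= min_links}  # threshold-filtered base vocabulary
    vocab.update(gold_entities)                                    # force-include gold entities
    for candidates in candidate_lists:
        vocab.update(candidates)                                   # force-include generator candidates
    return vocab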
“…Several works (Radford et al., 2019b; Keskar et al., 2019) show the remarkable fluency and grammatical correctness of text decoded from modern LMs. Additionally, recent works (Petroni et al., 2019; Logan et al., 2019; Broscheit, 2019; Roberts et al., 2020) demonstrate that beyond general linguistic capabilities, language models can also pick up factual knowledge present in the training data. However, it is unclear if LMs are able to convey such knowledge at decoding time when producing long sequences: do they generate fluent, grammatical but "babbler-level" text, or can they produce utterances that reflect factual world knowledge?…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…For short text, the features provided by the context are limited and cannot be represented well, which prevents some models from fully learning the features of the text. Therefore, some scholars proposed addressing this problem with a pre-trained model [19]: the text and the entity description are concatenated as the input to BERT, and the vector output at the [CLS] position is classified together with the start- and end-position vectors of the candidate entities. However, because the features extracted by the BERT model are relatively broad and noisy, we believe performance on this task can be further improved.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
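The BERT-based candidate classification scheme summarised in this citation (concatenating text and entity description, then classifying the [CLS] vector together with the candidate span's start- and end-position vectors) could look roughly like the following PyTorch sketch. The class name, layer sizes, and pooling are assumptions for illustration, not the exact model of reference [19].

import torch
import torch.nn as nn
from transformers import BertModel

class SpanCandidateScorer(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(3 * hidden, 1)  # [CLS] + span-start + span-end vectors

    def forward(self, input_ids, attention_mask, start_idx, end_idx):
        # input_ids encodes "text [SEP] entity description"; start_idx/end_idx are
        # LongTensors with the candidate span's token positions for each example.
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        cls_vec = out[:, 0]                               # vector at the [CLS] position
        batch = torch.arange(out.size(0), device=out.device)
        start_vec = out[batch, start_idx]                 # hidden state at the span start
        end_vec = out[batch, end_idx]                     # hidden state at the span end
        logits = self.classifier(torch.cat([cls_vec, start_vec, end_vec], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)          # probability that the candidate is correct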
“…Finally, the fused semantic features are input into the fully connected layer with a sigmoid activation function for classification, as shown in formulas (18) and (19).…”
Section: Fusion Layer
Citation type: mentioning
confidence: 99%
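Formulas (18) and (19) are not reproduced in this excerpt, so the following is only a minimal PyTorch sketch of the step the citation describes: fused semantic features passed through a fully connected layer with a sigmoid activation to yield a classification score. The class and dimension names are illustrative assumptions.

import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, fused_dim: int):
        super().__init__()
        self.fc = nn.Linear(fused_dim, 1)  # fully connected layer over the fused features

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        # Sigmoid maps the linear output to a score in (0, 1) for binary classification.
        return torch.sigmoid(self.fc(fused_features)).squeeze(-1)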