Abstract: Entity Linking is the task of assigning entities from a Knowledge Base to textual mentions of such entities in a document. State-of-the-art approaches rely on lexical and statistical features which are abundant for popular entities but sparse for unpopular ones, resulting in a clear bias towards popular entities and poor accuracy for less popular ones. In this work, we present a novel approach that is guided by a natural notion of semantic similarity which is less amenable to such bias. We adopt a unified sema…
“…Their approach was exclusively evaluated and optimized on the ACE2004, MSNBC and AQUAINT data sets on which the authors achieve state-of-the-art results. A direct comparison of our results and the results of [10] shows that both works perform equally well on the MSNBC data set. Furthermore, our approach performs better on the ACE2004 data set (0.906 vs. 0.877 F1) but loses on the AQUAINT data set (0.842 vs. 0.907 F1).…”
Section: Discussion (mentioning)
confidence: 51%
“…Anyhow, we use the work of Guo et al [10] as an entry point in the following. Their approach was exclusively evaluated and optimized on the ACE2004, MSNBC and AQUAINT data sets on which the authors achieve state-of-the-art results.…”
Section: Discussion (mentioning)
confidence: 99%
“…If the underlying KB has a lower number of entities, the average likelihood of a wrong disambiguation is also reduced. In order to compare our algorithm with the approach in [10], we introduce the concept of the Surface Form Ambiguity Degree (SFAD). The SFAD is based on two assumptions: First, both approaches are able to disambiguate all entities in the ground truth data set, i.e.…”
Section: Discussion (mentioning)
confidence: 99%
“…It incorporates, along with statistical methods, richer relational analysis of the text. In 2014, the authors Guo et al [10] proposed the use of a probability distribution resulting from a random walk with restart over a suitable entity graph to represent the semantics of entities and documents in a unified way. Their algorithm updates the semantic signature of the document as surface forms are disambiguated.…”
Section: Related Work (mentioning)
confidence: 99%
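The random walk with restart described in the excerpt above is the classic personalized-PageRank iteration. A minimal sketch, assuming a toy symmetric entity graph and a standard restart probability of 0.15 (the paper's actual graph construction and parameters are not given here):

```python
import numpy as np

def random_walk_with_restart(adj, restart, alpha=0.15, n_iter=200):
    """Power-iterate p <- (1 - alpha) * W^T p + alpha * restart, where W is
    the row-normalized adjacency matrix. The fixed point p is the kind of
    'semantic signature' distribution the excerpt describes."""
    row_sums = adj.sum(axis=1, keepdims=True)
    W = adj / np.where(row_sums == 0, 1, row_sums)  # transition matrix
    p = restart.copy()
    for _ in range(n_iter):
        p = (1 - alpha) * W.T @ p + alpha * restart
    return p

# Toy entity graph: three mutually linked entities; restart mass on entity 0.
adj = np.array([[0., 1., 1.],
                [1., 0., 1.],
                [1., 1., 0.]])
restart = np.array([1.0, 0.0, 0.0])
signature = random_walk_with_restart(adj, restart)
```

The resulting `signature` is a probability distribution that concentrates on the restart entity and decays with graph distance, which is what lets two such distributions (for an entity and for a document) be compared directly.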
“…[12, 15], most approaches have been optimised to work on a particular type of disambiguation task, for example on short Twitter messages [2], web pages [12, 28], news documents [10, 15], encyclopedias [16, 11, 5], RSS feeds [9], etc. While most authors report outperforming other entity disambiguation algorithms on their own domain/data set, they do not achieve comparable accuracy on other domains.…”
Entity disambiguation is the task of mapping ambiguous terms in natural-language text to entities in a knowledge base. It finds application in the extraction of structured data in RDF (Resource Description Framework) from textual documents, but equally in facilitating artificial intelligence applications such as semantic search, reasoning, and question answering. We propose a new collective, graph-based disambiguation algorithm that utilizes semantic entity and document embeddings for robust entity disambiguation. Robust here refers to the property of achieving better-than-state-of-the-art results over a wide range of very different data sets. Our approach is also able to abstain if no appropriate entity can be found for a specific surface form. Our evaluation shows that our approach achieves significantly (>5%) better results than all other publicly available disambiguation algorithms on 7 of 9 data sets without data-set-specific tuning. Moreover, we discuss the influence of the quality of the knowledge base on the disambiguation accuracy and indicate that our algorithm achieves better results than non-publicly available state-of-the-art algorithms.
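The abstract's abstention behaviour can be illustrated with a minimal sketch: rank candidate entities by cosine similarity between embeddings and return no entity when the best score falls below a threshold. The embeddings, candidate names, and threshold below are hypothetical stand-ins, not the paper's learned parameters or its collective graph algorithm:

```python
import numpy as np

def link_mention(mention_vec, candidates, threshold=0.4):
    """Pick the candidate entity whose embedding is most cosine-similar to
    the mention's context embedding; return None (abstain) when even the
    best candidate scores below the threshold."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    best_name, best_sim = None, -1.0
    for name, vec in candidates.items():
        sim = cos(mention_vec, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None

# Hypothetical 3-d embeddings for two candidate senses of "Paris".
candidates = {
    "Paris_(France)": np.array([0.9, 0.3, 0.1]),
    "Paris_(Texas)":  np.array([0.1, 0.1, 0.9]),
}
print(link_mention(np.array([1.0, 0.2, 0.0]), candidates))  # Paris_(France)
print(link_mention(np.array([0.0, 1.0, 0.0]), candidates))  # None (abstains)
```

The second call abstains because the context vector is dissimilar to both candidates, mirroring the "no appropriate entity for a specific surface form" case in the abstract.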
Digital libraries are online collections of digital objects that can include text, images, audio, or video. It has long been observed that named entities (NEs) are key to accessing digital library portals, as they are contained in most user queries. Combined with, or subsequent to, the recognition of NEs, named entity linking (NEL) connects NEs to external knowledge bases. This makes it possible to differentiate ambiguous geographical locations or names (John Smith), and means that descriptions from the knowledge bases can be used for semantic enrichment. However, the NEL task is especially challenging for large quantities of documents, as the diversity of NEs increases with the size of the collection. Additionally, digitized documents are indexed through their OCRed versions, which may contain numerous OCR errors. This paper aims to evaluate the performance of named entity linking over digitized documents with different levels of OCR quality. It is, to our knowledge, the first investigation to analyze and correlate the impact of document degradation on the performance of NEL. We tested state-of-the-art NEL techniques over several evaluation benchmarks and experimented with various types of OCR noise. We present the resulting study and subsequent recommendations on the document and OCR quality levels required to perform reliable named entity linking. We further provide the first evaluation benchmark for NEL over degraded documents.
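The kind of controlled OCR degradation this abstract describes can be sketched with a toy character-confusion model. The confusion table and rates below are illustrative assumptions; real OCR error models are engine- and font-specific:

```python
import random

def add_ocr_noise(text, rate=0.1, seed=0):
    """Corrupt text with a toy OCR confusion table (l/1, o/0, m/rn, ...),
    substituting each susceptible character with probability `rate`. A
    seeded RNG makes a given noise level reproducible across runs."""
    confusions = {"l": "1", "1": "l", "o": "0", "0": "o",
                  "e": "c", "m": "rn"}
    rng = random.Random(seed)
    return "".join(
        confusions[ch] if ch in confusions and rng.random() < rate else ch
        for ch in text
    )

clean = "Entity linking over digitized documents"
noisy = add_ocr_noise(clean, rate=0.5)
```

Sweeping `rate` over a benchmark corpus and re-running a NEL system at each level is one simple way to correlate degradation with linking accuracy, in the spirit of the experiment described above.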