2019
DOI: 10.1007/978-3-030-34058-2_11

Impact of OCR Quality on Named Entity Linking

Abstract: Digital libraries are online collections of digital objects that can include text, images, audio, or videos. It has long been observed that named entities (NEs) are key to accessing digital library portals, as they are contained in most user queries. Combined with or subsequent to the recognition of NEs, named entity linking (NEL) connects NEs to external knowledge bases. This makes it possible to differentiate ambiguous geographical locations or names (John Smith), and implies that the descriptions from the knowledge base…

Cited by 21 publications (29 citation statements)
References 17 publications
“…XEL is a fundamental tool for search engines in digital libraries to retrieve documents where their contents (including named entities) are in different languages and contexts. Linhares Pontes et al [10] showed an analysis of the impact of problems detected in these libraries using Ganea and Hofmann's and Le and Titov's systems. In this analysis, these systems had a small reduction in NEL performance despite the errors caused by the deterioration and conservation problems in libraries.…”
Section: Experimental Assessment (mentioning)
confidence: 99%
“…These problems cause numerous errors at the character and word levels in the OCR of these documents [10]. Linhares Pontes et al [10] analyzed the impact of OCR quality on the NEL task and achieved satisfying results for NEL. They provided recommendations on the OCR quality that is required for a given level of expected NEL performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…Other works have concentrated on developing features and rules for improving EL in a specific domain [13] or entity types [30,3,4]. Furthermore, some researchers have investigated the effect of issues frequently found in historical documents on the task of EL [13,20]. Some NER and EL systems dedicated to historical documents have also been explored [16,23,24,28].…”
Section: Entity Linking For Historical Data (mentioning)
confidence: 99%
“…EL is a challenging task due to the fact that named entities may have multiple surface forms, for instance, in the case of a person an entity can be represented with their full or partial name, alias, honorifics, or alternate spellings [29]. Compared to contemporary data, few works in the state of the art have studied the EL task on historical documents [30,16,3,4,13,23,28] and OCR-processed documents [20].…”
Section: Introduction (mentioning)
confidence: 99%
“…Furthermore, it appears that historical texts poses new challenges to the application of NE processing [11,41], as it does for language technologies in general [47]. First, inputs can be extremely noisy, with errors which do not resemble tweet misspellings or speech transcription hesitations, for which adapted approaches have already been devised [29,7,46]. Second, the language under study is mostly of earlier stage(s), which renders usual external and internal evidences less effective (e.g., the usage of different naming conventions and presence of historical spelling variations) [5,4].…”
Section: Introduction (mentioning)
confidence: 99%