2020
DOI: 10.5281/zenodo.4117566

Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers

Cited by 6 publications (9 citation statements)
References 4 publications
“…They found remarkable performance differences between noisy and non-noisy mentions, and that already as little noise as 0.1 severely hurts systems' abilities to predict an entity and may halve their performances. To sum up, whether focused on a single OCR version of text(s) [195], on different artificially-generated ones [79], or on the noise present in entities themselves [53], these studies clearly demonstrate how challenging OCR noise is for NER systems.…”
Section: Character Recognition (mentioning)
confidence: 93%
“…Focusing specifically on entity processing, Hamdi et al [79,80] confronted a BiLSTM-based NER model with OCR outputs of the same text but of different qualities and observed a 30 percentage point loss in F-score when the character error rate increased from 7% to 20%. Finally, in order to assess the impact of noisy entities on NER during the CLEF-HIPE-2020 NE evaluation campaign on historical newspapers (HIPE-2020 for short), Ehrmann et al [53] evaluated systems' performances on various entity noise levels, defined as the length-normalised Levenshtein distance between the OCR surface form of an entity and its manual transcription. They found remarkable performance differences between noisy and non-noisy mentions, and that already as little noise as 0.1 severely hurts systems' abilities to predict an entity and may halve their performances.…”
Section: Character Recognition (mentioning)
confidence: 99%
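
The entity noise level quoted above (length-normalised Levenshtein distance between an entity's OCR surface form and its manual transcription) can be illustrated with a minimal sketch. The quoted statement does not say which length is used for normalisation; dividing by the longer of the two strings is an assumption made here, and the function names `levenshtein` and `entity_noise_level` are hypothetical, not taken from the HIPE-2020 evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def entity_noise_level(ocr_form: str, transcription: str) -> float:
    """Levenshtein distance normalised to the range [0, 1].

    Assumption: normalise by the length of the longer string. The source
    only says "length-normalised" and does not specify the denominator.
    """
    longest = max(len(ocr_form), len(transcription))
    if longest == 0:
        return 0.0
    return levenshtein(ocr_form, transcription) / longest


if __name__ == "__main__":
    # Illustrative strings only, not drawn from the HIPE-2020 data.
    print(entity_noise_level("Lausaune", "Lausanne"))
```

Under this assumed normalisation, a single wrong character in an eight-character mention already gives a noise level of 0.125, slightly above the 0.1 level that the quoted statement reports as severely harmful.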