2020
DOI: 10.1007/978-3-030-45442-5_68

Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers

Abstract: Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Recently, two main trends characterise its developments: the adoption of deep learning architectures and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inp…

Cited by 10 publications (12 citation statements)
References 17 publications
“…The HIPE dataset 1 was created by the organisers of the CLEF 2020 Evaluation Lab HIPE challenge [8]. It is composed of articles from several Swiss, Luxembourgish, and American historical newspapers from 1790 to 2010 [9].…”
Section: HIPE Dataset (mentioning)
confidence: 99%
“…In order to overcome these problems, we utilised the multilingual end-to-end entity linking (MEL) models described in [18] to process historical documents and disambiguate entities in Finnish, French, German, and Swedish. This system achieved the best results in terms of EL in the CLEF 2020 Evaluation Lab HIPE challenge [8]. To minimise the impact of historical documents on the EL task, this system is composed of modules to overcome problems related to multilingualism and OCR errors.…”
Section: Entity Linking (mentioning)
confidence: 99%
“…The HIPE dataset was created by the CLEF 2020 Evaluation Lab HIPE challenge (Ehrmann et al, 2020a). It is composed of articles from several Swiss, Luxembourgish, and American historical newspapers from 1790 to 2010 (Ehrmann et al, 2020b).…”
Section: Datasets (mentioning)
confidence: 99%
“…These particularities have then a significant impact on NLP and IR applications over historical documents. To illustrate some of the aforementioned problems, let us consider Figure 1(a) which includes some English documents used in the evaluation campaign CLEF HIPE 2020 [9]. Figure 1(b) and Figure 1(c) are zoomed and cropped portions of most left document presented in Figure 1(a).…”
Section: Introduction (mentioning)
confidence: 99%
“…Moreover, our EL approach decreases possible bias by not limiting or focusing the explored entities to a specific dataset. We evaluate our methods in two recent historical corpora, CLEF HIPE 2020 [9], and NewsEye datasets, that are composed of documents in English, Finnish, French, German, and Swedish. Our study shows that our techniques improve the performance of EL systems and partially solve the issues of historical data.…”
Section: Introduction (mentioning)
confidence: 99%