Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017
DOI: 10.4000/books.aaccademia.2459

Domain-specific Named Entity Disambiguation in Historical Memoirs

Abstract: This paper presents the results of extracting named entities from a collection of historical memoirs about the Italian Resistance during World War II. The methodology followed for the extraction and disambiguation task will be discussed, as well as its evaluation. For the semantic annotation of the dataset, we have developed a pipeline based on established practices for extracting and disambiguating Named Entities. This has been necessary, considering the poor performance of out-of-the-box Named E…

Cited by 9 publications (8 citation statements)
References 6 publications
“…Even if in principle it might be useful to capture all the components of every entry of every index of interest into the meta-index, in practice only the indexation units will normally be part of the meta-index as mentions of a thing of interest. The task of performing entity/record linkage or named entity disambiguation on historical texts is an open and challenging research area, beyond the intended scope of this paper (Piotrowski, 2012;Olieman et al, 2017;Rovera et al, 2017).…”
Section: Phase 4: Alignment (mentioning)
confidence: 99%
“…Furthermore, a significant research effort has been devoted to the task of automatically extracting information about events from texts; see, for instance, [12,27,39]. In particular, several works showed that historical texts represent a peculiar domain, where IE/NER tools show quite low performances compared to other domains; see, for example [7,14,33,36]. We do not analyze the related work within these two fields in depth, since these aspects fall outside the scope of this paper.…”
Section: Related Work (mentioning)
confidence: 99%
“…While these choices allow fast run-time, they generally rely on the assumption that all surface forms of each entity are present as aliases in the KB. The performances of these systems degrade when dealing with domain-specific vocabulary (Munnelly and Lawless, 2018), local variations (Rovera et al, 2017), historical materials (Olieman et al, 2017; and, in general, challenges that emerge when performing EL on non-standard documents. This subtask of EL, often referred to as candidate ranking (and selection), is mostly ignored when designing downstream systems, even though its significant impact on downstream NLP pipelines has been shown previously (Quercini et al, 2010;Hachey et al, 2013).…”
Section: Introduction (mentioning)
confidence: 99%