2017
DOI: 10.1080/15420353.2017.1307304

Alts, Abbreviations, and AKAs: Historical Onomastic Variation and Automated Named Entity Recognition

Abstract: The accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardised into single officially recognised forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniqu…

Cited by 19 publications (18 citation statements)
References 8 publications (7 reference statements)
“…Twitter datasets such as GeoCorpora (Wallgrün et al 2018) experience a gradual decline in completeness as users delete their tweets and deactivate profiles. WoTR (DeLozier et al 2016) and CLDW (Rayson et al 2017) are suitable only for digital humanities due to their historical nature and localised coverage, which is problematic to resolve (Butler et al 2017). CLUST (Lieberman and Samet 2011) is a corpus of clustered streaming news of global events, similar to LGL.…”
Section: Unsuitable Datasets (mentioning)
confidence: 99%
“…In the case of Geograph, extracting sentences is a relatively simple task, since the texts largely consist of short captions for the associated image. The CLDW, on the other hand, presented more challenges, including idiosyncratic punctuation, case and hyphenation (Butler et al 2017). Nonetheless, in both cases sentence extraction and tokenisation were carried out using the NLTK Python Library with no modifications.…”
Section: Finding the Middle Ground 41 Overview Of The Process (mentioning)
confidence: 99%
“…These coordinate data are allocated through additional XML tags that are added to the texts. Conducting this process in an entirely automated manner was found not to be satisfactory for the complex place-names found in the CLDW (see Butler et al 2017). The process was enhanced using concordance geoparsing - where a small subset of the text is geoparsed, the results are checked, and any corrections fed into processing subsequent subsets (Rupp et al 2014) - and a considerable amount of manual checking.…”
Section: From Text To GIS Database (mentioning)
confidence: 99%