Proceedings of the 10th Linguistic Annotation Workshop, Held in Conjunction with ACL 2016 (LAW-X 2016)
DOI: 10.18653/v1/w16-1711
Evaluating Inter-Annotator Agreement on Historical Spelling Normalization

Abstract: This paper deals with means of evaluating inter-annotator agreement for a normalization task. This task differs from common annotation tasks in two important aspects: (i) the class of labels (the normalized wordforms) is open, and (ii) annotations can match to different degrees. We propose a new method to measure inter-annotator agreement for the normalization task. It integrates common chance-corrected agreement measures, such as Fleiss's κ or Krippendorff's α. The novelty of our proposed method lies in the wa…
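The abstract builds on chance-corrected agreement measures such as Fleiss's κ. As background, here is a minimal sketch of the standard Fleiss's κ computation (not the paper's extended method); the rating matrix is hypothetical, with rows as items (tokens), columns as candidate normalized forms, and cells counting how many annotators chose that form.

```python
def fleiss_kappa(matrix):
    """matrix[i][j] = number of raters assigning item i to category j.
    Assumes the same number of raters for every item."""
    n_items = len(matrix)
    n_raters = sum(matrix[0])
    total = n_items * n_raters

    # Mean per-item observed agreement P_i
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in matrix
    ) / n_items

    # Expected agreement by chance, from marginal category proportions
    p_e = sum(
        (sum(row[j] for row in matrix) / total) ** 2
        for j in range(len(matrix[0]))
    )
    return (p_bar - p_e) / (1 - p_e)

# Three annotators normalizing four historical tokens into one of
# three candidate modern wordforms (hypothetical counts):
ratings = [
    [3, 0, 0],  # full agreement
    [2, 1, 0],
    [0, 3, 0],
    [1, 1, 1],  # full disagreement
]
print(round(fleiss_kappa(ratings), 3))  # → 0.268
```

Note that plain κ treats the label set as closed and agreement as all-or-nothing; the paper's contribution is precisely to relax these assumptions for open-class, partially matching normalizations.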


Cited by 3 publications (4 citation statements)
References 7 publications
“…The issues related to normalization and annotation are equally applicable to the use of corpora in historical linguistics, sociolinguistics, dialectology, and, in a somewhat different way, language typology. In historical linguistics, token normalization (Azawi, Afzal, & Breuel, ; Bollmann, Dipper, & Petran, ; Bollmann, Petran, & Dipper, ; Jurish, ), sentence segmentation (Petran, ), and extensions of POS tagsets (Dipper et al., ) are actively discussed, which should support fruitful cross‐disciplinary insight for the analysis of learner corpora.…”
mentioning
confidence: 99%
“…For building block applications, adequate text preprocessing is necessary to leverage these NLP building blocks to their full potential (Thanaki 2017;Sarkar 2019), while improper choices in text preprocessing can hinder their performance (Reber 2019). For example, the accuracy of POS tagging can generally be improved through spelling normalization (Schuur 2020), especially in historical texts where archaic word forms are mapped to modern ones in the POS training database (Bollmann 2013). NER can benefit from the detection of multiword expressions, since an entity often contains more than one word (Tan and Pal 2014;Nayel et al 2019).…”
Section: NLP Application Types
mentioning
confidence: 99%
“…This shows that punctuation provides grammatical information to POS tagging (Olde et al 1999). Note that inconsistent use of punctuation can be worse than no punctuation (Bollmann 2013), and in this case, discarding punctuation is preferable. Furthermore, using punctuation to separate text into shorter strings is helpful in machine translation, especially for long and complicated sentences (Yin et al 2007).…”
Section: Separating Punctuation From Strings
mentioning
confidence: 99%
“…Bollmann (2013) showed that even a very small amount of training data (250 manually normalised tokens) significantly raises the accuracy of PoS tagging (approximately 46% on a 15th-century German manuscript), indicating that the approach is especially useful for less-resourced language variants and that the process may be quite cost-effective.…”
Section: Introduction
mentioning
confidence: 99%