Proceedings of the 10th SIGHUM Workshop on Language Technology For Cultural Heritage, Social Sciences, and Humanities 2016
DOI: 10.18653/v1/w16-2106

Dealing with word-internal modification and spelling variation in data-driven lemmatization

Abstract: This paper describes our contribution to two challenges in data-driven lemmatization. We approach lemmatization in the framework of a two-stage process, where first lemma candidates are generated and afterwards a ranker chooses the most probable lemma from these candidates. The first challenge is that languages with rich morphology like Modern German can feature morphological changes of different kinds, in particular word-internal modification. This makes the generation of the correct lemma a harder task than …
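
The two-stage process named in the abstract (generate lemma candidates, then let a ranker pick one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' system: the function names, the toy German training pairs, the suffix-rewrite rules used for generation, and the rule-frequency score used for ranking. Note that suffix-only rules like these cannot produce lemmas that require word-internal modification, which is the kind of case the abstract singles out as hard.

```python
from collections import Counter


def learn_suffix_rules(pairs):
    """Count (form_suffix, lemma_suffix) rewrite rules from (form, lemma) pairs."""
    rules = Counter()
    for form, lemma in pairs:
        # Find the length of the longest common prefix of form and lemma.
        i = 0
        while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
            i += 1
        rules[(form[i:], lemma[i:])] += 1
    return rules


def generate_candidates(form, rules):
    """Stage 1: apply every learned rule whose form suffix matches the word."""
    candidates = {}
    for (old, new), count in rules.items():
        if form.endswith(old):
            stem = form[: len(form) - len(old)]
            lemma = stem + new
            candidates[lemma] = max(candidates.get(lemma, 0), count)
    return candidates


def rank(candidates):
    """Stage 2: choose the candidate with the highest score (here: rule count)."""
    if not candidates:
        return None
    # Sorting first makes ties deterministic.
    return max(sorted(candidates), key=lambda c: candidates[c])


# Hypothetical toy training data (form, lemma).
training_pairs = [
    ("sagte", "sagen"),
    ("machte", "machen"),
    ("fragte", "fragen"),
    ("Kinder", "Kind"),
]
rules = learn_suffix_rules(training_pairs)
best = rank(generate_candidates("lachte", rules))
```

A real ranker would score candidates with learned features instead of raw rule counts; the count is used here only to keep the two stages visibly separate.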

Cited by 3 publications (4 citation statements)
References 22 publications

“…This approach has been followed by Kestemont et al (2010) and Barteld et al (2016) for lemmatization of historical texts.…”
Section: Related Work (mentioning)
confidence: 99%
“…The candidate generation used in the original approach cannot generate the correct candidate for spelling variants of non-standard words that did not appear in the training data, e.g., you are as a normalization candidate for urr will not be generated if only ur as a non-standard variant of you are is known from the training data. Similarly to the approach that Barteld et al (2016) used to improve the lemmatization of non-standard texts, the knowledge that urr is a spelling variant of ur could be used to generate the candidate you are and thereby improve the coverage of the generator.…”
Section: Applications (mentioning)
confidence: 99%
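
The coverage idea in the statement above (reaching "you are" for "urr" through the known variant "ur") can be sketched as a lookup that also follows spelling-variant links. This is a minimal illustration under stated assumptions: the function name, the two dictionaries, and the premise that the variant relation is already available are all hypothetical, and how that relation would be detected is not shown.

```python
def expand_candidates(word, known_normalizations, variants_of):
    """Collect candidates for `word` directly and via its known spelling variants."""
    candidates = set(known_normalizations.get(word, ()))
    for variant in variants_of.get(word, ()):
        candidates.update(known_normalizations.get(variant, ()))
    return candidates


# Assumed training knowledge: only "ur" was seen with its normalization.
known_normalizations = {"ur": {"you are"}}
# Assumed variant relation: "urr" is a spelling variant of "ur".
variants_of = {"urr": {"ur"}}

without_variants = expand_candidates("urr", known_normalizations, {})
with_variants = expand_candidates("urr", known_normalizations, variants_of)
```

Without the variant link the generator returns no candidate for "urr"; with it, the candidate set contains "you are".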
“…Baron et al (2009)). However, the derived mappings themselves that map historical to modern word forms are usually not in the focus of interest (but see Barteld et al (2016)). …”
Section: Related Work (mentioning)
confidence: 99%