2014
DOI: 10.1007/978-3-319-14120-6_14

Applying Rule-Based Normalization to Different Types of Historical Texts—An Evaluation

Cited by 5 publications (6 citation statements)
References 2 publications
“…Over the years, researchers have proposed normalization methods based on rules and/or edit distances (Baron and Rayson, 2008; Bollmann, 2012; Hauser and Schulz, 2007; Bollmann et al., 2011; Pettersson et al., 2013a; Mitankin et al., 2014; Pettersson et al., 2014), statistical machine translation (Pettersson et al., 2013b; Scherrer and Erjavec, 2013), and most recently neural network models (Bollmann and Søgaard, 2016; Bollmann et al., 2017; Korchagina, 2017). However, most of these systems have been developed and tested on a single language (or even a single corpus), and many have not been compared to the naïve but strong baseline that only changes words seen in the training data, normalizing each to its most frequent modern form observed during training.…”
Section: Introduction
confidence: 99%
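The "most frequent modern form" baseline mentioned in the statement above can be sketched in a few lines. The training pairs and wordforms below are illustrative examples, not data from any of the cited corpora:

```python
from collections import Counter, defaultdict

def train_baseline(pairs):
    """Learn, for each historical wordform, its most frequent modern form."""
    counts = defaultdict(Counter)
    for hist, modern in pairs:
        counts[hist][modern] += 1
    return {h: c.most_common(1)[0][0] for h, c in counts.items()}

def normalize(lexicon, token):
    """Replace a token only if it was seen in training; else leave it unchanged."""
    return lexicon.get(token, token)

# Hypothetical (historical, modern) training pairs:
pairs = [("vnd", "und"), ("vnd", "und"), ("vnnd", "und"), ("jn", "in")]
lexicon = train_baseline(pairs)
print(normalize(lexicon, "vnd"))    # -> "und"
print(normalize(lexicon, "himel"))  # unseen in training -> "himel"
```

Because unseen wordforms pass through untouched, this baseline is "naïve but strong": it never hurts in-vocabulary tokens, which is why the statement argues new systems should be compared against it.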
“…This approach works well. However, the same experiments showed lower performance on unknown wordforms [3], especially in historical periods where the rules varied. The rule-based approach has been widely used in Information Retrieval for normalizing historical language data [6,1,19,12].…”
Section: Introduction
confidence: 64%
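A minimal edit-distance normalizer of the kind the rule/edit-distance systems cited above build on might look like the following sketch; the modern lexicon, the `max_dist` threshold, and the example wordforms are illustrative assumptions, not taken from the cited work:

```python
def levenshtein(a, b):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def nearest_modern(token, lexicon, max_dist=2):
    """Return the closest modern wordform within max_dist edits, else the token itself."""
    best, best_d = token, max_dist + 1
    for modern in lexicon:
        d = levenshtein(token, modern)
        if d < best_d:
            best, best_d = modern, d
    return best

modern_lexicon = ["und", "in", "himmel", "jahr"]  # hypothetical target lexicon
print(nearest_modern("vnnd", modern_lexicon))  # -> "und" (2 edits away)
```

The distance threshold illustrates the trade-off the statement describes: a loose threshold over-normalizes, while a tight one leaves unknown wordforms untouched, which is exactly where these systems underperform.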
“…The goal of this research is to provide an automatic mapping from wordforms in Early New High German (14th–16th centuries) to the corresponding modern wordforms in New High German, as shown in Figure 1. Bollmann et al. [3,4] compared different approaches for normalizing variant wordforms to their modern spelling using string distance measures and evaluated them on two types of historical text: the Luther Bible and Anselm. These approaches are either rule-based or wordlist-based.…”
Section: Introduction
confidence: 99%
“…Text standardisation has been applied to historical text in languages such as English [5], French [6], German [7,8], Irish [9], Portuguese [10] and Slovene [11], to name but a few. In early work, researchers tended to adopt rule-based and edit-distance-based methods [12,13,14,15,16,17].…”
Section: Related Work
confidence: 99%