Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)
DOI: 10.18653/v1/n18-2113
Evaluating Historical Text Normalization Systems: How Well Do They Generalize?

Abstract: We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice, i.e., for new datasets or languages; in comparison to more naïve systems; or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural models against a naïve baseline system. We show that the neural models generalize well to unseen words in tests on five languages…
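The abstract compares two neural models against a naïve baseline system. As a point of reference, here is a minimal sketch of what such a naïve baseline could look like, assuming it simply memorizes the most frequent gold normalization for each training word form and copies unseen words unchanged; the class and method names are illustrative, not the authors' implementation.

```python
from collections import Counter, defaultdict

class NaiveNormalizer:
    """Memorize the most frequent gold normalization for each historical
    word form seen in training; copy out-of-vocabulary words unchanged."""

    def __init__(self):
        self.counts = defaultdict(Counter)
        self.lookup = {}

    def train(self, pairs):
        # pairs: iterable of (historical_form, normalized_form)
        for hist, norm in pairs:
            self.counts[hist][norm] += 1
        self.lookup = {hist: c.most_common(1)[0][0]
                       for hist, c in self.counts.items()}

    def normalize(self, word):
        # Unseen words are left as-is (the "naive" identity fallback).
        return self.lookup.get(word, word)


normalizer = NaiveNormalizer()
normalizer.train([("vppon", "upon"), ("vppon", "upon"), ("loue", "love")])
print(normalizer.normalize("vppon"))   # -> "upon"
print(normalizer.normalize("castle"))  # -> "castle" (unseen, copied)
```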

Cited by 12 publications (17 citation statements); references 10 publications.
“…Normalization. Another approach to historical texts and social media is spelling normalization (e.g., Baron and Rayson, 2008; Han et al., 2012), which has been shown to offer improvements in tagging historical texts (Robertson and Goldwater, 2018). In Early Modern English, Yang and Eisenstein (2016) found that domain adaptation and normalization are complementary.…”
Section: Related Work
confidence: 99%
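The excerpt above notes that spelling normalization can improve tagging of historical texts. A hedged sketch of that pipeline shape, normalizing tokens before handing them to an off-the-shelf modern POS tagger; NLTK is used purely for illustration and is not the tagger used in the cited works, and the lookup table stands in for a real normalization system.

```python
import nltk  # requires the "averaged_perceptron_tagger" resource to be downloaded

# Illustrative lookup standing in for a real normalization system.
NORMALIZATIONS = {"loue": "love", "vppon": "upon"}

def tag_historical(tokens):
    """Normalize historical spellings, then run a modern POS tagger,
    reporting the predicted tags against the original tokens."""
    normalized = [NORMALIZATIONS.get(tok, tok) for tok in tokens]
    tags = nltk.pos_tag(normalized)
    # Tags come from the modern tagger but are aligned to the historical tokens.
    return list(zip(tokens, (tag for _, tag in tags)))

print(tag_historical(["loue", "conquers", "all"]))
```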
“…Recently, morphological tasks such as inflection generation and lemmatization (Figure 1) have been successfully tackled with neural transition-based models over edit actions (Aharoni and Goldberg, 2017; Robertson and Goldwater, 2018; Makarov and Clematide, 2018; Cotterell et al., 2017b). The model, introduced in Aharoni and Goldberg (2017), uses familiar inductive biases about morphological string transduction such as conditioning on a single input character and monotonic character-to-character alignment.…”
Section: Introduction
confidence: 99%
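The transition-based models described above transduce a source string into a target string through a monotonic sequence of character-level edit actions. A minimal sketch of applying such an action sequence; the action inventory and its names are illustrative and not necessarily the ones used in the cited models.

```python
def apply_edits(source, actions):
    """Apply a monotonic sequence of character-level edit actions.

    Illustrative action inventory:
      ("copy",)   - copy the next source character
      ("del",)    - skip the next source character
      ("ins", c)  - emit character c without consuming input
      ("sub", c)  - consume one source character, emit c instead
    """
    out, i = [], 0
    for action in actions:
        op = action[0]
        if op == "copy":
            out.append(source[i]); i += 1
        elif op == "del":
            i += 1
        elif op == "ins":
            out.append(action[1])
        elif op == "sub":
            out.append(action[1]); i += 1
        else:
            raise ValueError(f"unknown action {action!r}")
    return "".join(out)

# "vppon" -> "upon": substitute v->u, delete one p, copy the rest.
print(apply_edits("vppon", [("sub", "u"), ("del",), ("copy",),
                            ("copy",), ("copy",)]))  # -> "upon"
```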
“…Numerous neural approaches to text normalization (Tang et al., 2018; Lusetti et al., 2018; Robertson and Goldwater, 2018; Bollmann et al., 2017; Korchagina, 2017) learn a discriminative model p(y | x), parameterized with some generic encoder-decoder neural network, that performs the traditional character-level transduction of isolated words. The models are trained in a supervised fashion on a lot of manually labeled data.…”
Section: Historical Text Normalization
confidence: 99%
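The discriminative p(y | x) models described above are generic encoder-decoder networks over characters. A compact PyTorch sketch of that general shape, trained with teacher forcing; the architecture and hyperparameters are assumptions for illustration, not those of any cited system.

```python
import torch
import torch.nn as nn

class CharSeq2Seq(nn.Module):
    """Generic character-level encoder-decoder estimating p(y | x)."""

    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_in):
        # Encode the historical word form: src_ids is (batch, src_len).
        _, h = self.encoder(self.embed(src_ids))
        # Decode the normalized form from the encoder state, one character
        # per step, conditioned on the previous gold character (teacher forcing).
        dec_out, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec_out)  # (batch, tgt_len, vocab_size) logits


# Toy usage with a character vocabulary of size 30 and one (source, target) pair.
VOCAB, BOS = 30, 1
model = CharSeq2Seq(vocab_size=VOCAB)
src = torch.randint(2, VOCAB, (1, 5))   # e.g. "vppon" as character ids
tgt = torch.randint(2, VOCAB, (1, 4))   # e.g. "upon" as character ids
# Decoder input is the target shifted right, starting from a BOS symbol.
tgt_in = torch.cat([torch.full((1, 1), BOS), tgt[:, :-1]], dim=1)
logits = model(src, tgt_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, VOCAB), tgt.reshape(-1))
print(loss.item())
```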