Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)
DOI: 10.18653/v1/n18-2113
Evaluating Historical Text Normalization Systems: How Well Do They Generalize?

Abstract: We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice, i.e., for new datasets or languages; in comparison to more naïve systems; or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural models against a naïve baseline system. We show that the neural models generalize well to unseen words in tests on five languages…
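The abstract compares two neural models against a naïve baseline system. As a point of reference, here is a minimal sketch of what such a naïve baseline could look like, assuming it simply memorizes the most frequent gold normalization for each training word form and copies unseen words unchanged; the class and method names are illustrative, not the authors' implementation.

```python
from collections import Counter, defaultdict

class NaiveNormalizer:
    """Memorize the most frequent gold normalization for each historical
    word form seen in training; copy out-of-vocabulary words unchanged."""

    def __init__(self):
        self.counts = defaultdict(Counter)
        self.lookup = {}

    def train(self, pairs):
        # pairs: iterable of (historical_form, normalized_form)
        for hist, norm in pairs:
            self.counts[hist][norm] += 1
        self.lookup = {hist: c.most_common(1)[0][0]
                       for hist, c in self.counts.items()}

    def normalize(self, word):
        # Unseen words are left as-is (the "naive" identity fallback).
        return self.lookup.get(word, word)


normalizer = NaiveNormalizer()
normalizer.train([("vppon", "upon"), ("vppon", "upon"), ("loue", "love")])
print(normalizer.normalize("vppon"))   # -> "upon"
print(normalizer.normalize("castle"))  # -> "castle" (unseen, copied)
```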

Cited by 12 publications (17 citation statements); references 10 publications.
“…Normalization. Another approach to historical texts and social media is spelling normalization (e.g., Baron and Rayson, 2008; Han et al., 2012), which has been shown to offer improvements in tagging historical texts (Robertson and Goldwater, 2018). In Early Modern English, Yang and Eisenstein (2016) found that domain adaptation and normalization are complementary.…”
Section: Related Work
confidence: 99%
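The excerpt above notes that spelling normalization can improve tagging of historical texts. A hedged sketch of that pipeline shape, normalizing tokens before handing them to an off-the-shelf modern POS tagger; NLTK is used purely for illustration and is not the tagger used in the cited works, and the lookup table stands in for a real normalization system.

```python
import nltk  # requires the "averaged_perceptron_tagger" resource to be downloaded

# Illustrative lookup standing in for a real normalization system.
NORMALIZATIONS = {"loue": "love", "vppon": "upon"}

def tag_historical(tokens):
    """Normalize historical spellings, then run a modern POS tagger,
    reporting the predicted tags against the original tokens."""
    normalized = [NORMALIZATIONS.get(tok, tok) for tok in tokens]
    tags = nltk.pos_tag(normalized)
    # Tags come from the modern tagger but are aligned to the historical tokens.
    return list(zip(tokens, (tag for _, tag in tags)))

print(tag_historical(["loue", "conquers", "all"]))
```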
“…Recently, morphological tasks such as inflection generation and lemmatization (Figure 1) have been successfully tackled with neural transition-based models over edit actions (Aharoni and Goldberg, 2017; Robertson and Goldwater, 2018; Makarov and Clematide, 2018; Cotterell et al., 2017b). The model, introduced in Aharoni and Goldberg (2017), uses familiar inductive biases about morphological string transduction such as conditioning on a single input character and monotonic character-to-character alignment.…”
Section: Introduction
confidence: 99%
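The transition-based models described above transduce a source string into a target string through a monotonic sequence of character-level edit actions. A minimal sketch of applying such an action sequence; the action inventory and its names are illustrative and not necessarily the ones used in the cited models.

```python
def apply_edits(source, actions):
    """Apply a monotonic sequence of character-level edit actions.

    Illustrative action inventory:
      ("copy",)   - copy the next source character
      ("del",)    - skip the next source character
      ("ins", c)  - emit character c without consuming input
      ("sub", c)  - consume one source character, emit c instead
    """
    out, i = [], 0
    for action in actions:
        op = action[0]
        if op == "copy":
            out.append(source[i]); i += 1
        elif op == "del":
            i += 1
        elif op == "ins":
            out.append(action[1])
        elif op == "sub":
            out.append(action[1]); i += 1
        else:
            raise ValueError(f"unknown action {action!r}")
    return "".join(out)

# "vppon" -> "upon": substitute v->u, delete one p, copy the rest.
print(apply_edits("vppon", [("sub", "u"), ("del",), ("copy",),
                            ("copy",), ("copy",)]))  # -> "upon"
```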
“…Numerous neural approaches to text normalization (Tang et al., 2018; Lusetti et al., 2018; Robertson and Goldwater, 2018; Bollmann et al., 2017; Korchagina, 2017) learn a discriminative model p(y | x), parameterized with some generic encoder-decoder neural network, that performs the traditional character-level transduction of isolated words. The models are trained in a supervised fashion on a lot of manually labeled data.…”
Section: Historical Text Normalization
confidence: 99%
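The discriminative p(y | x) models described above are generic encoder-decoder networks over characters. A compact PyTorch sketch of that general shape, trained with teacher forcing; the architecture and hyperparameters are assumptions for illustration, not those of any cited system.

```python
import torch
import torch.nn as nn

class CharSeq2Seq(nn.Module):
    """Generic character-level encoder-decoder estimating p(y | x)."""

    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_in):
        # Encode the historical word form: src_ids is (batch, src_len).
        _, h = self.encoder(self.embed(src_ids))
        # Decode the normalized form from the encoder state, one character
        # per step, conditioned on the previous gold character (teacher forcing).
        dec_out, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec_out)  # (batch, tgt_len, vocab_size) logits


# Toy usage with a character vocabulary of size 30 and one (source, target) pair.
VOCAB, BOS = 30, 1
model = CharSeq2Seq(vocab_size=VOCAB)
src = torch.randint(2, VOCAB, (1, 5))   # e.g. "vppon" as character ids
tgt = torch.randint(2, VOCAB, (1, 4))   # e.g. "upon" as character ids
# Decoder input is the target shifted right, starting from a BOS symbol.
tgt_in = torch.cat([torch.full((1, 1), BOS), tgt[:, :-1]], dim=1)
logits = model(src, tgt_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, VOCAB), tgt.reshape(-1))
print(loss.item())
```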