Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1031

Learning attention for historical text normalization by learning to pronounce

Abstract: Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze t…
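As a rough illustration of the multi-task setup the abstract describes, the sketch below shows a shared character-level encoder feeding two task-specific decoders, one for normalization and one for the auxiliary grapheme-to-phoneme task. This is not the authors' code: it is a minimal PyTorch sketch, and all module names, dimensions, and the mean-pooled conditioning (used here in place of a full attention mechanism) are illustrative assumptions.

# Hypothetical sketch (not the authors' implementation): a shared encoder
# with two task-specific decoders, trained jointly (MTL).
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, chars):                   # chars: (batch, src_len)
        return self.lstm(self.embed(chars))[0]  # (batch, src_len, 2*hidden_dim)

class TaskDecoder(nn.Module):
    """One decoder per task; both tasks share the encoder above."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128, enc_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + enc_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, enc_states, targets):
        # Simplest possible conditioning: mean-pool the encoder states and
        # concatenate them to every decoder input (no explicit attention here).
        ctx = enc_states.mean(dim=1, keepdim=True).expand(-1, targets.size(1), -1)
        dec_in = torch.cat([self.embed(targets), ctx], dim=-1)
        return self.out(self.lstm(dec_in)[0])   # (batch, tgt_len, vocab_size)

# Training would sum (or alternate) the cross-entropy losses of the two
# decoders, e.g. loss = ce(norm_dec(enc(x), y_norm)) + ce(g2p_dec(enc(x), y_phon)),
# so the shared encoder benefits from the auxiliary pronunciation data.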

Cited by 27 publications (49 citation statements). References 14 publications (14 reference statements).
“…We expected that the bias toward monotonic alignments would help the hard attention model at smaller data sizes, but it is the soft attention model that seems to do better there, while the hard attention model does better in most cases at the larger data sizes. Note that Bollmann et al. (2017) trained their model on individual manuscripts, with no training set containing more than 13.2k tokens. The fact that this model struggles with larger data sizes, especially for seen tokens, suggests that the default hyperparameters may be tuned to work well with small training sets at the cost of underfitting the larger datasets.…”
Section: Results: Normalization Accuracy
confidence: 99%
“…It is therefore critical to report both dataset statistics and experimental results for unseen tokens. Unfortunately, some recent papers have only reported accuracy on all tokens, and only in comparison to other (non-baseline) systems (Bollmann and Søgaard, 2016; Bollmann et al., 2017; Korchagina, 2017). These figures can be misleading if systems underperform the naïve baseline on seen tokens (which we show does happen in practice).…”
Section: Task Setting and Issues of Evaluation
confidence: 99%
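To make the evaluation point quoted above concrete, here is a small, hypothetical Python sketch (the names memorization_baseline and seen_unseen_accuracy are invented for illustration) that reports accuracy separately for tokens seen and unseen in training, and builds the kind of naïve memorization baseline the passage refers to.

# Illustrative only: split test accuracy by seen/unseen tokens and compare
# against a baseline that memorizes the most frequent training normalization.
from collections import Counter, defaultdict

def memorization_baseline(train_pairs):
    """Map each historical token to its most frequent normalization in training."""
    counts = defaultdict(Counter)
    for src, tgt in train_pairs:
        counts[src][tgt] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}

def seen_unseen_accuracy(predictions, test_pairs, train_vocab):
    """predictions[i] is the system output for test_pairs[i] = (src, gold)."""
    buckets = {"seen": [0, 0], "unseen": [0, 0]}   # [correct, total]
    for pred, (src, gold) in zip(predictions, test_pairs):
        key = "seen" if src in train_vocab else "unseen"
        buckets[key][0] += int(pred == gold)
        buckets[key][1] += 1
    return {k: c / t if t else float("nan") for k, (c, t) in buckets.items()}

# Usage: the baseline copies the input for tokens never seen in training.
# lookup = memorization_baseline(train_pairs)
# baseline_preds = [lookup.get(src, src) for src, _ in test_pairs]
# print(seen_unseen_accuracy(baseline_preds, test_pairs, set(lookup)))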
“…Multi-task learning (MTL) and semi-supervised learning are both successful paradigms for learning in scenarios with limited labelled data and have in recent years been applied to almost all areas of NLP. Applications of MTL in NLP, for example, include partial parsing, text normalisation (Bollmann et al., 2017), neural machine translation (Luong et al., 2016), and keyphrase boundary classification (Augenstein and Søgaard, 2017).…”
Section: Introduction
confidence: 99%
“…Model: We use the same encoder-decoder architecture with attention as described in Bollmann et al. (2017). This is a fairly standard model consisting of one bidirectional LSTM unit in the encoder and one (unidirectional) LSTM unit in the decoder.…”
Section: Methods
confidence: 99%
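The quoted description points to a standard attentional sequence-to-sequence model. The following is a minimal sketch under that reading, not the implementation from Bollmann et al. (2017): a bidirectional-LSTM encoder, a unidirectional LSTM decoder, and soft dot-product attention over the encoder states. The layer sizes and the choice of dot-product scoring are assumptions made here for brevity.

# Minimal sketch of an attentional encoder-decoder (assumptions as noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSeq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, 2 * hidden_dim, batch_first=True)
        self.out = nn.Linear(4 * hidden_dim, vocab_size)

    def forward(self, src, tgt):
        enc_states, _ = self.encoder(self.embed(src))   # (B, S, 2H)
        dec_states, _ = self.decoder(self.embed(tgt))   # (B, T, 2H)
        # Soft attention: each decoder state attends over all encoder states.
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))  # (B, T, S)
        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_states)         # (B, T, 2H)
        # Predict each output character from the decoder state plus its context.
        return self.out(torch.cat([dec_states, context], dim=-1))   # (B, T, V)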