A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction

Eger, Steffen; Brück, Tim vor der; Mehler, Alexander

doi:10.1515/pralin-2016-0004

Cited by 17 publications

(32 citation statements)

References 26 publications

(32 reference statements)


“…The average performance of the perceptron tagger in this experiment is superior to the performance of the AliSeTra system as reported by Eger et al (2016). The difference in performance is, however, not statistically significant.…”

Section: Results (contrasting)

confidence: 48%

“…UC refers to the unstructured classifier presented in Section 3.1, PT to the perceptron tagger presented in Section 3.2 and AliSeTra to the system presented by Eger et al (2016).…”

Section: Results (mentioning)

confidence: 99%

“…Like DirecTL+, it also views string-to-string translation as a pipeline of segmentation and sequence labeling. (4) The final system surveyed by Eger et al (2016) represents the string-to-string translation task as a series of contextual edit operations on the input string (Cotterell et al, 2015). The operations are compiled into a weighted finite-state machine.…”

Section: Related Work (mentioning)

confidence: 99%

“…Systems 1, 2 and 3 surveyed by Eger et al (2016) form an interesting contrast to our systems because we do not use segmentation of the input string. In this sense, our system is simpler.…”

Section: Related Work (mentioning)

confidence: 99%


Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata

2016



“…Note that when all input forms are incorrect (as in the case of the Twitter data), CR corresponds exactly to the evaluation metric word accuracy (WACC) used by Eger et al (2016) because the count $fp$ is 0: $\mathrm{WACC} = \frac{tp}{tp + fn}$. Tables 2 and 3 show the results of the experiments on the Finnish OCR data and Twitter data.…”

Section: Methods (mentioning)

confidence: 99%
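
As a rough illustration of the word accuracy formula quoted above, the following Python sketch computes WACC = tp / (tp + fn) for a list of corrections. It assumes, as the quoted statement does for the Twitter data, that every input form is incorrect, so that fp is 0. The function name `word_accuracy`, the interpretation of tp as "system output equals the gold form" and fn as "system output differs from the gold form", and the example tokens are illustrative assumptions, not taken from either paper.

```python
def word_accuracy(predictions, gold):
    """Word accuracy WACC = tp / (tp + fn).

    Assumption (hypothetical setup): every input form was incorrect, so
    each item is either corrected to its gold form (tp) or not (fn),
    and there are no false positives.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute WACC on an empty list")

    # tp: system output matches the gold (correct) form.
    tp = sum(1 for p, g in zip(predictions, gold) if p == g)
    # fn: system output does not match the gold form.
    fn = len(gold) - tp
    return tp / (tp + fn)


if __name__ == "__main__":
    # Invented example tokens, for illustration only.
    system_output = ["tomorrow", "great", "tonite"]
    gold_forms = ["tomorrow", "great", "tonight"]
    print(word_accuracy(system_output, gold_forms))
```

Under this assumption tp + fn equals the number of evaluated tokens, so the value is simply the share of tokens the system corrected to the gold form.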

Data-Driven Spelling Correction using Weighted Finite-State Methods

Silfverberg, Kauppinen, Lindén

2016

Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata


This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other one is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016) even though the system presented in the paper is simpler than AliSeTra because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR processed newspaper text.
