Abstract: We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dictionary lookup’ strategy performs in this context and find that such an approach can significantly outdo baselines suc…
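The ‘k-best decoding plus dictionary lookup’ strategy can be pictured as a re-ranking step. The model proposes its k best output strings, and the highest-ranked one that appears in a lexicon is kept. The sketch below is an illustration under stated assumptions. The function name `kbest_dictionary_lookup` is hypothetical. The fallback to the 1-best candidate when no candidate is in the lexicon is also an assumption, since the abstract above is truncated before any such detail.

```python
from typing import Iterable, Set


def kbest_dictionary_lookup(candidates: Iterable[str], lexicon: Set[str]) -> str:
    """Return the best-ranked candidate that is a dictionary word.

    `candidates` is assumed to be sorted best-first, e.g. the k-best
    outputs of a string-to-string model. If no candidate is in the
    lexicon, fall back to the 1-best output (an assumed policy).
    """
    ranked = list(candidates)
    if not ranked:
        raise ValueError("need at least one candidate")
    for cand in ranked:
        if cand in lexicon:
            return cand
    return ranked[0]


print(kbest_dictionary_lookup(["teh", "the", "tea"], {"the", "tea"}))  # the
```

Because the lexicon only filters the model's own hypotheses, the approach can never output a word the decoder did not propose among its k best.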
“…The average performance of the perceptron tagger in this experiment is superior to the performance of the AliSeTra system as reported by Eger et al (2016). The difference in performance is, however, not statistically significant.…”
Section: Results (contrasting)
confidence: 48%
“…UC refers to the unstructured classifier presented in Section 3.1, PT to the perceptron tagger presented in Section 3.2 and AliSeTra to the system presented by Eger et al (2016).…”
Section: Results (mentioning)
confidence: 99%
“…Like DirecTL+, it also views string-to-string translation as a pipeline of segmentation and sequence labeling. (4) The final system surveyed by Eger et al (2016) represents the string-to-string translation task as a series of contextual edit operations on the input string (Cotterell et al, 2015). The operations are compiled into a weighted finite-state machine.…”
Section: Related Work (mentioning)
confidence: 99%
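The snippet above describes a system that treats string-to-string translation as a series of edit operations on the input string. As a rough illustration only, the following sketch applies a given sequence of edit operations to an input word. The operation names and the function `apply_edits` are hypothetical. The sketch deliberately omits the context-dependent weights and the compilation into a weighted finite-state machine, which are what make the surveyed model "contextual".

```python
from typing import List, Tuple

# Hypothetical edit operations: ("copy", ""), ("sub", new_char),
# ("del", ""), ("ins", new_string). Weights and context are omitted.
Edit = Tuple[str, str]


def apply_edits(source: str, edits: List[Edit]) -> str:
    """Apply edit operations left to right over `source` and return the output."""
    out: List[str] = []
    i = 0  # position of the next unread input character
    for op, arg in edits:
        if op in ("copy", "sub", "del") and i >= len(source):
            raise ValueError(f"{op} past the end of the input")
        if op == "copy":
            out.append(source[i])
            i += 1
        elif op == "sub":
            out.append(arg)
            i += 1
        elif op == "del":
            i += 1
        elif op == "ins":
            out.append(arg)  # insertions consume no input
        else:
            raise ValueError(f"unknown edit operation: {op}")
    if i != len(source):
        raise ValueError("edits did not consume the whole input")
    return "".join(out)


print(apply_edits("teh", [("copy", ""), ("del", ""), ("copy", ""), ("ins", "e")]))  # the
```

In a weighted model, many edit sequences lead to the same output. The decoder's job is to find the highest-scoring output rather than to apply one fixed sequence as this sketch does.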
“…Systems 1, 2 and 3 surveyed by Eger et al (2016) form an interesting contrast to our systems because we do not use segmentation of the input string. In this sense, our system is simpler.…”
“…Note that when all input forms are incorrect (as in the case of the Twitter data), CR corresponds exactly to the evaluation metric word accuracy (WACC) used by Eger et al (2016) because the count $fp$ is 0: $\mathrm{WACC} = \frac{tp}{tp + fn}$. Tables 2 and 3 show the results of the experiments on the Finnish OCR data and Twitter data.…”
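The claim that $fp$ vanishes when every input form is incorrect can be checked on a toy example. The sketch below assumes one plausible set of count definitions, since they are not spelled out in the snippet:

- a true positive is an incorrect input whose output matches the gold form;
- a false negative is an incorrect input whose output is still wrong;
- a false positive is a correct input that the system changes into an error.

Under these assumptions, a test set with no correct inputs cannot yield a false positive, so WACC reduces to plain word accuracy. The function names are hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (input form, system output, gold form)


def confusion_counts(triples: Iterable[Triple]) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) under the assumed definitions in the lead-in."""
    tp = fp = fn = 0
    for inp, pred, gold in triples:
        if inp != gold:  # input needed correcting
            if pred == gold:
                tp += 1
            else:
                fn += 1
        elif pred != gold:  # input was correct but the system broke it
            fp += 1
    return tp, fp, fn


def wacc(tp: int, fn: int) -> float:
    """Word accuracy: WACC = tp / (tp + fn)."""
    if tp + fn == 0:
        raise ValueError("WACC is undefined when tp + fn == 0")
    return tp / (tp + fn)


# Toy data in which every input form is incorrect, as with the Twitter data.
data = [("teh", "the", "the"), ("recieve", "receive", "receive"), ("adn", "and", "an")]
tp, fp, fn = confusion_counts(data)
accuracy = sum(pred == gold for _, pred, gold in data) / len(data)
print(tp, fp, fn)                # 2 0 1
print(wacc(tp, fn) == accuracy)  # True
```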
This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other one is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016). It does so even though it is simpler than AliSeTra, because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR-processed newspaper text.
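One way to picture spelling correction as sequence labeling without a segmentation model is to give every input character a label that is itself an output string. That label can be the character itself, a replacement, the empty string for a deletion, or a longer string for an insertion. This framing is an assumption for illustration, not the paper's finite-state implementation. In the unstructured variant, each label is predicted from a local character window, independently of the neighboring labels. A structured tagger, such as the perceptron tagger, would instead score whole label sequences. In the sketch, the plain dictionary `table` stands in for a trained classifier, and the `#` padding symbol and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Window = Tuple[str, str, str]  # (previous char, current char, next char)


def windows(word: str, pad: str = "#") -> List[Window]:
    """One three-character context window per input character."""
    padded = pad + word + pad
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def label_independently(word: str, table: Dict[Window, str]) -> List[str]:
    """Unstructured labeling: each position is labeled on its own.

    Windows missing from the (hypothetical) table copy the input character.
    """
    return [table.get(w, w[1]) for w in windows(word)]


def decode(labels: List[str]) -> str:
    """Concatenate output-string labels; '' deletes, longer strings insert."""
    return "".join(labels)


print(decode(label_independently("adn", {("a", "d", "n"): "n", ("d", "n", "#"): "d"})))  # and
print(decode(label_independently("thhe", {("t", "h", "h"): ""})))  # the
```

Since each input character maps directly to its own output string, no separate segmentation step is required, which mirrors the simplification the abstract contrasts with AliSeTra.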