“…Over the years, researchers have proposed normalization methods based on rules and/or edit distances (Baron and Rayson, 2008; Bollmann, 2012; Hauser and Schulz, 2007; Bollmann et al., 2011; Pettersson et al., 2013a; Mitankin et al., 2014; Pettersson et al., 2014), statistical machine translation (Pettersson et al., 2013b; Scherrer and Erjavec, 2013), and most recently neural network models (Bollmann and Søgaard, 2016; Bollmann et al., 2017; Korchagina, 2017). However, most of these systems have been developed and tested on a single language (or even a single corpus), and many have not been compared to the naïve but strong baseline that only changes words seen in the training data, normalizing each to its most frequent modern form observed during training.…”
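The memorization baseline described above is simple enough to sketch directly. The snippet below is a minimal illustration, assuming the training data comes as (historical, modern) word pairs; the function and variable names are ours, not from any of the cited systems.

```python
from collections import Counter, defaultdict

def train_baseline(pairs):
    """Learn the most frequent modern form for each historical word.

    pairs: iterable of (historical_word, modern_word) training pairs.
    """
    counts = defaultdict(Counter)
    for hist, modern in pairs:
        counts[hist][modern] += 1
    # Keep only the single most frequent normalization per seen word.
    return {hist: c.most_common(1)[0][0] for hist, c in counts.items()}

def normalize(lookup, tokens):
    """Replace seen words by their learned form; leave unseen words unchanged."""
    return [lookup.get(tok, tok) for tok in tokens]

# Illustrative toy data (not from any real corpus):
lookup = train_baseline([("vnto", "unto"), ("vnto", "unto"),
                         ("vnto", "vnt"), ("ye", "the")])
print(normalize(lookup, ["vnto", "ye", "king"]))
```

Because out-of-vocabulary words pass through untouched, the baseline is strong whenever the test vocabulary overlaps heavily with training, which is exactly why comparisons against it are informative.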