“…Multilingual Wikipedia Corpus. The Multilingual Wikipedia Corpus (Kawakami, Dyer, and Blunsom 2017) contains 360 Wikipedia articles in English, French, Spanish, German, Russian, Czech, and Finnish. However, we re-tokenize the dataset, not only by splitting on spaces (as Kawakami, Dyer, and Blunsom do) but also by splitting off each punctuation symbol as its own token.…”
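
The re-tokenization described in the excerpt could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual preprocessing script: the function names `is_punct` and `retokenize` are hypothetical, and "punctuation symbol" is assumed here to mean any character whose Unicode general category starts with `P`. The excerpt does not say which definition of punctuation was used.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    # Assumption: a "punctuation symbol" is any character in a Unicode
    # P* category (Po, Pd, Ps, Pe, ...). The source does not specify this.
    return unicodedata.category(ch).startswith("P")


def retokenize(text: str) -> list[str]:
    """Split on whitespace, then split off each punctuation character
    as its own token (hypothetical helper, for illustration only)."""
    tokens: list[str] = []
    for chunk in text.split():  # step 1: whitespace split
        current: list[str] = []
        for ch in chunk:  # step 2: peel off punctuation characters
            if is_punct(ch):
                if current:
                    tokens.append("".join(current))
                    current = []
                tokens.append(ch)
            else:
                current.append(ch)
        if current:
            tokens.append("".join(current))
    return tokens
```

For example, `retokenize("Hello, world!")` yields `["Hello", ",", "world", "!"]`, whereas a space-only split would keep `"Hello,"` and `"world!"` as single tokens. Under the assumed definition, currency and math symbols such as `$` or `+` are not split off, because they fall in the Unicode `S*` categories rather than `P*`.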