Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-short.89
A Simple Recipe for Multilingual Grammatical Error Correction

Abstract: This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having establis…
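The first ingredient of the recipe, language-agnostic synthetic example generation, can be pictured with a small corruption sketch: clean monolingual sentences are noised with random edits to yield (ungrammatical source, clean target) training pairs. The function below is a minimal illustration under that assumption, not the paper's exact corruption procedure; corrupt_sentence and its specific edit operations are hypothetical.

```python
import random

# Minimal sketch of language-agnostic synthetic GEC data generation:
# noise a clean sentence with random word-level edits (drop, swap,
# duplicate) to obtain a (corrupted source, clean target) pair.
# Illustrative only; not the exact corruption scheme of the paper.
def corrupt_sentence(sentence, edit_prob=0.1, seed=None):
    rng = random.Random(seed)
    tokens = sentence.split()
    corrupted, i = [], 0
    while i < len(tokens):
        op = rng.random()
        if op < edit_prob:                                # delete token
            i += 1
        elif op < 2 * edit_prob and i + 1 < len(tokens):  # swap adjacent tokens
            corrupted += [tokens[i + 1], tokens[i]]
            i += 2
        elif op < 3 * edit_prob:                          # duplicate token
            corrupted += [tokens[i], tokens[i]]
            i += 1
        else:                                             # keep token unchanged
            corrupted.append(tokens[i])
            i += 1
    return " ".join(corrupted)

clean = "The models are fine-tuned on language-specific supervised sets ."
pair = (corrupt_sentence(clean, seed=0), clean)  # (noisy source, clean target)
print(pair)
```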

Cited by 68 publications (103 citation statements). References 16 publications.

Correcting diacritics and typos with a ByT5 transformer model. Stankevičius, Lukoševičius, Kapočiūtė-Dzikienė et al., 2022 (preprint).

“…A limitation of our work is that we had only a single moderate GPU at our disposal. Scaling model size [105], incorporating additional datasets [46], and training longer can improve accuracy by several percent. Similarly, one can build a model of multiple languages to gain benefits by overlapping vocabularies and semantics of related under-represented languages, although studies report contradictory results [48,46].…”
Section: Discussion (citation type: mentioning)
Confidence: 99%

“…The popular seq2seq transformer T5 [99] used batch size 128 for both pre-training and fine-tuning. Follow-up models such as the multilingual version mT5 [100], the grammatical error correction model gT5 [105], and ByT5 [8] (the one we use in this work) all carried on with the same value for fine-tuning. The same size is also used in works solving the diacritics restoration task [47,106].…”
Section: Batch Size (citation type: mentioning)
Confidence: 99%
“…Grundkiewicz et al (2019) approach GEC as a neural machine translation task using the Transformer architecture (Vaswani et al, 2017), which is pre-trained using a vast amount of synthetic data generated by character-level and word-level edits. Recently, Rothe et al (2021) presented a GEC system based on multilingual mT5 (Xue et al, 2021b), reaching state-of-the-art results on several datasets with the gigantic xxl model size with 13B parameters.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
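The translation-style treatment of GEC described in this excerpt (ungrammatical sentence in, corrected sentence out) can be sketched with the public mT5 checkpoint and the Hugging Face transformers API. "google/mt5-small" is a stand-in checkpoint, not the fine-tuned GEC model of Rothe et al. (2021); without fine-tuning it will not produce meaningful corrections, so the sketch only shows the text-to-text interface.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Sketch of the text-to-text GEC interface: the ungrammatical sentence is
# the encoder input and the corrected sentence is generated by the decoder.
# "google/mt5-small" is a stand-in public checkpoint, not a GEC-fine-tuned
# model; its outputs are meaningless until fine-tuned on GEC data.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

source = "She go to school yesterday ."
inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
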
“…For lexical normalization, we could directly use the unnormalized sentence as input and the normalized sentence as output (an approach used by Rothe et al (2021) for GEC). However, we were concerned that such an approach would be too different from the ByT5 pre-training, and furthermore, it would not allow to reconstruct the alignment of the normalized tokens when a word is removed during normalization or split into several words.…”
Section: Input and Output Format (citation type: mentioning)
Confidence: 99%
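The alignment concern raised in this excerpt can be illustrated by contrasting two seq2seq formats for lexical normalization. Format A below is the plain sentence-to-sentence setup (as used for GEC), where token alignment is lost once a word is deleted or split into several words; Format B is an illustrative per-token alternative that wraps the target word in markers and so keeps a one-to-one alignment. The marker tokens <w> and </w> are assumptions for the sketch, not the cited paper's exact scheme.

```python
# Two candidate seq2seq input/output formats for lexical normalization.
tokens = ["im", "gonna", "go"]
normalized = [["i", "am"], ["going", "to"], ["go"]]  # gold per-token normalization

# Format A: whole unnormalized sentence in, whole normalized sentence out.
# The 1:1 token alignment is lost because "im" and "gonna" each expand to
# two words, so it must be reconstructed after decoding.
format_a = (" ".join(tokens),
            " ".join(word for group in normalized for word in group))

# Format B: one example per token, with the token to normalize wrapped in
# markers. Alignment is preserved trivially; an empty target encodes deletion.
format_b = [
    (" ".join(tokens[:i] + ["<w>", tok, "</w>"] + tokens[i + 1:]),
     " ".join(normalized[i]))
    for i, tok in enumerate(tokens)
]

print(format_a)
for source, target in format_b:
    print(source, "->", target)
```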