Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.521
PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

Abstract: Neural Machine Translation (NMT) has shown drastic improvement in its quality when translating clean input, such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of input with considerable noise, such as User-Generated Contents (UGC) on the Internet. To make better use of NMT for cross-cultural communication, one of the most promising directions is to develop a model that correctly handles these expressions. Though its importance has been recognized, i…

Cited by 4 publications (3 citation statements)
References 26 publications
“…In line with Fujii et al's (2020) findings, where fine-grained NMT granularity provided robustness advantages when processing misspellings, our results show that the distribution of the best and worst translations' specificities points to better performance of char2char for the missing-diacritics category, giving insight into the more specific types of misspellings that affect performance.…”
Section: Discussion (supporting)
confidence: 87%
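The robustness advantage of fine-grained (character-level) segmentation mentioned above can be illustrated with a minimal sketch. This toy example is not from the paper: the vocabulary, sentences, and tokenizers are hypothetical, chosen only to show why a one-letter misspelling collapses a word-level token to `<unk>` while leaving most character-level tokens intact.

```python
# Illustrative sketch (hypothetical vocabulary and sentences):
# why character-level segmentation degrades more gracefully
# under misspellings than a fixed word-level vocabulary.

word_vocab = {"i", "will", "receive", "the", "package"}

def word_tokenize(sentence, vocab):
    # Out-of-vocabulary words collapse to a single <unk> token,
    # losing all of their internal information.
    return [w if w in vocab else "<unk>" for w in sentence.lower().split()]

def char_tokenize(sentence):
    # Character-level segmentation: a one-letter typo changes
    # only a couple of tokens out of many.
    return list(sentence.lower())

clean = "I will receive the package"
noisy = "I will recieve the package"   # common misspelling

print(word_tokenize(noisy, word_vocab))
# -> ['i', 'will', '<unk>', 'the', 'package']

# At the character level, most tokens still match the clean input:
overlap = sum(a == b for a, b in zip(char_tokenize(clean), char_tokenize(noisy)))
print(overlap, "of", len(clean), "characters unchanged")
```

The word-level model sees the misspelled word as a single unknown token, whereas the character-level view preserves 24 of the 26 characters, which is the intuition behind the robustness gains reported for char2char models.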
“…Several parallel UGC datasets exist across different language pairs. While some are extracted automatically from crawled data (Ling et al, 2013;Vicente et al, 2016;Mubarak et al, 2020), a majority are based on monolingual sentences that are then translated into the target language (Sluyter-Gäthje et al, 2018;Michel and Neubig, 2018;Rosales Núñez et al, 2019;Fujii et al, 2020;McNamee and Duh, 2022). The closest to our RoCS-MT dataset are (Michel and Neubig, 2018) and (Rosales Núñez et al, 2019), which were designed to contain challenging non-standard phenomena, whereas many of the existing datasets do not apply any such filter.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several UGC parallel corpora (Michel and Neubig, 2018a; Rosales Núñez et al, 2019) have been introduced to evaluate the robustness of MT, some of which, such as (Fujii et al, 2020), are specially annotated to identify UGC idiosyncrasies, making it possible to measure the impact of a given specificity. Our analyses (§2), indeed, show that measuring the average-case performance using a standard metric on a UGC test set falls far short of giving a reliable image of UGC translation quality: explaining the observed performance gap requires a particular evaluation framework made of tailored metrics and specific test sets in which UGC idiosyncrasies have been precisely annotated.…”
Section: Introduction (mentioning)
confidence: 99%