Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), 2021
DOI: 10.18653/v1/2021.wnut-1.22
Understanding the Impact of UGC Specificities on Translation Quality

Abstract: This work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT. Our analyses show that measuring the average-case performance using a standard metric on a UGC test set falls far short of giving a reliable image of the UGC translation quality. That is why we introduce a new data set for the evaluation of UGC translation in which UGC specificities have been manually annotated using a fine-grained typology. Using…

Cited by 1 publication (2 citation statements)
References 7 publications
“…This indicates that GPT-4 outputs are more different at the surface level from the reference translations, which could be a result of paraphrasing or non-standard translations rather than a reflection of MT quality, especially given the high COMET scores. This confirms that BLEU is poorly adapted to evaluating MT robustness and could even lead to misleading conclusions, in line with the conclusions previously drawn by Rosales Núñez et al. (2021) about the inadequacy of BLEU for the evaluation of UGC MT. On the other hand, COMET-QE scores show trends more similar to COMET, suggesting that it could be used for evaluation without having to produce reference translations.…”
Section: Automatic Evaluation (supporting)
confidence: 88%
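The contrast drawn in the statement above, between a surface n-gram overlap metric (BLEU), a reference-based learned metric (COMET), and a reference-free quality-estimation metric (COMET-QE), can be made concrete with a short scoring script. This is a minimal sketch, assuming the sacrebleu and unbabel-comet packages; the checkpoint names and example sentences are illustrative and are not taken from the paper or the citing work.

# Minimal sketch: scoring one noisy-UGC hypothesis with BLEU (surface overlap),
# COMET (reference-based, learned), and a reference-free COMET-QE-style model.
# Assumes: pip install sacrebleu unbabel-comet ; checkpoints and data are illustrative.

import sacrebleu
from comet import download_model, load_from_checkpoint

sources    = ["jsuis trop contente de mon nouveau tel !!"]   # made-up noisy UGC source
hypotheses = ["I'm really happy with my new phone!!"]        # made-up MT output
references = ["I am so pleased with my new phone!!"]         # made-up human reference

# BLEU only counts surface n-gram matches against the reference, so a valid
# paraphrase can score low even when the translation is adequate.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print("BLEU:", round(bleu.score, 2))

# COMET scores the (source, hypothesis, reference) triple with a learned model.
comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(sources, hypotheses, references)]
print("COMET:", comet.predict(data, batch_size=8, gpus=0).system_score)

# A quality-estimation model (e.g. CometKiwi) drops the reference entirely,
# which is what makes reference-free evaluation of UGC translation possible.
qe = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))  # gated checkpoint
qe_data = [{"src": s, "mt": h} for s, h in zip(sources, hypotheses)]
print("COMET-QE:", qe.predict(qe_data, batch_size=8, gpus=0).system_score)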
“…They show that this leads to a higher level of non-standard language, although the method is by nature more biased towards the keywords and phenomena used for data selection. An error analysis of the dataset was conducted by Rosales Núñez et al. (2021), reporting MT quality (measured with BLEU) for the different UGC phenomena.…”
Section: Related Work (mentioning)
confidence: 99%
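The per-phenomenon error analysis mentioned in that statement can be illustrated in the abstract: given a test set whose segments carry UGC phenomenon labels, one can bucket segments by label and score each bucket separately. A minimal sketch, assuming sacrebleu and invented labels and data; this does not reproduce the paper's typology, annotations, or code.

# Minimal sketch of a per-phenomenon breakdown: bucket annotated segments by
# UGC phenomenon label and compute BLEU per bucket. Labels and segments are
# invented for illustration only.

from collections import defaultdict
import sacrebleu

# (hypothesis, reference, phenomenon_label) triples -- toy data
segments = [
    ("I'm so happy with my new phone!!", "I am so happy with my new phone!!", "contraction"),
    ("c u tomorrow at the station", "See you tomorrow at the station", "abbreviation"),
    ("this film is greeeat", "This film is great", "letter repetition"),
]

buckets = defaultdict(lambda: ([], []))
for hyp, ref, label in segments:
    buckets[label][0].append(hyp)
    buckets[label][1].append(ref)

for label, (hyps, refs) in buckets.items():
    score = sacrebleu.corpus_bleu(hyps, [refs]).score
    print(f"{label:20s} BLEU = {score:.1f}")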