Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.521
PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

Abstract: Neural Machine Translation (NMT) has shown drastic improvement in its quality when translating clean input, such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of input with considerable noise, such as User-Generated Contents (UGC) on the Internet. To make better use of NMT for cross-cultural communication, one of the most promising directions is to develop a model that correctly handles these expressions. Though its importance has been recognized, i…

Cited by 4 publications (3 citation statements)
References 26 publications
“…In line with Fujii et al's (2020) findings, where fine-grained NMT granularity provided robustness advantages when processing misspellings, our results show that the distribution of the best and worst translations' specificities points to better performance of char2char for the missing-diacritics category, giving insight into the more specific types of misspellings that affect performance.…”
Section: Discussion (supporting)
confidence: 87%
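The robustness advantage of fine-grained (character-level) segmentation mentioned above can be illustrated with a minimal sketch. This toy example is not from the paper: the vocabulary, sentences, and tokenizers are hypothetical, chosen only to show why a one-letter misspelling collapses a word-level token to `<unk>` while leaving most character-level tokens intact.

```python
# Illustrative sketch (hypothetical vocabulary and sentences):
# why character-level segmentation degrades more gracefully
# under misspellings than a fixed word-level vocabulary.

word_vocab = {"i", "will", "receive", "the", "package"}

def word_tokenize(sentence, vocab):
    # Out-of-vocabulary words collapse to a single <unk> token,
    # losing all of their internal information.
    return [w if w in vocab else "<unk>" for w in sentence.lower().split()]

def char_tokenize(sentence):
    # Character-level segmentation: a one-letter typo changes
    # only a couple of tokens out of many.
    return list(sentence.lower())

clean = "I will receive the package"
noisy = "I will recieve the package"   # common misspelling

print(word_tokenize(noisy, word_vocab))
# -> ['i', 'will', '<unk>', 'the', 'package']

# At the character level, most tokens still match the clean input:
overlap = sum(a == b for a, b in zip(char_tokenize(clean), char_tokenize(noisy)))
print(overlap, "of", len(clean), "characters unchanged")
```

The word-level model sees the misspelled word as a single unknown token, whereas the character-level view preserves 24 of the 26 characters, which is the intuition behind the robustness gains reported for char2char models.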
“…Several parallel UGC datasets exist across different language pairs. While some are extracted automatically from crawled data (Ling et al, 2013;Vicente et al, 2016;Mubarak et al, 2020), a majority are based on monolingual sentences that are then translated into the target language (Sluyter-Gäthje et al, 2018;Michel and Neubig, 2018;Rosales Núñez et al, 2019;Fujii et al, 2020;McNamee and Duh, 2022). The closest to our RoCS-MT dataset are (Michel and Neubig, 2018) and (Rosales Núñez et al, 2019), which were designed to contain challenging non-standard phenomena, whereas many of the existing datasets do not apply any such filter.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several UGC parallel corpora (Michel and Neubig, 2018a; Rosales Núñez et al, 2019) have been introduced to evaluate the robustness of MT, some of which, such as (Fujii et al, 2020), are specially annotated to identify UGC idiosyncrasies, making it possible to measure the impact of a given specificity. Our analyses (§2), indeed, show that measuring the average-case performance using a standard metric on a UGC test set falls far short of giving a reliable image of UGC translation quality: explaining the observed performance gap requires a particular evaluation framework made of tailored metrics and specific test sets in which UGC idiosyncrasies have been precisely annotated.…”
Section: Introduction (mentioning)
confidence: 99%