2018
DOI: 10.1007/s10590-018-9214-x

Quantitative fine-grained human evaluation of machine translation systems: a case study on English to Croatian

Abstract: This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologic…
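
A minimal sketch of what a per-error-type significance check between two MT systems could look like. It assumes a Pearson chi-squared test on a 2x2 table (tokens with and without a given MQM error type, for each system); the truncated abstract does not name the test used, so this choice is an assumption. The function names and all counts are hypothetical.

```python
# Hypothetical illustration only: the test choice (Pearson chi-squared,
# 1 degree of freedom) and all counts are assumptions, not from the paper.

# Standard chi-squared critical value for df=1 at alpha=0.05.
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841


def chi_squared_2x2(errors_a, total_a, errors_b, total_b):
    """Pearson chi-squared statistic for error vs. non-error tokens of two systems."""
    observed = [
        [errors_a, total_a - errors_a],
        [errors_b, total_b - errors_b],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal error rates.
            expected = row_sums[i] * col_sums[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


def is_significant(statistic, critical=CHI2_CRITICAL_DF1_ALPHA_05):
    """True if the statistic exceeds the critical value."""
    return statistic > critical


# Made-up counts for one error type: system A has 120 errors in 1000 tokens,
# system B has 80 errors in 1000 tokens.
stat = chi_squared_2x2(120, 1000, 80, 1000)
print(f"chi-squared = {stat:.3f}, significant = {is_significant(stat)}")
```

The sketch fails with a division by zero if neither system produces the error type at all, and it applies no correction for testing several error types at once; both would need handling in a real analysis.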

Cited by 35 publications (51 citation statements)
References 19 publications
“…Previous studies on NMT quality reported omissions and mistranslations/incorrect lexis as problematic NMT error categories, e.g., [13, 17, 18, 19], and a similar picture is painted in our analysis of […] Figure 19: Interaction between log-transformed first fixation durations and Error Type for eye-key span (the time between first visual contact with a target word and the first keystroke that contributes to the correction of this token).…”
Section: Discussion
Citation type: supporting
confidence: 79%
“…Previous studies on NMT quality reported omissions and mistranslations/incorrect lexis as problematic NMT error categories, e.g., [13, 17, 18, 19], and a similar picture is painted in our analysis of the DGT corpus, where lexical errors, and particularly stylistic/register errors, function words, mistranslations, and terminology errors are the most common error types, both in the machine translation output and the post-edits, which might be explained by the priming effect of machine translations. The most prominent error categories, i.e., mistranslations, terminology errors, function words and stylistic/register errors, were further analysed in a key-logging and eye-tracking experiment to gain more insights regarding their effect on PE effort indicators such as the eye-tracking measures.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The advantages of NMT and its challenges have been investigated from different angles in recent work (Koehn and Knowles, 2017; Toral and Sánchez-Cartagena, 2017; Farajian et al., 2017; Macketanz et al., 2017; Castilho et al., 2017a; Klubička et al., 2017; Bentivogli et al., 2018; Isabelle et al., 2017; Popović, 2017; Forcada, 2017; Castilho et al., 2017b; Junczys-Dowmunt et al., 2016; Klubička et al., 2018; Shterionov et al., 2017; Burchardt et al., 2017, inter alia), however, studies on interactive NMT, especially user studies involving human post-edits of NMT outputs, have so far not been presented.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Ambiguous words are often difficult to translate automatically, even by the current state-of-the-art neural machine translation (NMT) systems. Whereas NMT systems produce more fluent (grammatical and natural) translations than the previous state-of-the-art statistical phrase-based (PBMT) models, the semantic faithfulness of the translation to the original (adequacy) is still often problematic (Castilho et al., 2017; Klubička et al., 2018). Adequacy is even more problematic for ambiguous words which have two or more meanings depending on the context.…”
Section: Introduction
Citation type: mentioning
confidence: 99%