2017
DOI: 10.1515/pralin-2017-0014

Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation

Abstract: We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems' outputs. The error types in our annotation are compliant with the multidimensional quality metrics (MQM), and the annotation is performed by two annotators. Inter-annotator agreement is high for such a task, and results show that the best performing system (neural) reduces the errors produced by the worst system …

Cited by 64 publications (44 citation statements)
References 10 publications
“…This paper builds upon our recent work on this topic (Klubička et al, 2017), which is here extended in a number of directions:…”
Section: Introduction (mentioning)
confidence: 99%
“…Error analysis of NMT systems has also been on the radar of the MT field. Several papers have carried out automatic (Bentivogli et al. 2016; Toral and Sánchez-Cartagena 2017) or human error annotation (Burchardt et al. 2017; Klubička et al. 2017; Popović 2017; Castilho et al. 2018) in order to compare phrase-based and neural approaches for different language pairs and domains. In this issue, Calixto and Liu present an extensive error analysis of several MT systems, including two text-only systems that fall into the PBSMT and NMT paradigms, and a set of multimodal NMT models which use not only text but also visual information extracted from images.…”
Section: Error Analysis (mentioning)
confidence: 99%
“…by Bentivogli et al. (2016); Toral Ruiz and Sánchez-Cartagena (2017); Costa-jussà (2017); Klubička et al. (2017). These works differ in the language pairs and in the error typology considered.…”
Section: Related Work: Evaluating Morphology (mentioning)
confidence: 99%
“…These works differ in the language pairs and in the error typology considered. Bentivogli et al. (2016) only recognizes three main error types which are automatically recognized based on aligning the hypotheses and references; for instance, a morphological error is detected when the word form is wrong, whereas the lemma is correct; this definition is also adopted in , and decomposed at the level of morphological features in ; (Klubička et al., 2017)…”
Section: Related Work: Evaluating Morphology (mentioning)
confidence: 99%