Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1263

A Challenge Set Approach to Evaluating Machine Translation

Abstract: Neural machine translation represents an exciting leap forward in translation quality. But what longstanding weaknesses does it resolve, and which remain? We address these questions with a challenge set approach to translation evaluation and error analysis. A challenge set consists of a small set of sentences, each hand-designed to probe a system's capacity to bridge a particular structural divergence between languages. To exemplify this approach, we present an English-French challenge set, and use it to analyze…

Cited by 106 publications (116 citation statements). References 12 publications.
“…This requires attending to two or more regions that can be arbitrarily distant from one another. Several phenomena, such as light verbs (Isabelle and Kuhn, 2018), are known from the linguistic and MT literature to yield lexical LDDs. Our methodology takes a predefined set of such phenomena and defines rules for detecting each of them over dependency parses of the source side.…”
Section: Methods
confidence: 99%
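The methodology described in this excerpt is rule-based: lexical long-distance dependencies such as light-verb constructions are detected by matching patterns over source-side dependency parses. A minimal sketch of one such detector in Python with spaCy follows; the toy light-verb inventory and the single rule are illustrative assumptions, not the cited authors' actual rule set.

import spacy

nlp = spacy.load("en_core_web_sm")

# Toy inventory of English light verbs; real inventories are larger.
LIGHT_VERBS = {"take", "make", "give", "have", "do"}

def detect_light_verbs(sentence):
    """Return (verb, object) pairs where a light verb governs a noun
    direct object, e.g. 'take a walk' or 'make a decision'."""
    doc = nlp(sentence)
    hits = []
    for tok in doc:
        if tok.pos_ == "VERB" and tok.lemma_ in LIGHT_VERBS:
            for child in tok.children:
                # "dobj" is the direct-object relation in spaCy's
                # English dependency scheme.
                if child.dep_ == "dobj" and child.pos_ == "NOUN":
                    hits.append((tok.text, child.text))
    return hits

print(detect_light_verbs("She decided to take a walk before dinner."))
# expected: [('take', 'walk')]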
“…With major improvements in system performance, crude assessments of performance are becoming less satisfying, i.e., evaluation metrics give no indication of how MT systems perform on important challenges for the field (Isabelle and Kuhn, 2018). String-similarity metrics against a reference are known to capture only partial and coarse-grained aspects of the task (Callison-Burch et al., 2006), but are still the common practice in various text generation tasks.…”
Section: MT Evaluation
confidence: 99%
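For concreteness, the reference-based string-similarity scoring this excerpt criticizes usually reduces to n-gram overlap against a reference, as in sentence-level BLEU. A minimal sketch with NLTK; the sentences are invented for illustration.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented example sentences; whitespace tokenization for brevity.
reference = "the cat is on the mat".split()
hypothesis = "the cat sat on the mat".split()

# Smoothing avoids zero scores when a higher-order n-gram is absent.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
print(f"sentence BLEU: {score:.3f}")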
“…Another line of work has analyzed the robustness of NLP models, both via controlled experiments that complement test-set accuracy and test the abilities of the models (Isabelle et al., 2017; B. Hashemi and Hwa, 2016; White et al., 2017) and via adversarial instances that expose weaknesses (Jia and Liang, 2017).…”
Section: Analysis Of Complex Models
confidence: 99%
“…We draw motivation to study the robustness of NLI models from previous work on evaluating complex models (Isabelle et al., 2017; White et al., 2017). Furthermore, we base our approach on the discipline of behavioral science, which provides methodologies for analyzing how certain factors influence the behavior of subjects under study (Epling and Pierce, 1986).…”
Section: Introduction
confidence: 99%
“…in (Bentivogli et al., 2016). Recently, various new proposals have been put forward to better diagnose neural models, notably by Linzen et al. (2016) and Sennrich (2017), who focus respectively on the syntactic competence of Neural Language Models (NLMs) and of NMT, and by Isabelle et al. (2017) and Burchardt et al. (2017), who resuscitate an old tradition of designing test suites. Inspired by these (and other) works (see § 4), we propose in this paper a new evaluation scheme aimed specifically at assessing the morphological competence of MT engines translating from English into a Morphologically Rich Language (MRL).…”
Section: Introduction
confidence: 99%
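To make the test-suite idea concrete: a challenge set pairs each hand-designed source sentence with a check on the system's output. Isabelle et al. (2017) rely on human yes/no judgments; the sketch below substitutes an automated regex probe, with invented examples and a placeholder translate() function, purely to convey the mechanics.

import re

# Each entry: hand-designed English source and a regex probe over the
# French output. Both examples are invented; the regex probe replaces
# the human yes/no judgment used by Isabelle et al. (2017).
CHALLENGE_SET = [
    # Subject-verb agreement across a distractor noun ("keys ... are").
    ("The keys to the cabinet are on the table.", r"\bsont\b"),
    # Negation placement: French wraps the verb with "ne ... pas".
    ("He does not sing.", r"\bne\s+\w+\s+pas\b"),
]

def translate(source):
    """Placeholder: call the MT system under evaluation here."""
    raise NotImplementedError

def evaluate(system=translate):
    passed = 0
    for source, probe in CHALLENGE_SET:
        output = system(source)
        ok = re.search(probe, output.lower()) is not None
        passed += ok
        print(("PASS" if ok else "FAIL") + f": {source!r} -> {output!r}")
    print(f"{passed}/{len(CHALLENGE_SET)} challenges passed")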