2009
DOI: 10.1007/s10590-009-9065-6
The NIST 2008 Metrics for machine translation challenge—overview, methodology, metrics, and results

Abstract: This paper discusses the evaluation of automated metrics developed for the purpose of evaluating machine translation (MT) technology. A general discussion of the usefulness of automated metrics is offered. The NIST MetricsMATR evaluation of MT metrology is described, including its objectives, protocols, participants, and test data. The methodology employed to evaluate the submitted metrics is reviewed. A summary is provided for the general classes of evaluated metrics. Overall results of this evaluation are pr…

Cited by 41 publications (26 citation statements)
References 21 publications
“…For example, Przybocki et al (2009) use, as part of their larger human evaluation, a single (7-point) scale (labeled "adequacy") to assess the quality of translations. Inter-annotator agreement for this method was κ = 0.25, even lower than the results for adequacy and fluency reported in WMT 2007 (noting that caution is required when directly comparing agreement measurements, especially over scales of varying granularity, such as 5-versus 7-point assessments).…”
Section: Past and Current Methodologies
confidence: 99%
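The κ = 0.25 figure above is a Cohen's kappa, a chance-corrected agreement statistic. A minimal sketch, assuming two annotators rate the same translations on a 7-point adequacy scale (the rating lists below are illustrative, not MetricsMATR data):

```python
# Minimal sketch: inter-annotator agreement on a 7-point adequacy scale,
# quantified with Cohen's kappa. The ratings are hypothetical examples.
from sklearn.metrics import cohen_kappa_score

annotator_a = [7, 5, 6, 3, 4, 2, 7, 5, 1, 6]  # hypothetical adequacy ratings
annotator_b = [6, 5, 4, 3, 5, 2, 7, 4, 2, 5]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```

Because kappa discounts the agreement expected by chance, values well below 1 are common on fine-grained scales, which is part of the caution raised above about comparing 5-point and 7-point results directly.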
See 1 more Smart Citation
“…For example, Przybocki et al (2009) use, as part of their larger human evaluation, a single (7-point) scale (labeled "adequacy") to assess the quality of translations. Inter-annotator agreement for this method was κ = 0.25, even lower than the results for adequacy and fluency reported in WMT 2007 (noting that caution is required when directly comparing agreement measurements, especially over scales of varying granularity, such as 5-versus 7-point assessments).…”
Section: Past and Current Methodologiesmentioning
confidence: 99%
“…However, given the extent to which accurate human assessment of translation quality is fundamental to empirical MT, the underlying topic of finding ways of increasing the reliability of those assessments to date has received surprisingly little attention (Callison-Burch et al., 2007, 2008; Przybocki, Peterson, Bronsart, and Sanders, 2009, 2010; Denkowski and Lavie, 2010).…”
Section: Validation Of Automatic Metrics
confidence: 99%
“…This assumption is also true of most of the 39 automated measures submitted to the NIST 2008 Metrics for Machine Translation Challenge (Przybocki et al 2009). Measures based on exact matching of system outputs to references, including the Word Error Rate (WER) measure used to score automatic speech recognition (ASR), are at a disadvantage when applied to data that contains much variation which is unrelated to translation quality.…”
Section: Challenges For Automated Metrics
confidence: 95%
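As a minimal sketch of the exact-matching disadvantage noted above, Word Error Rate is the word-level edit distance between a system output and a single reference, normalized by reference length. The example strings below are illustrative:

```python
# Minimal sketch of Word Error Rate (WER): word-level Levenshtein distance
# between hypothesis and reference, divided by reference length.
def word_error_rate(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on a mat",
                      "the cat sat on the mat"))  # 0.1667: one substitution in six words
```

Because only exact token matches count, a legitimate synonym or valid reordering raises WER just as a genuine error would, which is the variation-unrelated-to-quality problem the excerpt describes.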
“…Communication patterns in this domain have not been studied in significant detail either. The metrics used to assess MT quality in competitive evaluations (Przybocki et al 2009) and the industry (Roturier 2009) also appear to overlook the collaborative nature of the task.…”
Section: Supporting Collaboration Between Translators and Across Sites
confidence: 99%
“…930-933). Automatic metrics are tested in terms of their correlation with such judgements (Przybocki et al 2009). …”
Section: Introduction
confidence: 99%
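A minimal sketch of that validation step, assuming a list of human judgements and the corresponding scores from some automatic metric over the same translations (both lists below are illustrative):

```python
# Minimal sketch: validating an automatic MT metric by its correlation
# with human judgements. The score lists are hypothetical examples.
from scipy.stats import pearsonr, spearmanr

human_scores  = [4.5, 3.0, 2.5, 5.0, 1.5, 3.5]        # hypothetical adequacy judgements
metric_scores = [0.62, 0.41, 0.38, 0.70, 0.25, 0.50]  # hypothetical metric outputs

print("Pearson r: ", pearsonr(human_scores, metric_scores)[0])
print("Spearman rho:", spearmanr(human_scores, metric_scores)[0])
```

Pearson correlation measures the linear relationship between the two score sets, while Spearman only requires the metric to rank translations in the same order as the human judges; evaluations such as MetricsMATR report correlations of this kind at the segment, document, and system level.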