2010
DOI: 10.1007/s10590-010-9072-7

Metric and reference factors in minimum error rate training

Abstract: In Minimum Error Rate Training (MERT), Bleu is often used as the error function, despite the fact that it has been shown to have a lower correlation with human judgment than other metrics such as Meteor and Ter. In this paper, we present empirical results showing that parameters tuned on Bleu may lead to sub-optimal Bleu scores under certain data conditions. Such scores can be improved significantly by tuning on an entirely different metric, e.g. Meteor, by 0.0082 Bleu or 3.38% relative improvement on t…
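For context on where the error metric enters MERT: tuning re-ranks n-best candidate translations under trial feature weights and keeps the weights whose 1-best outputs maximise the chosen metric, so swapping Bleu for Meteor changes only the scoring step. Below is a minimal sketch of that loop; the toy unigram metric, the tiny n-best data, and the single-weight grid search are illustrative assumptions, not the paper's setup (real MERT performs Och's exact line search over all feature weights).

```python
# Minimal sketch of metric choice in MERT-style tuning (illustrative only).
# The metric, n-best list, and features below are hypothetical stand-ins.

def sentence_match(hyp, ref):
    """Toy stand-in for a metric such as Bleu or Meteor:
    unigram precision of the hypothesis against the reference."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not hyp_toks:
        return 0.0
    matches = sum(1 for t in hyp_toks if t in ref_toks)
    return matches / len(hyp_toks)

def tune_weight(nbest, refs, metric, grid=None):
    """Pick the weight whose re-ranked 1-best outputs maximise `metric`.
    nbest: per-sentence list of (hypothesis, model_score, length_feature)."""
    if grid is None:
        grid = [i / 10 for i in range(-10, 11)]  # candidate weights
    best_w, best_score = None, -1.0
    for w in grid:
        total = 0.0
        for cands, ref in zip(nbest, refs):
            # re-rank the n-best list under the trial weight
            top = max(cands, key=lambda c: c[1] + w * c[2])
            total += metric(top[0], ref)
        if total > best_score:
            best_w, best_score = w, total
    return best_w, best_score

# Usage: passing a different `metric` callable can change which weight wins,
# which is the effect the paper studies when tuning on Meteor instead of Bleu.
nbest = [[("the cat sat", 1.0, 3), ("a cat sat on the mat", 0.8, 6)]]
refs = ["the cat sat on the mat"]
print(tune_weight(nbest, refs, sentence_match))
```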

Cited by 5 publications (4 citation statements)
References 10 publications (11 reference statements)
“…This turns out to be largely due to the fact that the 4-g LM tuned weight for the labeled systems is always far lower than for Hiero, suggesting that the 4-g LM has a smaller contribution during tuning for BLEU. Tuning for BLEU is not guaranteed to give improved performance on all metrics, as noted by He and Way (2009), but we do see here improved performance for three out of four metrics.…”
Section: Primary Results: Soft Bilingual Constraints and Basic+Sparse
confidence: 57%
“…it seems that TER is penalizing longer output more heavily even when it is closer in length to the reference (cf. He and Way 2009). This turns out to be largely due to the fact that the 4-g LM tuned weight for the labeled systems is always far lower than for Hiero, suggesting that the 4-g LM has a smaller contribution during tuning for BLEU.…”
Section: Primary Results: Soft Bilingual Constraints and Basic+Sparse
confidence: 99%
“…In addition, we find that both DTU and our systems do not achieve consistent improvements over Treelet in terms of TER. We observed that both DTU and our systems tend to produce longer translations than Treelet, which might cause unreliable TER evaluation in our experiments as TER favours shorter sentences (He and Way, 2010).…”
Section: Results
confidence: 80%
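The length bias these statements point to can be made concrete: TER divides the number of edit operations by the reference length, so the surplus words of a longer hypothesis all count as edits, even when the hypothesis is no further from the reference length than a shorter alternative. A minimal sketch follows (word-level Levenshtein edits without TER's phrase shifts; the example sentences are made up):

```python
def ter_like(hyp, ref):
    """Edit-rate in the spirit of TER (omitting shift operations):
    word-level Levenshtein edits divided by reference length."""
    h, r = hyp.split(), ref.split()
    # classic dynamic-programming edit distance over tokens
    dp = list(range(len(h) + 1))          # D(i, 0) = i deletions
    for j in range(1, len(r) + 1):
        prev, dp[0] = dp[0], j            # prev holds D(i-1, j-1)
        for i in range(1, len(h) + 1):
            cur = dp[i]                   # D(i, j-1)
            dp[i] = min(dp[i - 1] + 1,    # insertion
                        dp[i] + 1,        # deletion
                        prev + (h[i - 1] != r[j - 1]))  # substitution
            prev = cur
    return dp[len(h)] / len(r)

ref = "the cat sat on the mat"
short = "cat sat"                                   # 2 tokens vs 6
long_ = "a small cat was sitting quietly on a mat"  # 9 tokens vs 6
# The longer hypothesis scores worse (1.00) than the much shorter
# one (~0.67) despite being closer to the reference length.
print(ter_like(short, ref), ter_like(long_, ref))
```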
“…Our work can be seen as replacing the regular BLEU metric with a new paraphrase BLEU metric for system tuning. Different alternative automatic evaluation metrics have also been considered for system tuning (He and Way, 2010; Servan and Schwenk, 2011) with Minimum Error Rate Training, MERT (Och, 2003). This work showed some specific cases where Translation Error Rate (TER) was superior to BLEU.…”
Section: Related Work
confidence: 99%