Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2341

chrF deconstructed: beta parameters and n-gram weights

Abstract: Character n-gram F-score (CHRF) is shown to correlate very well with human rankings of different machine translation outputs, especially for morphologically rich target languages. However, only two versions have been explored so far, namely CHRF1 (standard F-score, β = 1) and CHRF3 (β = 3), both with uniform n-gram weights. In this work, we investigated CHRF in more detail, namely β parameters in the range from 1/6 to 6, and found that CHRF2 is the most promising version. Then we investigated different n-g…
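
To make the role of β concrete, here is a minimal Python sketch of a sentence-level character n-gram F-score with uniform n-gram weights. It is an illustration under stated assumptions, not the author's reference implementation: the function names `char_ngrams` and `chrf`, the removal of whitespace, the single reference, the maximum n-gram order of 6, and the skipping of orders longer than either string are all choices made here for brevity. The only part taken as given is the general F-score form, F_β = (1 + β²)·P·R / (β²·P + R), in which recall is weighted β times as heavily as precision.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n (whitespace removed: an assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, beta=2.0, max_n=6):
    """Sentence-level character n-gram F-beta score with uniform n-gram weights."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string (an assumption)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # uniform average over n-gram orders
    r = sum(recalls) / len(recalls)
    if p == 0.0 or r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


# A hypothesis that is too short has high precision but low recall,
# so raising beta (more weight on recall) lowers its score.
for beta in (1.0, 2.0, 3.0):
    print(beta, round(chrf("ab", "abcd", beta=beta), 3))
```

With β = 1 this corresponds to the CHRF1 setting, β = 2 to CHRF2, and β = 3 to CHRF3; a corpus-level score would typically aggregate n-gram counts over all sentences rather than average sentence scores, which this sketch does not do.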

Cited by 40 publications (34 citation statements)
References 3 publications
“…3 Motivation for adding word n-grams to CHRF A preliminary experiment on a small set of texts reported in previous work (Popović, 2016) with different target languages and different types of DA 1 shown that for poorly rated sentences, the standard deviations of CHRF and WORDF scores are similar - both metrics assign relatively similar (low) scores. On the other hand, for the sentences with higher human rates, the deviations for CHRF are (much) lower.…”
Section: N-gram Based F-scores (mentioning)
confidence: 99%
“…Contrary to RR, the relation between CHRF and DA has still not been investigated systematically. Preliminary experiments in previous work (Popović, 2016) shown that, concerning DA, the main advantage of character-based F-score CHRF in comparison to word-based F-score WORDF is better correlation for good translations for which WORDF often assigns too low scores.…”
Section: Introduction (mentioning)
confidence: 99%
“…A combination of BLEU (Papineni et al, 2002) and word error rate (WER) (Nießen et al, 2000) is used for tuning the system, because tuning on BLEU only resulted in overly long translations. Performance of all systems are reported in terms of BLEU, character F1-score CHRF1 (Popović, 2016) and WER. Statistical significance tests were conducted using approximate randomization tests (Clark et al, 2011).…”
Section: Systems (mentioning)
confidence: 99%
“…The systems have been tuned towards characterF-1.0 (Popovic, 2015, 2016). We optimize the beam search parameters, using a grid search.…”
Section: Training Details (mentioning)
confidence: 99%