Proceedings of the Tenth Workshop on Statistical Machine Translation 2015
DOI: 10.18653/v1/w15-3049

chrF: character n-gram F-score for automatic MT evaluation

Abstract: We propose the use of character n-gram F-score for automatic evaluation of machine translation output. Character n-grams have already been used as a part of more complex metrics, but their individual potential has not been investigated yet. We report system-level correlations with human rankings for 6-gram F1-score (CHRF) on the WMT12, WMT13 and WMT14 data as well as segment-level correlation for 6-gram F1 (CHRF) and F3-scores (CHRF3) on WMT14 data for all available target languages. The results are very promis…
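As an illustration of the metric the abstract names, the sketch below computes a character n-gram F-score: clipped character n-gram matches give a precision (chrP) and recall (chrR) per order n, these are averaged over orders 1 to 6, and the two are combined as chrF_β = (1 + β²) · chrP · chrR / (β² · chrP + chrR). With β = 1 this is the F1 variant (CHRF); with β = 3 recall weighs more, giving CHRF3. This is a minimal sketch under stated assumptions, not the reference implementation: stripping whitespace before extracting n-grams, using an arithmetic mean over orders, and skipping orders longer than either string are all choices made here for illustration, and the names `char_ngrams` and `chrf` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n (assumption: whitespace is stripped first)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 1.0) -> float:
    """Character n-gram F-beta score: beta=1 gives CHRF, beta=3 gives CHRF3."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            # Simplification (assumption): skip orders longer than either string
            continue
        matches = sum((hyp & ref).values())  # clipped matches: min count per n-gram
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    # Arithmetic mean of precision and recall over the n-gram orders used
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    hyp, ref = "the cat sat", "the cat sat on the mat"
    print("chrF1:", round(chrf(hyp, ref), 4))
    print("chrF3:", round(chrf(hyp, ref, beta=3.0), 4))
```

In the usage example the hypothesis is a prefix of the reference, so precision is high and recall is low; the β = 3 score is therefore lower than the β = 1 score, which reflects how CHRF3 emphasizes recall.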

Cited by 613 publications (532 citation statements)
References: 6 publications
“…We include several evaluation metrics: BLEU (Papineni et al, 2002), NIST (Doddington, 2002), TER (Snover et al, 2006), METEOR (Banerjee and Lavie, 2005) and CHRF (Popovic, 2015). These scores give an estimation of the quality of the output of the experiment when comparing to a translated reference.…”
Section: Methods (mentioning)
confidence: 99%
“…Because our method involves transliteration, which is applied at a character level, we found it also useful to evaluate the output with character-based metrics, which reward some translations even if the morphology is not completely correct. For this reason, we additionally report BEER (Stanojević and Sima'an 2014) and chrF3 (Popović 2015) scores.…”
Section: Neural Machine Translation System (mentioning)
confidence: 99%
“…• modified CHRF3 (Popović, 2015) to compute character n-grams split by word boundary space with n ∈ [3, 7] whereas the F1 (Biçici, 2011) we already use compute with word n-grams up to n = 5.…”
Section: Referential Translation Machines (mentioning)
confidence: 99%
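The last citation statement describes a modified CHRF3 that extracts character n-grams "split by word boundary space" with n from 3 to 7. One plausible reading, sketched below, is that n-grams are taken inside each whitespace-separated word so that none spans a space; the citing paper may define it differently, so treat this as an interpretation only. The helper names `word_bounded_char_ngrams` and `modified_chrf3`, the arithmetic averaging over orders, and the skipping of orders longer than any word are assumptions made for illustration.

```python
from collections import Counter


def word_bounded_char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams taken within each word, so no n-gram crosses a space."""
    counts = Counter()
    for word in text.split():
        counts.update(word[i:i + n] for i in range(len(word) - n + 1))
    return counts


def modified_chrf3(hypothesis: str, reference: str,
                   orders=range(3, 8), beta: float = 3.0) -> float:
    """Hypothetical F3 over word-bounded character n-grams of orders 3..7."""
    precisions, recalls = [], []
    for n in orders:
        hyp = word_bounded_char_ngrams(hypothesis, n)
        ref = word_bounded_char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # assumption: skip orders longer than every word on one side
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(word_bounded_char_ngrams("ab cd", 2))  # "bc" is not produced
    print(modified_chrf3("translation quality", "translation quality"))
```

Compared with extracting n-grams over the whole space-stripped string, this variant never rewards a character sequence that only appears across a word boundary.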