2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
DOI: 10.1109/confluence.2018.8442777
A Summary and Comparative Study of Different Metrics for Machine Translation Evaluation

Cited by 8 publications (6 citation statements)
References 1 publication
“…BLEU is a very well-known MT quality evaluation metric that estimates precision. METEOR is also well known but is a more complicated measure that estimates both precision and recall using an F-mean score [47, 48]. In the following subsections, we discuss the quality evaluation of our proposed system versus Omega-T and Apertium.…”
Section: Results
confidence: 99%
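As a concrete illustration of the F-mean mentioned in the excerpt, the short sketch below uses the weighting from the original METEOR formulation (recall weighted 9:1 over precision); the unigram counts are invented for the example and are not taken from the cited systems.

```python
def f_mean(precision: float, recall: float) -> float:
    """METEOR-style F-mean: harmonic mean weighted heavily toward recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (10.0 * precision * recall) / (recall + 9.0 * precision)

# Hypothetical unigram-match counts for one candidate/reference pair.
matches, cand_len, ref_len = 7, 10, 9
p = matches / cand_len   # unigram precision
r = matches / ref_len    # unigram recall
print(f"P={p:.2f}  R={r:.2f}  F_mean={f_mean(p, r):.2f}")
```

Note that the full METEOR score further multiplies this F-mean by a fragmentation penalty that accounts for word-order differences; the sketch covers only the precision/recall component.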
“…Translation quality was also evaluated by an automatic process. Both BLEU and F-mean scores [47] were utilized. The BLEU score measures the precision of unigrams up to four-grams with respect to reference translations.…”
Section: Results
confidence: 99%
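The excerpt's description of BLEU (precision of unigrams up to four-grams against reference translations) can be reproduced at sentence level with NLTK; in the sketch below the sentences are invented and the smoothing choice is an assumption, added because unsmoothed sentence-level BLEU drops to zero whenever some n-gram order has no match.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical candidate/reference pair (not from the cited systems).
reference = "the cat sat on the mat".split()
candidate = "the cat is on the mat".split()

# Equal weights over 1- to 4-gram precision, as in standard BLEU.
score = sentence_bleu(
    [reference],                      # list of tokenized references
    candidate,                        # tokenized candidate
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"Sentence BLEU: {score:.3f}")
```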
“…After analyzing the scores of the different metrics [32], we concluded that in many of the sentences the score is lowered because of either the use of a synonym of the same word or a different way of writing the same word. So, in our next module, we replaced words with their synonyms so that they match their references.…”
Section: A. Fetching Input From the Dataset
confidence: 99%
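The synonym-replacement module described in this excerpt could plausibly be approximated with WordNet lookups; the sketch below is only a guess at such a step, not the authors' implementation, and it swaps a candidate word only when one of its WordNet synonyms already appears in the reference.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def align_synonyms(candidate_tokens, reference_tokens):
    """Replace a candidate word with a WordNet synonym if that synonym
    occurs in the reference, so surface forms match for n-gram metrics."""
    ref_set = set(reference_tokens)
    out = []
    for tok in candidate_tokens:
        if tok in ref_set:
            out.append(tok)
            continue
        synonyms = {lem.name().replace("_", " ")
                    for syn in wn.synsets(tok) for lem in syn.lemmas()}
        out.append(next((s for s in synonyms if s in ref_set), tok))
    return out

# "child" -> "kid" and "automobile" -> "car" get aligned to the reference.
print(align_synonyms("the child bought an automobile".split(),
                     "the kid bought a car".split()))
```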
“…Furthermore, we have also implemented five different metrics (BLEU, METEOR, GTM, AMBER and BLLIP). This is discussed in one of our papers [32], where the scores are calculated for 200 sentences (100 from agriculture and 100 from judiciary) to see how the different metrics score sentences translated using the same MT systems. Table I shows the scores of the same sentences for the different metrics.…”
Section: A. Fetching Input From the Dataset
confidence: 99%
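A per-sentence scoring loop in the spirit of that comparison could look like the sketch below, computing BLEU and METEOR with NLTK; the sentence pair is invented, GTM, AMBER and BLLIP require their own tooling and are omitted, and recent NLTK versions expect the WordNet data to be installed and pre-tokenized inputs for METEOR.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score  # needs nltk wordnet data

# Hypothetical test pairs; the 200-sentence agriculture/judiciary set is not reproduced here.
pairs = [
    ("the farmer planted seeds in the field", "the farmer sowed seeds in the field"),
]

smooth = SmoothingFunction().method1
for candidate, reference in pairs:
    cand_tok, ref_tok = candidate.split(), reference.split()
    bleu = sentence_bleu([ref_tok], cand_tok, smoothing_function=smooth)
    meteor = meteor_score([ref_tok], cand_tok)  # pre-tokenized, per recent NLTK
    print(f"BLEU={bleu:.3f}  METEOR={meteor:.3f}  |  {candidate}")
```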