2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
DOI: 10.1109/confluence.2018.8442777
A Summary and Comparative Study of Different Metrics for Machine Translation Evaluation

Cited by 8 publications (6 citation statements)
References 1 publication
“…BLEU is a very well-known MT quality evaluation metric that estimates precision. METEOR is also well known but is a more complicated measure that estimates both precision and recall using an F-mean score [47, 48]. In the following subsections, we discuss the quality evaluation of our proposed system versus Omega-T and Apertium.…”
Section: Results
confidence: 99%
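As a concrete illustration of the F-mean mentioned in the excerpt, the short sketch below uses the weighting from the original METEOR formulation (recall weighted 9:1 over precision); the unigram counts are invented for the example and are not taken from the cited systems.

```python
def f_mean(precision: float, recall: float) -> float:
    """METEOR-style F-mean: harmonic mean weighted heavily toward recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (10.0 * precision * recall) / (recall + 9.0 * precision)

# Hypothetical unigram-match counts for one candidate/reference pair.
matches, cand_len, ref_len = 7, 10, 9
p = matches / cand_len   # unigram precision
r = matches / ref_len    # unigram recall
print(f"P={p:.2f}  R={r:.2f}  F_mean={f_mean(p, r):.2f}")
```

Note that the full METEOR score further multiplies this F-mean by a fragmentation penalty that accounts for word-order differences; the sketch covers only the precision/recall component.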
“…Translation quality was also evaluated by an automatic process. Both BLEU and F-mean scores [47] were utilized. The BLEU score measures the precision of unigrams up to four-grams with respect to reference translations.…”
Section: Results
confidence: 99%
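The excerpt's description of BLEU (precision of unigrams up to four-grams against reference translations) can be reproduced at sentence level with NLTK; in the sketch below the sentences are invented and the smoothing choice is an assumption, added because unsmoothed sentence-level BLEU drops to zero whenever some n-gram order has no match.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical candidate/reference pair (not from the cited systems).
reference = "the cat sat on the mat".split()
candidate = "the cat is on the mat".split()

# Equal weights over 1- to 4-gram precision, as in standard BLEU.
score = sentence_bleu(
    [reference],                      # list of tokenized references
    candidate,                        # tokenized candidate
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"Sentence BLEU: {score:.3f}")
```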
“…After analyzing the scores of the different metrics [32], we concluded that in many of the sentences the score is lowered because of either the use of a synonym of the same word or a different way of writing the same word. So, in our next module, we replaced words with their synonyms so that they match their references.…”
Section: A. Fetching Input From the Dataset
confidence: 99%
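The synonym-replacement module described in this excerpt could plausibly be approximated with WordNet lookups; the sketch below is only a guess at such a step, not the authors' implementation, and it swaps a candidate word only when one of its WordNet synonyms already appears in the reference.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def align_synonyms(candidate_tokens, reference_tokens):
    """Replace a candidate word with a WordNet synonym if that synonym
    occurs in the reference, so surface forms match for n-gram metrics."""
    ref_set = set(reference_tokens)
    out = []
    for tok in candidate_tokens:
        if tok in ref_set:
            out.append(tok)
            continue
        synonyms = {lem.name().replace("_", " ")
                    for syn in wn.synsets(tok) for lem in syn.lemmas()}
        out.append(next((s for s in synonyms if s in ref_set), tok))
    return out

# "child" -> "kid" and "automobile" -> "car" get aligned to the reference.
print(align_synonyms("the child bought an automobile".split(),
                     "the kid bought a car".split()))
```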
“…Furthermore, we have also implemented five different metrics (BLEU, METEOR, GTM, AMBER and BLLIP). This is discussed in one of our papers [32], where the scores are calculated for 200 sentences (100 from agriculture and 100 from judiciary) to see how the different metrics score sentences translated using the same MT systems. Table I shows the scores of the same sentences for the different metrics.…”
Section: A. Fetching Input From the Dataset
confidence: 99%
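A per-sentence scoring loop in the spirit of that comparison could look like the sketch below, computing BLEU and METEOR with NLTK; the sentence pair is invented, GTM, AMBER and BLLIP require their own tooling and are omitted, and recent NLTK versions expect the WordNet data to be installed and pre-tokenized inputs for METEOR.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score  # needs nltk wordnet data

# Hypothetical test pairs; the 200-sentence agriculture/judiciary set is not reproduced here.
pairs = [
    ("the farmer planted seeds in the field", "the farmer sowed seeds in the field"),
]

smooth = SmoothingFunction().method1
for candidate, reference in pairs:
    cand_tok, ref_tok = candidate.split(), reference.split()
    bleu = sentence_bleu([ref_tok], cand_tok, smoothing_function=smooth)
    meteor = meteor_score([ref_tok], cand_tok)  # pre-tokenized, per recent NLTK
    print(f"BLEU={bleu:.3f}  METEOR={meteor:.3f}  |  {candidate}")
```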