Abstract: This paper presents a method to predict human assessments of machine translation (MT) quality based on a combination of binary classifiers using a coding matrix. The multiclass categorization problem is reduced to a set of binary problems that are solved using standard classification learning algorithms trained on the results of multiple automatic evaluation metrics. Experimental results using a large-scale human-annotated evaluation corpus show that the decomposition into binary classifiers achieves higher cl…
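The coding-matrix reduction described in the abstract can be illustrated with a small sketch. The matrix below is purely illustrative (the paper does not specify its codewords); it shows the mechanics of error-correcting output coding: each class gets a binary codeword, each column defines one binary classifier, and decoding picks the class whose codeword is nearest in Hamming distance to the classifiers' joint output.

```python
# Sketch of coding-matrix (ECOC-style) decoding for a 5-class quality scale.
# The codewords here are hypothetical, not the ones used in the paper.

# Each row encodes one class (score 1-5); each column is the target of one
# binary classifier trained on automatic-metric features.
CODING_MATRIX = {
    1: (0, 0, 0, 0),
    2: (0, 0, 1, 1),
    3: (0, 1, 0, 1),
    4: (1, 0, 1, 0),
    5: (1, 1, 1, 1),
}

def hamming(a, b):
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(binary_outputs):
    """Return the class whose codeword is nearest to the joint binary output."""
    return min(CODING_MATRIX, key=lambda c: hamming(CODING_MATRIX[c], binary_outputs))

# If the four binary classifiers jointly output (1, 0, 1, 0), the nearest
# codeword belongs to class 4.
print(decode((1, 0, 1, 0)))  # → 4
```

Because decoding tolerates a few disagreeing columns, the scheme degrades gracefully when individual binary classifiers err, which is the usual motivation for coding-matrix decompositions over a single multiclass learner.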
“…Not satisfied with the results of Run 4, we utilized the reduction of classification ambiguity approach in order to reduce a multiple classification problem to a binary classification problem, as in [4]. Instead of employing five or three different classes, we simply collapsed them into "low quality" and "high quality" translations, where low quality were those documents classified as having an adequacy or fluency score from 1-3, and high quality documents were those classified as having an adequacy or fluency score of 4-5.…”
Section: Table 2. Adequacy and Fluency Results for Run
confidence: 99%
“…Their classifier focused on evaluating the well-formedness of the output sentences (similar to our concept of fluency) according to 46 features picked and extracted by the authors. Finch and Sumita [4] used as inputs MT outputs whose translation quality had been previously assessed by human evaluators. Rather than having classifiers evaluate the adequacy and fluency of translations on a scale of 1 to 5, they employed a reduction of classification ambiguity method, turning a multi-class classification problem into a set of binary classification problems.…”
We explored supervised machine learning (ML) techniques to understand and predict the adequacy and fluency of English-Spanish machine translation. Five experiments were conducted using three classifiers in Weka, an open-source ML tool. We found that the highest performance was achieved by applying a dimensionality reduction approach to the classification task, which included collapsing a numeric scale of quality to two categories: high quality and low quality. Our results showed that the Support Vector Machine classifier performed the best at predicting the adequacy (65.65%) and fluency (65.77%) of the translations. More research is needed to explore the methodologies of applying ML to translation evaluation.
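The abstract's setup, a linear classifier trained on automatic-metric features to predict the collapsed high/low label, can be sketched without any ML library. The study used an SVM in Weka; as a small library-free stand-in, the sketch below trains a perceptron (another linear classifier) on two hypothetical metric features, e.g. a BLEU-like score and a length ratio. Feature values and data are invented for illustration.

```python
# Library-free sketch: a perceptron standing in for the paper's Weka SVM.
# Inputs are hypothetical automatic-metric features; labels are the collapsed
# binary quality classes encoded as +1 (high) and -1 (low).

def train_perceptron(data, epochs=20):
    """data: list of (feature_vector, label) pairs with label in {-1, +1}."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # Update on any misclassified (or boundary) example.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    """Return +1 (high quality) or -1 (low quality)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy data: high-quality translations tend to score higher on both metrics.
data = [([0.9, 0.8], 1), ([0.7, 0.9], 1), ([0.2, 0.3], -1), ([0.1, 0.4], -1)]
w, b = train_perceptron(data)
print(predict(w, b, [0.8, 0.85]))  # → 1
```

An SVM would additionally maximize the margin of the separating hyperplane; the perceptron here only finds some separator, which is sufficient to show the feature-to-label pipeline the abstract describes.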
“…Perceptron, SVM, decision trees, and linear regression). Paul et al. (2007) extended these approaches so as to account for separate aspects of quality: adequacy, fluency, and acceptability. Their main contribution was in the variety of schemes they applied to decompose the multiclass classification problem.…”
Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality.
Peer Reviewed. Postprint (published version).
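The claim that heterogeneous measure scores "are suitable for combination" can be illustrated with the simplest combination scheme: min-max-normalize each measure onto a common scale, then average. The abstract does not specify the authors' actual combination method, and the measure names and values below are hypothetical.

```python
# Sketch of a uniform linear combination of heterogeneous evaluation measures.
# Each measure is min-max normalized so scores on different scales (e.g. a
# lexical n-gram score vs. a parse-overlap count) become comparable.

def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine(measure_scores):
    """measure_scores: dict mapping measure name -> per-sentence scores.
    Returns one combined score per sentence (uniform average)."""
    normalized = [normalize(v) for v in measure_scores.values()]
    return [sum(col) / len(col) for col in zip(*normalized)]

scores = {
    "lexical":   [0.3, 0.6, 0.9],      # hypothetical BLEU-like values
    "syntactic": [10.0, 30.0, 20.0],   # hypothetical measure on its own scale
}
print(combine(scores))
```

A weighted combination (weights tuned against human judgments) is the natural next step, but even this uniform average shows how measures with incompatible scales can vote jointly on sentence quality.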