Abstract: This paper presents a method to predict human assessments of machine translation (MT) quality based on a combination of binary classifiers using a coding matrix. The multiclass categorization problem is reduced to a set of binary problems that are solved using standard classification learning algorithms trained on the results of multiple automatic evaluation metrics. Experimental results using a large-scale human-annotated evaluation corpus show that the decomposition into binary classifiers achieves higher cl…
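The coding-matrix reduction described in the abstract can be illustrated with a small sketch. The matrix below is purely illustrative (the paper does not specify its codewords); it shows the mechanics of error-correcting output coding: each class gets a binary codeword, each column defines one binary classifier, and decoding picks the class whose codeword is nearest in Hamming distance to the classifiers' joint output.

```python
# Sketch of coding-matrix (ECOC-style) decoding for a 5-class quality scale.
# The codewords here are hypothetical, not the ones used in the paper.

# Each row encodes one class (score 1-5); each column is the target of one
# binary classifier trained on automatic-metric features.
CODING_MATRIX = {
    1: (0, 0, 0, 0),
    2: (0, 0, 1, 1),
    3: (0, 1, 0, 1),
    4: (1, 0, 1, 0),
    5: (1, 1, 1, 1),
}

def hamming(a, b):
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(binary_outputs):
    """Return the class whose codeword is nearest to the joint binary output."""
    return min(CODING_MATRIX, key=lambda c: hamming(CODING_MATRIX[c], binary_outputs))

# If the four binary classifiers jointly output (1, 0, 1, 0), the nearest
# codeword belongs to class 4.
print(decode((1, 0, 1, 0)))  # → 4
```

Because decoding tolerates a few disagreeing columns, the scheme degrades gracefully when individual binary classifiers err, which is the usual motivation for coding-matrix decompositions over a single multiclass learner.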
“…Not satisfied with the results of Run 4, we utilized the reduction of classification ambiguity approach in order to reduce a multiple classification problem to a binary classification problem, as in [4]. Instead of employing five or three different classes, we simply collapsed them into "low quality" and "high quality" translations, where low quality were those documents classified as having an adequacy or fluency score from 1-3, and high quality documents were those classified as having an adequacy or fluency score of 4-5.…”
Section: Table 2. Adequacy and Fluency Results for Run
confidence: 99%
“…Their classifier focused on evaluating the well-formedness of the output sentences (similar to our concept of fluency) according to 46 features picked and extracted by the authors. Finch and Sumita [4] used as inputs MT outputs whose translation quality had been previously assessed by human evaluators. Rather than having classifiers evaluate the adequacy and fluency of translations on a scale of 1 to 5, they employed a reduction of classification ambiguity method, turning a multi-class classification problem into a set of binary classification problems.…”
We explored supervised machine learning (ML) techniques to understand and predict the adequacy and fluency of English-Spanish machine translation. Five experiments were conducted using three classifiers in Weka, an open-source ML tool. We found that the highest performance was achieved by applying a dimensionality reduction approach to the classification task, which included collapsing a numeric scale of quality to two categories: high quality and low quality. Our results showed that the Support Vector Machine classifier performed the best at predicting the adequacy (65.65%) and fluency (65.77%) of the translations. More research is needed to explore the methodologies of applying ML to translation evaluation.
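The abstract's setup, a linear classifier trained on automatic-metric features to predict the collapsed high/low label, can be sketched without any ML library. The study used an SVM in Weka; as a small library-free stand-in, the sketch below trains a perceptron (another linear classifier) on two hypothetical metric features, e.g. a BLEU-like score and a length ratio. Feature values and data are invented for illustration.

```python
# Library-free sketch: a perceptron standing in for the paper's Weka SVM.
# Inputs are hypothetical automatic-metric features; labels are the collapsed
# binary quality classes encoded as +1 (high) and -1 (low).

def train_perceptron(data, epochs=20):
    """data: list of (feature_vector, label) pairs with label in {-1, +1}."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # Update on any misclassified (or boundary) example.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    """Return +1 (high quality) or -1 (low quality)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy data: high-quality translations tend to score higher on both metrics.
data = [([0.9, 0.8], 1), ([0.7, 0.9], 1), ([0.2, 0.3], -1), ([0.1, 0.4], -1)]
w, b = train_perceptron(data)
print(predict(w, b, [0.8, 0.85]))  # → 1
```

An SVM would additionally maximize the margin of the separating hyperplane; the perceptron here only finds some separator, which is sufficient to show the feature-to-label pipeline the abstract describes.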
“…Perceptron, SVM, decision trees, and linear regression). Paul et al. (2007) extended these approaches so as to account for separate aspects of quality: adequacy, fluency, and acceptability. Their main contribution was in the variety of schemes they applied to decompose the multiclass classification problem.…”
Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality.
Peer Reviewed. Postprint (published version).
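The claim that heterogeneous measure scores "are suitable for combination" can be illustrated with the simplest combination scheme: min-max-normalize each measure onto a common scale, then average. The abstract does not specify the authors' actual combination method, and the measure names and values below are hypothetical.

```python
# Sketch of a uniform linear combination of heterogeneous evaluation measures.
# Each measure is min-max normalized so scores on different scales (e.g. a
# lexical n-gram score vs. a parse-overlap count) become comparable.

def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine(measure_scores):
    """measure_scores: dict mapping measure name -> per-sentence scores.
    Returns one combined score per sentence (uniform average)."""
    normalized = [normalize(v) for v in measure_scores.values()]
    return [sum(col) / len(col) for col in zip(*normalized)]

scores = {
    "lexical":   [0.3, 0.6, 0.9],      # hypothetical BLEU-like values
    "syntactic": [10.0, 30.0, 20.0],   # hypothetical measure on its own scale
}
print(combine(scores))
```

A weighted combination (weights tuned against human judgments) is the natural next step, but even this uniform average shows how measures with incompatible scales can vote jointly on sentence quality.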