2012 · DOI: 10.5120/9581-4062
Predicting Human Assessment of Machine Translation Quality by Combining Automatic Evaluation Metrics using Binary Classifiers

Abstract: This paper presents a method to predict human assessments of machine translation (MT) quality based on a combination of binary classifiers using a coding matrix. The multiclass categorization problem is reduced to a set of binary problems that are solved using standard classification learning algorithms trained on the results of multiple automatic evaluation metrics. Experimental results using a large-scale human-annotated evaluation corpus show that the decomposition into binary classifiers achieves higher cl…
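The coding-matrix decomposition described in the abstract is essentially an error-correcting output code (ECOC) over binary classifiers. Below is a minimal sketch of that idea, not the authors' implementation: the metric features, placeholder data, SVM base learner, and code size are all assumptions for illustration.

```python
# Hedged sketch: reduce a 5-class human-rating problem to binary subproblems
# via a coding matrix (ECOC), with binary SVMs trained on automatic metric scores.
import numpy as np
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Each row: automatic evaluation scores for one MT output
# (e.g., BLEU, NIST, METEOR, TER) -- placeholder values, not real metric outputs.
X = rng.random((500, 4))
# Human adequacy/fluency ratings on a 1-5 scale -- placeholder labels.
y = rng.integers(1, 6, size=500)

# The coding matrix maps the 5 classes onto a set of binary problems,
# each solved by a standard binary learner (an RBF SVM in this sketch).
ecoc = OutputCodeClassifier(SVC(kernel="rbf"), code_size=2.0, random_state=0)
ecoc.fit(X[:400], y[:400])

print("predicted human ratings:", ecoc.predict(X[400:410]))
```

Note that scikit-learn's OutputCodeClassifier draws a random coding matrix; the matrix construction used in the paper may differ, so this only outlines the decomposition step.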

Cited by 6 publications (9 citation statements) · References 12 publications
“…Not satisfied with the results of Run 4, we utilized the reduction-of-classification-ambiguity approach in order to reduce a multi-class classification problem to a binary classification problem, as in [4]. Instead of employing five or three different classes, we simply collapsed them into "low quality" and "high quality" translations, where low-quality documents were those classified as having an adequacy or fluency score of 1-3, and high-quality documents were those classified as having an adequacy or fluency score of 4-5.…”
Section: Table 2 Adequacy and Fluency Results for Run (mentioning)
confidence: 99%
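The collapse this excerpt describes (adequacy/fluency scores 1-3 as "low quality", 4-5 as "high quality") amounts to a one-line relabelling step. The sketch below is illustrative only; the function name and threshold placement are not taken from the cited run.

```python
# Illustrative relabelling: collapse 1-5 adequacy/fluency scores into two classes.
def binarize_rating(score: int) -> str:
    """Map a 1-5 human rating onto a binary quality label."""
    return "high quality" if score >= 4 else "low quality"

print([binarize_rating(s) for s in range(1, 6)])
# ['low quality', 'low quality', 'low quality', 'high quality', 'high quality']
```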
“…Their classifier focused on evaluating the well-formedness of the output sentences (similar to our concept of fluency) according to 46 features picked and extracted by the authors. Finch and Sumita [4] used as inputs MT outputs whose translation quality had been previously assessed by human evaluators. Rather than having classifiers evaluate the adequacy and fluency of translations on a scale of 1 to 5, they employed a reduction-of-classification-ambiguity method, turning a multi-class classification problem into a set of binary classification problems.…”
Section: Related Studies (mentioning)
confidence: 99%
“…Perceptron, SVM, decision trees, and linear regression). Paul et al (2007) extended these approaches so as to account for separate aspects of quality: adequacy, fluency and acceptability. Their main contribution was in the variety of schemes they applied to decompose the multiclass classification problem.…”
Section: Combined Schemes (mentioning)
confidence: 99%
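As a concrete illustration of combining several automatic metric scores with one of the standard learners this excerpt lists (Perceptron, SVM, decision trees, linear regression), the sketch below stacks per-sentence metric scores into a feature vector and fits a linear SVM on binary acceptability labels. The metric names, synthetic data, and pipeline choices are assumptions, not details from the cited work.

```python
# Hedged sketch: combine multiple automatic metric scores with a linear SVM.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
# Columns: placeholder per-sentence BLEU, NIST, METEOR scores.
metric_scores = rng.random((300, 3))
# Binary acceptability labels (synthetic stand-in for human judgements).
acceptable = (metric_scores.mean(axis=1) + 0.1 * rng.standard_normal(300)) > 0.5

clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(metric_scores[:250], acceptable[:250])
print("held-out accuracy:", clf.score(metric_scores[250:], acceptable[250:]))
```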
“…), as well as methods that use dependency-structure information in addition to n-gram agreement (9), and methods that combine the automatic scores of several different metrics to obtain higher evaluation performance than any single automatic score (10) (11), have also been proposed. In addition, … the machine translation evaluation workshops (12) (15) that have been held actively in recent years. At present, no method has been proposed that scores the performance of a speech translation system, including its speech synthesis, on the TOEIC scale; the field evaluation experiments described next (16) and the simulated-dialogue experiments (17)…”
Section: Keiji Yasuda, Eiichiro Sumita (unclassified)