2022
DOI: 10.48550/arxiv.2204.05307
Preprint

Toward More Effective Human Evaluation for Machine Translation

Abstract: Improvements in text generation technologies such as machine translation have necessitated more costly and time-consuming human evaluation procedures to ensure an accurate signal. We investigate a simple way to reduce cost by reducing the number of text segments that must be annotated in order to accurately predict a score for a complete test set. Using a sampling approach, we demonstrate that information from document membership and automatic metrics can help improve estimates compared to a pure random sampling […]
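For intuition, the estimation idea in the abstract can be sketched in a few lines: annotate only k of n segments and use the sample mean as the test-set score, with an automatic metric guiding which segments get annotated. This is a toy illustration under assumed inputs, not the authors' implementation; the field names "human_score" and "metric_score", the equal-frequency binning, and the toy data are all hypothetical.

```python
import random
from statistics import mean

def random_sample_estimate(segments, k, seed=0):
    """Pure random baseline: annotate k segments drawn uniformly,
    then average their human scores."""
    rng = random.Random(seed)
    return mean(s["human_score"] for s in rng.sample(segments, k))

def metric_stratified_estimate(segments, k, seed=0):
    """Sort segments by an automatic metric, split them into k
    equal-frequency bins, and annotate one segment per bin, so the
    sample covers the whole quality range instead of clustering by chance."""
    rng = random.Random(seed)
    ranked = sorted(segments, key=lambda s: s["metric_score"])
    n = len(ranked)
    bins = [ranked[i * n // k:(i + 1) * n // k] for i in range(k)]
    return mean(rng.choice(b)["human_score"] for b in bins if b)

# Toy usage: 1,000 segments whose metric scores are noisy versions of
# their (normally unobserved) human scores; only 50 get "annotated".
random.seed(0)
segments = [{"human_score": h, "metric_score": h + random.gauss(0, 5)}
            for h in (random.uniform(0, 100) for _ in range(1000))]
print(random_sample_estimate(segments, 50))
print(metric_stratified_estimate(segments, 50))
```

The same stratification can be applied to document membership (one bin per document), which is the other auxiliary signal the abstract mentions.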

Cited by 2 publications (3 citation statements)
References 12 publications (16 reference statements)

Citation statements, ordered by relevance:
“…Being able to filter out exactly semantically equivalent sentence pairs would reduce this workload. Similarly, filtering out exactly semantically equivalent sentences can lessen the amount of annotation necessary for human evaluations of text (Saldías et al., 2022).…”
Section: Discussion (mentioning) · Confidence: 99%
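The filtering step this quote describes can be made concrete with a minimal sketch. This is not code from either paper; exact string identity is used as the simplest stand-in for "exactly semantically equivalent", and the pair format is assumed.

```python
def drop_equivalent_pairs(pairs):
    """Keep only sentence pairs whose two sides still differ after trivial
    normalization, so annotators never score trivially equivalent outputs.
    A real filter would use a stronger semantic-equivalence test."""
    return [(a, b) for a, b in pairs if a.strip().lower() != b.strip().lower()]
```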
“…While recent literature (Goyal et al., 2022) claims that human involvement is time-consuming and expensive, it cannot be absent. In order to determine acceptable values for human involvement, we rely on past investigation in the area (Koehn, 2009; González Rubio, 2014; Way, 2018; Kreutzer et al., 2022; Saldías et al., 2022) to answer the main questions below.…”
Section: Plan B: Workarounds (mentioning) · Confidence: 99%
“…Other work (Doherty, 2018) mentions that translation quality assessments of around 60 to 70% are acceptable. For an LRMTS, human involvement can lead to a high-quality LRMTS, as shown by Saldías et al. (2022).…”
Section: Plan B: Workarounds (mentioning) · Confidence: 99%