Word-level confidence estimation for machine translation using phrase-based translation models

Ueffing, Nicola; Ney, Hermann

doi:10.3115/1220575.1220671

Cited by 34 publications

(29 citation statements)

References 15 publications

“…Error analysis techniques have not been substantially explored, although it has recently been identified as an important task (Och, 2005). A few techniques for error analysis (DeNeefe et al, 2005; Popovic et al, 2006) and confidence estimation (Ueffing and Ney, 2005) have begun to emerge, but in general this area remains underexplored.…”

Section: Current Directions and Future Research (mentioning)

confidence: 99%

A Survey of Statistical Machine Translation

Lopez

2007

Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions. This is a revised draft of a paper currently under review. The contents may change in later drafts. Please send any comments, questions, or corrections to alopez@cs.umd.edu.

“…One of the most effective feature combinations is the Word Posterior Probability (WPP) as suggested by Ueffing et al (2003) associated with IBM-model based features (Blatz et al, 2004). Ueffing and Ney (2005) propose an approach for phrase-based translation models: a phrase is a sequence of contiguous words and is extracted from the word-aligned bilingual training corpus. The confidence value of each word is then computed by summing over all phrase pairs in which the target part contains this word.…”

Section: Word Confidence Estimation (mentioning)

confidence: 99%

Word Confidence Estimation for SMT N-best List Re-ranking

Luong

Besacier

Lecouteux

2014

Proceedings of the EACL 2014 Workshop on Humans and Computer-Assisted Translation

This paper proposes to use Word Confidence Estimation (WCE) information to improve MT outputs via N-best list reranking. From the confidence label assigned for each word in the MT hypothesis, we add six scores to the baseline loglinear model in order to re-rank the N-best list. Firstly, the correlation between the WCE-based sentence-level scores and the conventional evaluation scores (BLEU, TER, TERp-A) is investigated. Then, the N-best list re-ranking is evaluated over different WCE system performance levels: from our real and efficient WCE system (ranked 1st during last WMT 2013 Quality Estimation Task) to an oracle WCE (which simulates an interactive scenario where a user simply validates words of a MT hypothesis and the new output will be automatically re-generated). The results suggest that our real WCE system slightly (but significantly) improves the baseline while the oracle one extremely boosts it; and better WCE leads to better MT quality.

“…A novel approach introduced in [5] explicitly explores the phrase-based translation model for detecting word errors. A phrase is considered as a contiguous sequence of words and is extracted from the word-aligned bilingual training corpus.…”

Section: Previous Work Review (mentioning)

confidence: 99%

Word Confidence Estimation and Its Integration in Sentence Quality Estimation for Machine Translation

Luong

Besacier

Lecouteux

2014

Advances in Intelligent Systems and Computing

Abstract. This paper proposes some ideas to build an effective estimator, which predicts the quality of words in a Machine Translation (MT) output. We integrate a number of features of various types (system-based, lexical, syntactic and semantic) into the conventional feature set, for our baseline classifier training. Once having experiments with all features, we deploy a "Feature Selection" strategy to filter the best performing ones. Then, a method that combines multiple "weak" classifiers to build a strong "composite" classifier by taking advantage of their complementarity allows us to achieve a better performance in terms of F score. Finally, we exploit word confidence scores for improving the estimation system at sentence level.
