Proceedings of the EACL 2014 Workshop on Humans and Computer-Assisted Translation
DOI: 10.3115/v1/w14-0301

Word Confidence Estimation for SMT N-best List Re-ranking

Abstract: This paper proposes to use Word Confidence Estimation (WCE) information to improve MT outputs via N-best list re-ranking. From the confidence label assigned to each word in the MT hypothesis, we add six scores to the baseline log-linear model in order to re-rank the N-best list. First, the correlation between the WCE-based sentence-level scores and the conventional evaluation scores (BLEU, TER, TERp-A) is investigated. Then, the N-best list re-ranking is evaluated over different WCE system performance levels: …
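The re-ranking idea in the abstract can be made concrete with a small sketch. The snippet below is an illustration only, not the paper's implementation: it assumes each N-best hypothesis carries a baseline log-linear score and a binary ("good"/"bad") WCE label per word, derives a few sentence-level aggregates from those labels (the paper defines six specific scores; the aggregates and weights here are hypothetical stand-ins), and re-ranks by adding the weighted features to the baseline score.

```python
# Minimal sketch of WCE-based N-best re-ranking (illustrative only; the
# paper's six sentence-level scores and their tuned weights differ).

def wce_sentence_features(labels):
    """Aggregate per-word confidence labels (1 = good, 0 = bad) into
    sentence-level scores. These aggregates are illustrative stand-ins."""
    n = len(labels)
    good_ratio = sum(labels) / n if n else 0.0
    bad_count = float(n - sum(labels))
    # Longest contiguous run of words labelled 'good'.
    longest_good, run = 0, 0
    for lab in labels:
        run = run + 1 if lab == 1 else 0
        longest_good = max(longest_good, run)
    return [good_ratio, -bad_count, longest_good / n if n else 0.0]

def rerank(nbest, weights):
    """nbest: list of (hypothesis, baseline_score, word_labels) tuples.
    Returns the list sorted by baseline score plus weighted WCE features."""
    def total(entry):
        _, base, labels = entry
        feats = wce_sentence_features(labels)
        return base + sum(w * f for w, f in zip(weights, feats))
    return sorted(nbest, key=total, reverse=True)

# Hypothetical usage: three hypotheses with per-word good(1)/bad(0) labels.
nbest = [
    ("the cat sat on mat", -4.2, [1, 1, 1, 1, 0]),
    ("the cat sits on the mat", -4.5, [1, 1, 1, 1, 1, 1]),
    ("cat the sat mat", -4.0, [0, 1, 0, 0]),
]
best, *_ = rerank(nbest, weights=[1.0, 0.1, 0.5])
print(best[0])
```

In practice the feature weights would be tuned on a development set together with the baseline model weights, as is standard for log-linear re-ranking.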

Cited by 7 publications (7 citation statements); references 11 publications (11 reference statements).
“…We have shown that the quality of phrase-level confidence estimates has a direct impact on the amplitude of the improvements that can be obtained, as well as the initial quality of the rewritten hypotheses. We have used a very simple definition of confidence estimates in the form of phrase posteriors estimated from n-best lists produced by an initial decoder, which obtained good empirical performance despite not requiring large human-annotated datasets as in other approaches (Bach et al., 2011; Luong et al., 2014b).…”
Section: Discussion (mentioning, confidence: 99%)
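The "phrase posteriors estimated from n-best lists" that this citing work relies on can be sketched roughly as follows: turn the decoder scores of the N-best hypotheses into a posterior distribution, then credit each phrase with the posterior mass of the hypotheses that contain it. This is only an approximation of the cited technique; the scaling factor, fixed phrase length, and matching by n-gram are assumptions made for the sketch.

```python
import math
from collections import defaultdict

def phrase_posteriors(nbest, order=2, alpha=1.0):
    """Rough sketch: estimate phrase posteriors from an N-best list.

    nbest : list of (tokens, model_score) pairs from an initial decoder.
    order : phrase length (fixed-length n-grams are used here as a proxy).
    alpha : scaling factor applied to model scores before the softmax
            (an assumption; real systems typically tune this).
    """
    # Turn model scores into a posterior distribution over hypotheses.
    m = max(score for _, score in nbest)
    weights = [math.exp(alpha * (score - m)) for _, score in nbest]
    z = sum(weights)
    posteriors = [w / z for w in weights]

    # A phrase's posterior is the total posterior mass of the hypotheses
    # that contain it at least once.
    phrase_post = defaultdict(float)
    for (tokens, _), p in zip(nbest, posteriors):
        seen = {tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)}
        for phrase in seen:
            phrase_post[phrase] += p
    return dict(phrase_post)

# Hypothetical 3-best list with decoder scores (higher is better).
nbest = [("the cat sat".split(), -2.0),
         ("a cat sat".split(), -2.3),
         ("the cat sits".split(), -2.7)]
print(phrase_posteriors(nbest))
```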
“…The difference between our approach and the reranking baseline lies in the manner in which we expand our training data, as well as in our use of high-confidence rewritings to obtain new translation hypotheses. Importantly, this work will only exploit simple confidence estimates corresponding to phrase-based posteriors, which do not require that large sets of human-annotated data be available as in other works (Bach et al., 2011; Luong et al., 2014b). The remainder of this paper is organized as follows.…”
Section: Introduction (mentioning, confidence: 99%)
“…For speech-to-text applications, CE may tell us whether output translations are worth correcting or require retranslation from scratch. Moreover, an accurate CE can also help to improve SLT itself through a second-pass N-best list re-ranking or search graph re-decoding, as has already been done for text translation in [2] and [19], or for speech translation in [4]. Consequently, building a method that is capable of pointing out the correct parts as well as detecting the errors in a speech-translated output is crucial to tackle the above issues.…”
Section: Introduction (mentioning, confidence: 99%)
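The triage decision mentioned in this excerpt (post-edit a translation or retranslate from scratch) can be expressed as a simple threshold on a sentence-level confidence score. The averaging scheme and the threshold value below are hypothetical, not taken from the cited work.

```python
def triage(word_confidences, threshold=0.6):
    """Decide whether a translation hypothesis is worth post-editing.

    word_confidences : per-word confidence probabilities in [0, 1].
    threshold        : hypothetical cut-off on the average confidence.
    """
    if not word_confidences:
        return "retranslate"
    avg = sum(word_confidences) / len(word_confidences)
    return "post-edit" if avg >= threshold else "retranslate"

print(triage([0.9, 0.8, 0.7, 0.4]))  # -> post-edit (average = 0.7)
print(triage([0.3, 0.5, 0.2]))       # -> retranslate (average ~ 0.33)
```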