Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.662

Machine Translation Decoding beyond Beam Search

Abstract: Beam search is the go-to method for decoding auto-regressive machine translation models. While it yields consistent improvements in terms of BLEU, it is only concerned with finding outputs with high model likelihood, and is thus agnostic to whatever end metric or score practitioners care about. Our aim is to establish whether beam search can be replaced by a more powerful metric-driven search technique. To this end, we explore numerous decoding algorithms, including some which rely on a value function paramete…
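To make the abstract's contrast concrete, here is a minimal likelihood-only beam search decoder of the kind it describes. The toy next_log_probs model, the vocabulary, and all function names are illustrative assumptions for this sketch, not code from the paper.

```python
# Minimal beam-search decoder for an autoregressive model (illustrative sketch).
import math
from typing import Callable

VOCAB = ["<eos>", "a", "b", "c"]

def next_log_probs(prefix: tuple[str, ...]) -> dict[str, float]:
    # Stand-in for a real NMT model: uniform distribution over the vocabulary.
    return {tok: math.log(1.0 / len(VOCAB)) for tok in VOCAB}

def beam_search(model: Callable[[tuple[str, ...]], dict[str, float]],
                beam_size: int = 4, max_len: int = 10) -> tuple[tuple[str, ...], float]:
    # Each hypothesis is (token prefix, cumulative log-likelihood).
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, logp in model(prefix).items():
                hyp = (prefix + (tok,), score + logp)
                (finished if tok == "<eos>" else candidates).append(hyp)
        if not candidates:
            break
        # Keep only the beam_size highest-likelihood continuations.
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
    # Return the single best hypothesis by model likelihood alone.
    return max(finished + beams, key=lambda h: h[1])

print(beam_search(next_log_probs))
```

Because hypotheses are ranked purely by cumulative log-likelihood, the end metric (BLEU, a quality score) plays no role in the search; that is the gap the paper's metric-driven alternatives target.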

Cited by 14 publications (11 citation statements) · References 8 publications
“…Planning algorithms like MCTS have also been used to find the optimal text outputs for different natural language processing (NLP) tasks. For example, Scialom et al. (2021); Leblond et al. (2021); Chaffin et al. (2022) use pre-trained discriminators or pre-defined metrics as reward functions. We want to emphasize that we are the first to combine a tree search algorithm with large language models for general-purpose programming language generation.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
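As a rough illustration of the MCTS-style, reward-driven decoding this statement describes, the sketch below selects each token by running random rollouts and scoring completed sequences with a stand-in metric. The vocabulary, reward function, and simulation budget are all assumptions for demonstration, not any cited system.

```python
# MCTS-style decoding with a pre-defined metric as the reward (toy sketch).
import math
import random

VOCAB = ["a", "b", "<eos>"]
MAX_LEN = 6

def reward(seq):
    # Stand-in for a task metric (e.g. BLEU or a discriminator score):
    # here, simply the fraction of "a" tokens in the finished sequence.
    body = [t for t in seq if t != "<eos>"]
    return body.count("a") / max(len(body), 1)

def rollout(prefix):
    # Complete the prefix with random tokens, then score the full sequence.
    seq = list(prefix)
    while seq[-1:] != ["<eos>"] and len(seq) < MAX_LEN:
        seq.append(random.choice(VOCAB))
    return reward(seq)

def mcts_choose(prefix, simulations=200, c=1.4):
    # One MCTS decision: evaluate each child token with rollouts, balancing
    # exploration and exploitation via UCB1, then commit to the most-visited child.
    stats = {tok: [0, 0.0] for tok in VOCAB}  # token -> [visits, total reward]
    for i in range(1, simulations + 1):
        tok = max(VOCAB, key=lambda t: float("inf") if stats[t][0] == 0 else
                  stats[t][1] / stats[t][0] + c * math.sqrt(math.log(i) / stats[t][0]))
        r = rollout(prefix + [tok])
        stats[tok][0] += 1
        stats[tok][1] += r
    return max(VOCAB, key=lambda t: stats[t][0])

seq = []
while seq[-1:] != ["<eos>"] and len(seq) < MAX_LEN:
    seq.append(mcts_choose(seq))
print(seq, reward(seq))
```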
“…While the energy model plays a similar role to a QE system, our work differs in two ways: we use an existing, pretrained QE model instead of training a dedicated reranker, making our approach applicable to any MT system without further training; and the QE model is trained to predict human assessments, rather than BLEU scores. Leblond et al. (2021) compare a reinforcement learning approach to reranking approaches (but not MBR decoding, as we do). They investigate the use of reference-based metrics and, for the reward function, a reference-free metric based on a modified BERTScore (Zhang et al., 2020).…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
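The reranking vs. MBR contrast in this statement can be sketched in a few lines: sample several hypotheses, then pick the one with the highest average utility against all the others. The set-based unigram F1 below is a toy stand-in for a learned metric such as BERTScore, and every name here is an assumption.

```python
# Sampling-based MBR decoding with a toy utility function (illustrative sketch).
def utility(hyp: str, ref: str) -> float:
    # Toy symmetric utility: F1 over unigram *types* (sets), not a real metric.
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    p, q = overlap / len(h), overlap / len(r)
    return 2 * p * q / (p + q)

def mbr_decode(samples: list[str]) -> str:
    # Pick the sample with the highest average utility against all samples,
    # i.e. the candidate the sample distribution itself "agrees with" most.
    return max(samples, key=lambda y: sum(utility(y, yp) for yp in samples))

samples = ["the cat sat on the mat", "a cat sat on a mat", "the dog ran"]
print(mbr_decode(samples))
```

A QE-reranking variant, as the quoted authors describe, would instead score each candidate once with a pretrained reference-free quality model and keep the top-scoring one.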
“…In neural text generation, the scoring function frequently requires the computation of a neural network forward step, such as a log-probability computed using an autoregressive model, rendering these practices prohibitive. One option is to rely on a heuristic to generate samples to train a neural network that estimates v (Leblond et al., 2021). However, this has been shown to be challenging, as model scores are difficult to estimate.…”
Section: Adaptive Tree Search For Text Generation
Citation type: mentioning (confidence: 99%)
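The recipe this statement attributes to Leblond et al. (2021), using a heuristic to generate samples and training an estimator v on them, can be made concrete with a toy version: generate random rollouts, record the end metric, and fit a value estimate for every prefix seen. The tabular averager below stands in for a neural value network, and all details are illustrative assumptions.

```python
# Training a toy value estimator v(prefix) from heuristic samples (sketch).
import random

VOCAB = ["a", "b", "<eos>"]
MAX_LEN = 6

def metric(seq):
    # Stand-in end metric: fraction of "a" tokens in the finished sequence.
    body = [t for t in seq if t != "<eos>"]
    return body.count("a") / max(len(body), 1)

def heuristic_sample():
    # Heuristic data generator: a random rollout plus its final metric score.
    seq = []
    while seq[-1:] != ["<eos>"] and len(seq) < MAX_LEN:
        seq.append(random.choice(VOCAB))
    return seq, metric(seq)

# Tabular estimator: v(prefix) = mean metric of rollouts passing through it
# (a stand-in for the neural regressor the quote mentions).
table = {}
for _ in range(5000):
    seq, score = heuristic_sample()
    for t in range(len(seq) + 1):
        stats = table.setdefault(tuple(seq[:t]), [0, 0.0])
        stats[0] += 1
        stats[1] += score

def v(prefix):
    n, total = table.get(tuple(prefix), (1, 0.0))
    return total / n

# v could now guide a search procedure, but as the quote notes, such
# estimates are noisy and hard to learn well in practice.
print(v(()), v(("a",)), v(("b",)))
```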