2022 · Preprint
DOI: 10.48550/arxiv.2207.10342

Language Model Cascades

Abstract: Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are probabilistic models, and may be expressed in the language of graphical models with random variables whose values are complex data types such as strings. Cases with control flow and dynamic structure require techniques from probabilistic programming, which allow implementing disparate…
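
To make the graphical-model framing of the abstract concrete, here is a minimal sketch (my illustration, not code from the paper) of a two-stage cascade written as an ordinary program: each string-valued random variable is drawn by sampling a completion from a prompted LM, and running the program top to bottom is ancestral sampling of the implied graphical model. The helper llm_sample is a hypothetical placeholder for any LM sampling API.

def llm_sample(prompt: str) -> str:
    """Hypothetical placeholder: draw one completion from a prompted LM."""
    raise NotImplementedError("plug in an actual language model API here")

def cascade(question: str) -> dict:
    # First string-valued random variable: a reasoning trace ("scratchpad").
    thought = llm_sample(f"Q: {question}\nLet's think step by step:")
    # Second string-valued random variable, conditioned on the first.
    answer = llm_sample(f"Q: {question}\nReasoning: {thought}\nA:")
    # Running the program forwards is ancestral sampling of the
    # graphical model thought -> answer.
    return {"thought": thought, "answer": answer}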

Cited by 6 publications (8 citation statements) · References 6 publications

Citation statements (ordered by relevance):
“…The rationale is not a separate output that may or may not be consistent with the answer, it is the only part of the passage available to the answer-generation step. This is an example of general architecture that has been referred to as a language model cascade (Dohan et al, 2022), a framework that generalizes earlier work on prompt chaining and multi-stage prompting (Liu et al, 2022). Our work shows that such cascades can indeed lead to reliable and useful rationales.…”
Section: Related Work (supporting)
confidence: 57%
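
A rough sketch of the rationale-bottleneck cascade this statement describes (my illustration under assumed prompt wording, not the cited paper's code): the answer-generation step receives only the generated rationale, never the original passage, so the rationale cannot be an after-the-fact justification.

def generate_rationale(passage: str, question: str, lm) -> str:
    # Stage 1: the LM reads the passage and extracts the evidence it
    # considers necessary to answer the question.
    return lm(f"Passage: {passage}\nQuestion: {question}\n"
              "Evidence needed to answer:")

def answer_from_rationale(rationale: str, question: str, lm) -> str:
    # Stage 2: the passage is deliberately withheld; the rationale is
    # the only context available when producing the answer.
    return lm(f"Evidence: {rationale}\nQuestion: {question}\nAnswer:")

def rationale_cascade(passage: str, question: str, lm) -> tuple[str, str]:
    rationale = generate_rationale(passage, question, lm)
    return rationale, answer_from_rationale(rationale, question, lm)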
“…To our knowledge, the idea of integrating language models as primitives into a probabilistic programming system was first proposed by Lew et al [2020], who showed that in certain verbal reasoning tasks, the posteriors of such programs were better models of human behavior than unconstrained language models. More recently, Dohan et al [2022] proposed unifying various approaches to "chaining" LLMs by understanding them as graphical models or probabilistic programs with string-valued random variables. But in the "chain-of-thought"-style applications they explore, there are typically no unknown variables with non-trivial likelihood terms, so no inference algorithm is required; "forward" or "ancestral" sampling suffices.…”
Section: Related Work and Discussion (mentioning)
confidence: 99%
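
To illustrate the distinction this statement draws (a sketch of mine, not code from either cited paper): with nothing observed, sampling the cascade is just running it forwards, whereas conditioning on an observed answer turns it into a posterior-inference problem, for which rejection sampling is the simplest, if usually impractical, algorithm.

def llm_sample(prompt: str) -> str:
    """Hypothetical placeholder LM sampler."""
    raise NotImplementedError("plug in an actual language model API here")

def ancestral_sample(question: str) -> dict:
    # No observed variables and no likelihood terms: forward sampling of
    # the program is already exact sampling from the joint distribution.
    thought = llm_sample(f"{question}\nThought:")
    answer = llm_sample(f"{question}\nThought: {thought}\nAnswer:")
    return {"thought": thought, "answer": answer}

def posterior_thought(question: str, observed_answer: str,
                      max_tries: int = 1000) -> str:
    # Conditioning on the answer requires inference; rejection sampling
    # keeps forward samples whose answer matches the observation.
    for _ in range(max_tries):
        draw = ancestral_sample(question)
        if draw["answer"].strip() == observed_answer:
            return draw["thought"]
    raise RuntimeError("no sample accepted; a better inference algorithm is needed")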
“…to enable conversational interfaces without re-evaluating the entire conversation history with each new message. But we have found that extending it to the multi-particle setting makes inference in language model probabilistic programs significantly cheaper, compared to previous approaches to integrating Transformer models into probabilistic programs [Lew et al, 2020, Dohan et al, 2022, Zhi-Xuan, 2022].…”
mentioning
confidence: 82%
“…More recent approaches have proposed probabilistic inference approaches for tackling true/false question answering and commonsense question answering (Jung et al, 2022; Liu et al, 2022a). Xie et al (2021) presents a Bayesian inference perspective on in-context learning, and Dohan et al (2022) formalizes and unifies existing prompting techniques in a probabilistic framework. Our work generalizes such approaches to perform arbitrary probabilistic inference outside of the LLM.…”
Section: Related Work (mentioning)
confidence: 99%