2023
DOI: 10.1145/3571247
Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations

Abstract: Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficient…
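The pairing the abstract describes can be sketched in a few lines. This is a minimal illustrative implementation of dual-numbers forward-mode AD, not the paper's own code: each scalar carries its tangent, and arithmetic propagates tangents by the usual calculus rules. The class and function names here are hypothetical.

```python
# Minimal sketch of dual-numbers forward-mode AD (illustrative only).
# Each scalar value is paired with its tangent value.

class Dual:
    def __init__(self, primal, tangent):
        self.primal = primal
        self.tangent = tangent

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.primal + other.primal,
                    self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

def derivative(f, x):
    # Seed the input with tangent 1.0 and read the tangent off the output.
    return f(Dual(x, 1.0)).tangent

# d/dx (x * x + x) at x = 3.0  ->  2 * 3 + 1 = 7.0
print(derivative(lambda x: x * x + x, 3.0))
```

Dual-numbers reverse AD, the paper's subject, keeps this per-scalar structure but replaces the tangent with a backpropagator function.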

Cited by 8 publications
(4 citation statements)
References 20 publications
“…Another recent work in a similar vein is that of Smeding and Vákár [29], who derive several variants of reverse-mode AD. They use "well-known program transformations", reasoning mostly in algorithmic terms.…”
Section: Related Work
confidence: 77%
“…Most works in the AD literature use the standard homogenous dual numbers. Some, like those of Krawiec et al [10] and Smeding and Vákár [29], do feature various heterogenous dual numbers, but they do not observe that these are instances of a general structure. Nagata originally defined his idealization of a module over a ring.…”
Section: Related Work
confidence: 87%
“…A linear-factoring reduction rule, which is built into the semantics of the calculus, is required for the transformation to be cost-preserving. Smeding and Vákár [SV23] improve on Brunel, Mazza and Pagani's work by showing how their approach can be efficiently implemented in a standard programming language, whose semantics does not include a linear-factoring rule. Mazza and Pagani [MP21] prove the soundness of AD transformations in the setting of PCF, a typed λ-calculus equipped with real numbers, recursion, and conditionals.…”
Section: This Is Expressed As Follows
confidence: 99%
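The statement above concerns the efficiency of backpropagator-based reverse AD. A naive version of the idea, sketched below in hypothetical Python rather than the cited works' formalism, pairs each scalar with a backpropagator: a function from the cotangent of that value to the cotangent contribution at the input. This naive form is exactly what the linear-factoring rule (or Smeding and Vákár's program transformations) exists to repair: shared subterms re-invoke their backpropagators, which can cause blowup on larger programs.

```python
# Naive sketch of dual-numbers reverse-mode AD (illustrative only).
# Each scalar is paired with a backpropagator mapping this node's
# cotangent to the input's cotangent. No sharing/linear factoring is
# performed, so shared subexpressions re-run their backpropagators.

class Rev:
    def __init__(self, primal, backprop):
        self.primal = primal
        self.backprop = backprop  # cotangent of this node -> input cotangent

    def __add__(self, other):
        return Rev(self.primal + other.primal,
                   lambda d: self.backprop(d) + other.backprop(d))

    def __mul__(self, other):
        # Chain rule: pass d * (partial derivative) to each argument.
        return Rev(self.primal * other.primal,
                   lambda d: self.backprop(d * other.primal)
                           + other.backprop(d * self.primal))

def gradient(f, x):
    # The input's backpropagator is the identity; seed the output with 1.0.
    y = f(Rev(x, lambda d: d))
    return y.backprop(1.0)

# d/dx (x * x + x) at x = 3.0  ->  7.0
print(gradient(lambda x: x * x + x, 3.0))
```

Note how `x * x` above already calls the input's backpropagator twice; making such call patterns cost-preserving is the problem the linear-factoring rule and the transformations of Smeding and Vákár address.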
“…Recently, the correctness of automatic differentiation has been actively studied for various types of programs. For programs that only use differentiable functions, automatic differentiation is correct everywhere, i.e., it computes the derivative of a given program at all inputs (Abadi & Plotkin, 2020; Barthe et al., 2020; Brunel et al., 2020; Elliott, 2018; Huot et al., 2020; Krawiec et al., 2022; Radul et al., 2023; Smeding & Vákár, 2023; Vákár, 2021). On the other hand, for programs that use non-differentiable functions (e.g., ReLU 1 ), automatic differentiation can be incorrect at some inputs (Bolte & Pauwels, 2020a; Griewank & Walther, 2008; Lee et al., 2020).…”
Section: Introduction
confidence: 99%
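The non-differentiability the last statement mentions is easy to see concretely. The following sketch (illustrative names, not from any cited system) shows the conventional AD treatment of ReLU: at x = 0 the true derivative does not exist (left limit 0, right limit 1), so an AD rule must simply pick a value, commonly 0.

```python
# Illustrative only: at x = 0, ReLU has no derivative (left limit 0,
# right limit 1). An AD rule must choose some value; the common
# convention relu'(0) = 0 is shown here.

def relu(x):
    return x if x > 0.0 else 0.0

def relu_grad(x):
    # AD-style rule: derivative 1 on the x > 0 branch, 0 otherwise.
    return 1.0 if x > 0.0 else 0.0

print(relu_grad(-1.0), relu_grad(0.0), relu_grad(1.0))  # 0.0 0.0 1.0
```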