2023
DOI: 10.1145/3571247
Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations

Abstract: Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficient…
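The pairing the abstract describes can be sketched in a few lines. This is a minimal illustrative implementation of dual-numbers forward-mode AD, not the paper's own code: each scalar carries its tangent, and arithmetic propagates tangents by the usual calculus rules. The class and function names here are hypothetical.

```python
# Minimal sketch of dual-numbers forward-mode AD (illustrative only).
# Each scalar value is paired with its tangent value.

class Dual:
    def __init__(self, primal, tangent):
        self.primal = primal
        self.tangent = tangent

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.primal + other.primal,
                    self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

def derivative(f, x):
    # Seed the input with tangent 1.0 and read the tangent off the output.
    return f(Dual(x, 1.0)).tangent

# d/dx (x * x + x) at x = 3.0  ->  2 * 3 + 1 = 7.0
print(derivative(lambda x: x * x + x, 3.0))
```

Dual-numbers reverse AD, the paper's subject, keeps this per-scalar structure but replaces the tangent with a backpropagator function.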

Cited by 8 publications
(4 citation statements)
References 20 publications
“…Another recent work in a similar vein is that of Smeding and Vákár [29], who derive several variants of reverse-mode AD. They use "well-known program transformations", reasoning mostly in algorithmic terms.…”
Section: Related Work
confidence: 77%
“…Most works in the AD literature use the standard homogenous dual numbers. Some, like those of Krawiec et al [10] and Smeding and Vákár [29], do feature various heterogenous dual numbers, but they do not observe that these are instances of a general structure. Nagata originally defined his idealization of a module over a ring.…”
Section: Related Work
confidence: 87%
“…A linear-factoring reduction rule, which is built into the semantics of the calculus, is required for the transformation to be cost-preserving. Smeding and Vákár [SV23] improve on Brunel, Mazza and Pagani's work by showing how their approach can be efficiently implemented in a standard programming language, whose semantics does not include a linear-factoring rule. Mazza and Pagani [MP21] prove the soundness of AD transformations in the setting of PCF, a typed λ-calculus equipped with real numbers, recursion, and conditionals.…”
Section: This Is Expressed As Follows
confidence: 99%
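The statement above concerns the efficiency of backpropagator-based reverse AD. A naive version of the idea, sketched below in hypothetical Python rather than the cited works' formalism, pairs each scalar with a backpropagator: a function from the cotangent of that value to the cotangent contribution at the input. This naive form is exactly what the linear-factoring rule (or Smeding and Vákár's program transformations) exists to repair: shared subterms re-invoke their backpropagators, which can cause blowup on larger programs.

```python
# Naive sketch of dual-numbers reverse-mode AD (illustrative only).
# Each scalar is paired with a backpropagator mapping this node's
# cotangent to the input's cotangent. No sharing/linear factoring is
# performed, so shared subexpressions re-run their backpropagators.

class Rev:
    def __init__(self, primal, backprop):
        self.primal = primal
        self.backprop = backprop  # cotangent of this node -> input cotangent

    def __add__(self, other):
        return Rev(self.primal + other.primal,
                   lambda d: self.backprop(d) + other.backprop(d))

    def __mul__(self, other):
        # Chain rule: pass d * (partial derivative) to each argument.
        return Rev(self.primal * other.primal,
                   lambda d: self.backprop(d * other.primal)
                           + other.backprop(d * self.primal))

def gradient(f, x):
    # The input's backpropagator is the identity; seed the output with 1.0.
    y = f(Rev(x, lambda d: d))
    return y.backprop(1.0)

# d/dx (x * x + x) at x = 3.0  ->  7.0
print(gradient(lambda x: x * x + x, 3.0))
```

Note how `x * x` above already calls the input's backpropagator twice; making such call patterns cost-preserving is the problem the linear-factoring rule and the transformations of Smeding and Vákár address.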
“…Recently, the correctness of automatic differentiation has been actively studied for various types of programs. For programs that only use differentiable functions, automatic differentiation is correct everywhere, i.e., it computes the derivative of a given program at all inputs (Abadi & Plotkin, 2020; Barthe et al., 2020; Brunel et al., 2020; Elliott, 2018; Huot et al., 2020; Krawiec et al., 2022; Radul et al., 2023; Smeding & Vákár, 2023; Vákár, 2021). On the other hand, for programs that use non-differentiable functions (e.g., ReLU 1 ), automatic differentiation can be incorrect at some inputs (Bolte & Pauwels, 2020a; Griewank & Walther, 2008; Lee et al., 2020).…”
Section: Introduction
confidence: 99%
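The non-differentiability the last statement mentions is easy to see concretely. The following sketch (illustrative names, not from any cited system) shows the conventional AD treatment of ReLU: at x = 0 the true derivative does not exist (left limit 0, right limit 1), so an AD rule must simply pick a value, commonly 0.

```python
# Illustrative only: at x = 0, ReLU has no derivative (left limit 0,
# right limit 1). An AD rule must choose some value; the common
# convention relu'(0) = 0 is shown here.

def relu(x):
    return x if x > 0.0 else 0.0

def relu_grad(x):
    # AD-style rule: derivative 1 on the x > 0 branch, 0 otherwise.
    return 1.0 if x > 0.0 else 0.0

print(relu_grad(-1.0), relu_grad(0.0), relu_grad(1.0))  # 0.0 0.0 1.0
```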