2022
DOI: 10.48550/arxiv.2206.02915
Preprint

8-bit Numerical Formats for Deep Neural Networks

Abstract: Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the advantages of floating-point over fixed-point representation, and present an in-depth study on the use of 8-bit floating-point number formats for activations, weights, and gradients for both training and inference. We explore the effect of different bit-widths for ex…

Cited by 4 publications (14 citation statements)
References 30 publications
“…For range, the same techniques used to train in FP16 are required, and for precision, the use of FP8 has thus far been restricted to only the inputs of matmul (matrix multiply) operations (Sun et al., 2019; Noune et al., 2022; Micikevicius et al., 2022), with 3 mantissa bits typically required for weights and activations, and 2 mantissa bits for gradients.…”
Section: Floating-point Formats for Deep Learning
Mentioning confidence: 99%
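The split described above, 3 mantissa bits for weights and activations versus 2 mantissa bits for gradients, corresponds to 1-4-3 and 1-5-2 sign/exponent/mantissa layouts. As a rough illustration only, the Python sketch below decodes such bit patterns; the bias values and the subnormal/special-case handling are assumptions for the example, not taken from the cited papers or any hardware specification.

```python
# Minimal sketch of decoding an 8-bit floating-point pattern, parameterised by
# exponent/mantissa widths. The 1-4-3 and 1-5-2 splits match the two families
# discussed above; the biases (7 and 15) are assumed for illustration.

def decode_fp8(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# 1-4-3: 3 mantissa bits, typically used for weights and activations.
print(decode_fp8(0b0_0111_100, exp_bits=4, man_bits=3, bias=7))   # 1.5
# 1-5-2: 2 mantissa bits (more range, less precision), typically for gradients.
print(decode_fp8(0b0_01111_10, exp_bits=5, man_bits=2, bias=15))  # 1.5
```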
“…By 'training in FP8' we mean that matmuls are performed in FP8 (inputs are cast down to FP8, with outputs in higher precision) with wider formats typically used elsewhere, following the lead of Sun et al. (2019), Noune et al. (2022) and Micikevicius et al. (2022). FP8 reduces both precision and range, and has not generally been used for other operations as matmuls benefit most from using low-precision formats.…”
Section: Low-precision Training Techniques
Mentioning confidence: 99%
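To illustrate the pattern described in this statement (inputs cast down to FP8, matmul outputs kept in higher precision), here is a minimal NumPy sketch. `quantise_fp8` is a hypothetical helper that simulates rounding to a 1-4-3-style grid; its saturation range, rounding scheme, and lack of subnormal handling are assumptions, not any paper's reference implementation.

```python
import numpy as np

def quantise_fp8(x: np.ndarray, man_bits: int = 3, max_exp: int = 8) -> np.ndarray:
    """Simulate casting to an FP8-like format with `man_bits` mantissa bits."""
    x = x.astype(np.float32)
    # Clamp to an assumed representable range, then round the mantissa.
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** (max_exp - 1)
    x = np.clip(x, -max_val, max_val)
    exp = np.floor(np.log2(np.maximum(np.abs(x), 1e-30)))
    scale = 2.0 ** (exp - man_bits)
    return (np.round(x / scale) * scale).astype(np.float32)

def fp8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Inputs cast down to FP8; accumulation and output stay in float32.
    return quantise_fp8(a) @ quantise_fp8(b)

a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 4).astype(np.float32)
print(np.max(np.abs(fp8_matmul(a, b) - a @ b)))  # small quantisation error
```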