2019
DOI: 10.48550/arxiv.1905.12322
Preprint

A Study of BFLOAT16 for Deep Learning Training

Abstract: This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of the IEEE 754 single-precision floating-point format (FP32), and conversion to/from FP32 is simple. Maintaini…
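As a concrete illustration of the two properties named in the abstract, the sketch below (plain Python, not taken from the paper; the helper names fp32_to_bf16_bits and bf16_bits_to_fp32 are hypothetical) converts FP32 to BFLOAT16 by keeping the upper 16 bits of the IEEE 754 single-precision encoding: the 8 exponent bits, and therefore the representable range, are preserved, while the mantissa shrinks from 23 to 7 bits.

import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BFLOAT16 encoding of x (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]   # FP32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)           # round to nearest even
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    """Expand a BFLOAT16 bit pattern back to FP32 by zero-filling the low mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

if __name__ == "__main__":
    for v in (3.14159, 1e-38, 3.0e38):    # 3.0e38 overflows FP16 but not BFLOAT16
        b = fp32_to_bf16_bits(v)
        print(f"{v:>12g} -> 0x{b:04X} -> {bf16_bits_to_fp32(b):g}")

Round-tripping a value near the FP32 maximum (e.g., 3.0e38) stays finite, which FP16 cannot do, at the cost of mantissa precision only.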

Cited by 45 publications (59 citation statements)
References 20 publications
“…We adopt BF16 as the data format. BF16 has the same accuracy as FP32 for NN training [68] but is more cost-efficient. We estimate CAE and NME's area and power using 16nm and 28nm technologies, respectively.…”
Section: Evaluation, A. Methodology (mentioning)
confidence: 99%
“…The reconfigurable core consists of three MAC modules and four multiplexers. Each MAC contains a BFloat16 multiplier and an FP32 adder [19], [20] to accommodate both training and inference. If only inference is desired, the hardware can use the 8-bit int8 type [2], [3].…”
Section: A Reconfigurable Core (mentioning)
confidence: 99%
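As a rough software illustration of the numeric behaviour of such a MAC (a BFloat16 multiplier feeding an FP32 adder), the NumPy sketch below quantizes the operands to BFLOAT16 by truncating the low 16 mantissa bits and accumulates the products in FP32. It is a sketch under these assumptions, not the cited hardware design, and the helper names to_bf16 and mac_bf16_fp32 are hypothetical.

import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Quantize FP32 values to BFLOAT16 by truncating the low 16 mantissa bits."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)   # BF16-valued, stored as FP32

def mac_bf16_fp32(a: np.ndarray, b: np.ndarray) -> np.float32:
    """Dot product computed as BF16 x BF16 products accumulated by an FP32 adder."""
    a16, b16 = to_bf16(a), to_bf16(b)
    acc = np.float32(0.0)
    for x, y in zip(a16, b16):
        acc = np.float32(acc + x * y)    # product and accumulation stay in FP32
    return acc

rng = np.random.default_rng(0)
a = rng.standard_normal(256, dtype=np.float32)
b = rng.standard_normal(256, dtype=np.float32)
print("BF16 multiply, FP32 accumulate:", mac_bf16_fp32(a, b))
print("FP32 reference dot product    :", np.float32(a @ b))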
“…Micikevicius et al [91] proposed a general-purpose mixed precision training framework for training large-scale DNNs efficiently, almost halving the GPU memory usage. A mixed precision training framework adopting the BFLOAT16 format, which can represent the same range of values as FP32, was presented by Kalamkar et al [92] to avoid the loss scaling required in [91]. Recently, Yang et al [93] proposed a low-precision stochastic gradient descent (SGD) approach by taking advantage of stochastic weight averaging and quantizing the gradient accumulator as well as the velocity vector.…”
Section: Reduced-precision Training for Neural Network (mentioning)
confidence: 99%
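To make the loss-scaling distinction concrete, the following PyTorch-style sketch (assuming a model, optimizer and (x, y) batches exist; it is illustrative, not the framework of [91] or [92]) contrasts a BF16 training step, which needs no loss scaling because BFLOAT16 keeps FP32's exponent range, with an FP16 step that pairs autocast with a GradScaler to keep small gradients from underflowing.

import torch
import torch.nn.functional as F

def train_step_bf16(model, optimizer, x, y):
    """One training step with BF16 autocast; no loss scaling is needed."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = F.cross_entropy(model(x), y)   # eligible ops run in BF16
    loss.backward()                           # no GradScaler / loss scaling required
    optimizer.step()
    return loss.item()

def train_step_fp16(model, optimizer, scaler, x, y):
    """The FP16 counterpart: a GradScaler guards against gradient underflow."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = F.cross_entropy(model(x), y)
    scaler.scale(loss).backward()             # scale the loss before backprop
    scaler.step(optimizer)                    # unscale gradients, skip step on inf/nan
    scaler.update()
    return loss.item()

# FP16 variant only: scaler = torch.cuda.amp.GradScaler()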