Abstract: Stochastic rounding (SR) randomly maps a real number x to one of the two nearest values in a finite-precision number system. The probability of choosing either of these two numbers is 1 minus their relative distance to x. This rounding mode was first proposed for use in computer arithmetic in the 1950s and is currently experiencing a resurgence of interest. If used to compute the inner product of two vectors of length n in floating-point…
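The SR-nearness rule defined above (round up with probability equal to the relative distance to the lower neighbour) can be sketched on a toy grid of equally spaced values; the function name, grid spacing `ulp`, and test values below are ours, for illustration only, not the paper's notation:

```python
import random

def sr_nearness(x, ulp):
    """Stochastically round x to a multiple of ulp (a toy number system).

    x is rounded up with probability equal to its relative position
    between the two neighbouring representable values, so the
    expected value of the result equals x exactly (SR is unbiased).
    """
    lo = (x // ulp) * ulp          # nearest representable value below x
    hi = lo + ulp                  # nearest representable value above x
    p_up = (x - lo) / ulp          # 1 minus the relative distance to hi
    return hi if random.random() < p_up else lo

# The mean of many stochastic roundings converges to x itself:
random.seed(0)
x, ulp = 0.3, 1.0 / 16            # 0.3 is not a multiple of 1/16
mean = sum(sr_nearness(x, ulp) for _ in range(100_000)) / 100_000
print(abs(mean - x) < 1e-3)       # the estimator is unbiased
```

Averaging many independent roundings recovers x, which is exactly the mean-independence property the citation contexts below refer to.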
“…Stochastic rounding has drawn a lot of attention in various domains [14], [19], [22], [23] due to its efficiency compared to the default rounding mode. The fact that SR-nearness satisfies mean independence (a weaker property than independence) leads to an expected value that coincides with the exact value.…”
Section: Discussion (mentioning, confidence: 99%)
“…Stochastic arithmetic has two main applications [14]. First, it can be used to estimate empirically the numerical error of complex programs.…”
Section: Introduction (mentioning, confidence: 99%)
“…The positive effect of SR extends also to the calculation of the solution of ordinary differential equations (ODEs) in low precision [22], [23] where SR reduces the accumulation of rounding errors by avoiding stagnation phenomenon when the step decreases. Various other applications such as PDEs, Quantum mechanics, Quantum computing use SR to improve their results [14].…”
Recently, stochastic rounding (SR) has been implemented in specialized hardware, but most current computing nodes do not yet support this rounding mode. Several works empirically illustrate the benefit of stochastic rounding in various fields such as neural networks and ordinary differential equations. For some algorithms, such as summation, inner product or matrix-vector multiplication, it has been proved that SR provides probabilistic error bounds better than the traditional deterministic bounds. In this paper, we extend this theoretical ground for a wider adoption of SR in computer architecture. First, we analyze the biases of the two SR modes: SR-nearness and SR-up-or-down. We demonstrate on a case study of Euler's forward method that the IEEE-754 default rounding modes and SR-up-or-down accumulate rounding errors across iterations and that SR-nearness, being unbiased, does not. Second, we prove an O(√n) probabilistic bound on the forward error of Horner's polynomial evaluation method with SR, improving on the known deterministic O(n) bound.
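Horner's scheme with SR applied after every operation, as analyzed in the abstract above, can be sketched on a toy fixed-point grid; the grid spacing, helper names, and polynomial are our illustrative choices, not the paper's floating-point setting:

```python
import random

def sr(x, ulp=2.0 ** -8):
    """SR-nearness on a toy grid of multiples of ulp; unbiased by construction."""
    lo = (x // ulp) * ulp
    return lo + ulp if random.random() < (x - lo) / ulp else lo

def horner_sr(coeffs, x):
    """Horner evaluation of a polynomial with one SR per multiply and per add."""
    acc = coeffs[0]
    for c in coeffs[1:]:
        acc = sr(sr(acc * x) + c)   # round the product, then round the sum
    return acc

random.seed(1)
coeffs = [1.0, -2.0, 0.5, 3.0]      # p(x) = x^3 - 2x^2 + 0.5x + 3
exact = ((coeffs[0] * 0.7 + coeffs[1]) * 0.7 + coeffs[2]) * 0.7 + coeffs[3]
mean = sum(horner_sr(coeffs, 0.7) for _ in range(20_000)) / 20_000
print(abs(mean - exact) < 1e-3)     # averaged over runs, SR-nearness is unbiased
```

Because each rounding is unbiased conditionally on the past, the expected result equals the exact Horner recursion; the paper's contribution is bounding how far a single run can stray, with high probability, as O(√n) rather than O(n).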
“…round(•) denotes the rounding function. Here we adopt the stochastic rounding [15] as it theoretically guarantees smaller probabilistic error bounds [16] compared to the nearest rounding. Specifically, it can be formulated as…”
There has been an explosion of interest in designing high-performance Transformers. While Transformers have delivered significant performance improvements, training such networks is extremely memory-intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences. To this end, we present Mesa, a memory-saving, resource-efficient training framework for Transformers. Specifically, Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training. The low-precision activations are then dequantized during backpropagation to compute gradients. Besides, to address the heterogeneous activation distributions in the multi-head self-attention layers, we propose a head-wise activation quantization strategy, which quantizes activations based on the statistics of each head to minimize the approximation error. To further boost training efficiency, we learn the quantization parameters via running estimates. More importantly, by re-investing the saved memory in employing a larger batch size or scaling up model size, we may further improve the performance under constrained computational resources. Extensive experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can halve the memory footprint during training while achieving comparable or even better performance. Code is available at https://github.com/zhuang-group/Mesa.
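The head-wise quantize-for-storage/dequantize-for-gradients idea described above can be sketched with a plain affine 8-bit scheme; the function names, shapes, and statistics used here are our assumptions for illustration, not Mesa's actual API (which also learns its parameters via running estimates):

```python
import numpy as np

def quantize_per_head(act, bits=8):
    """Affine integer quantization with one (scale, offset) pair per head.

    act: activations of shape (heads, tokens, dim). Min/max statistics are
    taken per head, mimicking a head-wise strategy: each head gets its own
    quantization range, so heterogeneous heads do not share one coarse grid.
    """
    qmax = 2 ** bits - 1
    lo = act.min(axis=(1, 2), keepdims=True)      # per-head minimum
    hi = act.max(axis=(1, 2), keepdims=True)      # per-head maximum
    scale = (hi - lo) / qmax
    q = np.round((act - lo) / scale).astype(np.uint8)  # stored for backprop
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate activations when gradients are computed."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
act = rng.normal(size=(4, 16, 32)).astype(np.float32)
q, scale, lo = quantize_per_head(act)
err = np.abs(dequantize(q, scale, lo) - act).max()
print(err < 1e-1)   # coarse but bounded reconstruction error
```

Storing `q` as `uint8` instead of `float32` is where the roughly 4x activation-memory saving per tensor comes from in this sketch.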
“…Second, SR can be used as a replacement for the default deterministic rounding mode in numerical simulations. It has been demonstrated that in multiple domains such as neural networks, ODEs, PDEs, and Quantum mechanics [8], SR provides better results compared to the IEEE-754 default rounding mode [3]. Connolly et al. [23] show that SR successfully prevents the phenomenon of stagnation that takes place in various applications such as neural networks, ODEs and PDEs.…”
Stochastic rounding (SR) offers an alternative to the deterministic IEEE-754 floating-point rounding modes. In some applications such as PDEs, ODEs and neural networks, SR empirically improves the numerical behavior and convergence to accurate solutions, although no sound theoretical background has been provided. Recent works by Ipsen, Zhou, Higham, and Mary have computed SR probabilistic error bounds for basic linear algebra kernels. For example, the SR probabilistic bound on the forward error of the inner product is proportional to √n·u instead of n·u for the default rounding mode. To compute the bounds, these works show that the errors accumulated in the computation form a martingale. This paper proposes an alternative framework to characterize SR errors based on the computation of the variance. We pinpoint common error patterns in numerical algorithms and propose a lemma that bounds their variance. For each probability, through the Bienaymé–Chebyshev inequality, this bound leads to a better probabilistic error bound in several situations. Our method has the advantage of providing a tight probabilistic bound for all algorithms fitting our model. We show how the method can be applied to give SR error bounds for the inner product and Horner polynomial evaluation.
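The √n error growth that these probabilistic bounds predict can be observed empirically with a toy SR summation; the fixed-point grid and trial counts below are our illustrative choices, not the paper's experimental setup:

```python
import math
import random

def sr(x, ulp=2.0 ** -10):
    """SR-nearness on a toy grid of multiples of ulp."""
    lo = (x // ulp) * ulp
    return lo + ulp if random.random() < (x - lo) / ulp else lo

def sr_sum(values):
    """Recursive summation with one stochastic rounding per addition."""
    acc = 0.0
    for v in values:
        acc = sr(acc + v)
    return acc

random.seed(2)
for n in (1_000, 4_000, 16_000):
    exact = 0.1 * n                      # 0.1 is off the toy grid
    errs = [sr_sum([0.1] * n) - exact for _ in range(100)]
    rms = math.sqrt(sum(e * e for e in errs) / len(errs))
    # rms / sqrt(n) stays roughly constant: the RMS error grows like sqrt(n),
    # not like n as in the worst-case deterministic bound.
    print(n, rms / math.sqrt(n))
```

Since each rounding is mean-independent of the past, the per-step error variances add, which is why the standard deviation (and hence the Bienaymé-Chebyshev bound) scales as √n.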