Abstract: Stochastic rounding (SR) randomly maps a real number x to one of the two nearest values in a finite-precision number system. The probability of choosing either of these two numbers is 1 minus their relative distance to x. This rounding mode was first proposed for use in computer arithmetic in the 1950s and is currently experiencing a resurgence of interest. If used to compute the inner product of two vectors of length n in floating-point…
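The SR-nearness rule defined above (round up with probability equal to the relative distance to the lower neighbour) can be sketched on a toy grid of equally spaced values; the function name, grid spacing `ulp`, and test values below are ours, for illustration only, not the paper's notation:

```python
import random

def sr_nearness(x, ulp):
    """Stochastically round x to a multiple of ulp (a toy number system).

    x is rounded up with probability equal to its relative position
    between the two neighbouring representable values, so the
    expected value of the result equals x exactly (SR is unbiased).
    """
    lo = (x // ulp) * ulp          # nearest representable value below x
    hi = lo + ulp                  # nearest representable value above x
    p_up = (x - lo) / ulp          # 1 minus the relative distance to hi
    return hi if random.random() < p_up else lo

# The mean of many stochastic roundings converges to x itself:
random.seed(0)
x, ulp = 0.3, 1.0 / 16            # 0.3 is not a multiple of 1/16
mean = sum(sr_nearness(x, ulp) for _ in range(100_000)) / 100_000
print(abs(mean - x) < 1e-3)       # the estimator is unbiased
```

Averaging many independent roundings recovers x, which is exactly the mean-independence property the citation contexts below refer to.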
“…Stochastic rounding has drawn a lot of attention in various domains [14], [19], [22], [23] due to its efficiency compared to the default rounding mode. The fact that SR-nearness satisfies mean independence (a weaker property than independence) leads to an expected value that coincides with the exact value.…”
Section: Discussion (mentioning, confidence: 99%)
“…Stochastic arithmetic has two main applications [14]. First, it can be used to estimate empirically the numerical error of complex programs.…”
Section: Introduction (mentioning, confidence: 99%)
“…The positive effect of SR extends also to the calculation of the solution of ordinary differential equations (ODEs) in low precision [22], [23] where SR reduces the accumulation of rounding errors by avoiding stagnation phenomenon when the step decreases. Various other applications such as PDEs, Quantum mechanics, Quantum computing use SR to improve their results [14].…”
Recently, stochastic rounding (SR) has been implemented in specialized hardware, but most current computing nodes do not yet support this rounding mode. Several works empirically illustrate the benefit of stochastic rounding in various fields such as neural networks and ordinary differential equations. For some algorithms, such as summation, inner product or matrix-vector multiplication, it has been proved that SR provides probabilistic error bounds better than the traditional deterministic bounds. In this paper, we extend this theoretical ground for a wider adoption of SR in computer architecture. First, we analyze the biases of the two SR modes: SR-nearness and SR-up-or-down. We demonstrate on a case study of Euler's forward method that the IEEE-754 default rounding modes and SR-up-or-down accumulate rounding errors across iterations and that SR-nearness, being unbiased, does not. Second, we prove an O(√n) probabilistic bound on the forward error of Horner's polynomial evaluation method with SR, improving on the known deterministic O(n) bound.
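Horner's scheme with SR applied after every operation, as analyzed in the abstract above, can be sketched on a toy fixed-point grid; the grid spacing, helper names, and polynomial are our illustrative choices, not the paper's floating-point setting:

```python
import random

def sr(x, ulp=2.0 ** -8):
    """SR-nearness on a toy grid of multiples of ulp; unbiased by construction."""
    lo = (x // ulp) * ulp
    return lo + ulp if random.random() < (x - lo) / ulp else lo

def horner_sr(coeffs, x):
    """Horner evaluation of a polynomial with one SR per multiply and per add."""
    acc = coeffs[0]
    for c in coeffs[1:]:
        acc = sr(sr(acc * x) + c)   # round the product, then round the sum
    return acc

random.seed(1)
coeffs = [1.0, -2.0, 0.5, 3.0]      # p(x) = x^3 - 2x^2 + 0.5x + 3
exact = ((coeffs[0] * 0.7 + coeffs[1]) * 0.7 + coeffs[2]) * 0.7 + coeffs[3]
mean = sum(horner_sr(coeffs, 0.7) for _ in range(20_000)) / 20_000
print(abs(mean - exact) < 1e-3)     # averaged over runs, SR-nearness is unbiased
```

Because each rounding is unbiased conditionally on the past, the expected result equals the exact Horner recursion; the paper's contribution is bounding how far a single run can stray, with high probability, as O(√n) rather than O(n).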
“…round(•) denotes the rounding function. Here we adopt the stochastic rounding [15] as it theoretically guarantees smaller probabilistic error bounds [16] compared to the nearest rounding. Specifically, it can be formulated as…”
There has been an explosion of interest in designing high-performance Transformers. While Transformers have delivered significant performance improvements, training such networks is extremely memory-intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences. To this end, we present Mesa, a memory-saving, resource-efficient training framework for Transformers. Specifically, Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training. The low-precision activations are then dequantized during backpropagation to compute gradients. Besides, to address the heterogeneous activation distributions in the multi-head self-attention layers, we propose a head-wise activation quantization strategy, which quantizes activations based on the statistics of each head to minimize the approximation error. To further boost training efficiency, we learn the quantization parameters via running estimates. More importantly, by re-investing the saved memory in employing a larger batch size or scaling up model size, we may further improve the performance under constrained computational resources. Extensive experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can halve the memory footprint during training while achieving comparable or even better performance. Code is available at https://github.com/zhuang-group/Mesa.
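The head-wise quantize-for-storage/dequantize-for-gradients idea described above can be sketched with a plain affine 8-bit scheme; the function names, shapes, and statistics used here are our assumptions for illustration, not Mesa's actual API (which also learns its parameters via running estimates):

```python
import numpy as np

def quantize_per_head(act, bits=8):
    """Affine integer quantization with one (scale, offset) pair per head.

    act: activations of shape (heads, tokens, dim). Min/max statistics are
    taken per head, mimicking a head-wise strategy: each head gets its own
    quantization range, so heterogeneous heads do not share one coarse grid.
    """
    qmax = 2 ** bits - 1
    lo = act.min(axis=(1, 2), keepdims=True)      # per-head minimum
    hi = act.max(axis=(1, 2), keepdims=True)      # per-head maximum
    scale = (hi - lo) / qmax
    q = np.round((act - lo) / scale).astype(np.uint8)  # stored for backprop
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate activations when gradients are computed."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
act = rng.normal(size=(4, 16, 32)).astype(np.float32)
q, scale, lo = quantize_per_head(act)
err = np.abs(dequantize(q, scale, lo) - act).max()
print(err < 1e-1)   # coarse but bounded reconstruction error
```

Storing `q` as `uint8` instead of `float32` is where the roughly 4x activation-memory saving per tensor comes from in this sketch.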
“…Second, SR can be used as a replacement for the default deterministic rounding mode in numerical simulations. It has been demonstrated that in multiple domains such as neural networks, ODEs, PDEs, and Quantum mechanics [8], SR provides better results compared to the IEEE-754 default rounding mode [3]. Connolly et al. [23] show that SR successfully prevents the phenomenon of stagnation that takes place in various applications such as neural networks, ODEs and PDEs.…”
Stochastic rounding (SR) offers an alternative to the deterministic IEEE-754 floating-point rounding modes. In some applications such as PDEs, ODEs and neural networks, SR empirically improves the numerical behavior and convergence to accurate solutions, although no sound theoretical background has been provided. Recent works by Ipsen, Zhou, Higham, and Mary have computed SR probabilistic error bounds for basic linear algebra kernels. For example, the SR probabilistic bound on the forward error of the inner product is proportional to √n·u instead of n·u for the default rounding mode. To compute the bounds, these works show that the errors accumulated in the computation form a martingale. This paper proposes an alternative framework to characterize SR errors based on the computation of the variance. We pinpoint common error patterns in numerical algorithms and propose a lemma that bounds their variance. For each probability, through the Bienaymé–Chebyshev inequality, this bound leads to a better probabilistic error bound in several situations. Our method has the advantage of providing a tight probabilistic bound for all algorithms fitting our model. We show how the method can be applied to give SR error bounds for the inner product and Horner polynomial evaluation.
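The √n error growth that these probabilistic bounds predict can be observed empirically with a toy SR summation; the fixed-point grid and trial counts below are our illustrative choices, not the paper's experimental setup:

```python
import math
import random

def sr(x, ulp=2.0 ** -10):
    """SR-nearness on a toy grid of multiples of ulp."""
    lo = (x // ulp) * ulp
    return lo + ulp if random.random() < (x - lo) / ulp else lo

def sr_sum(values):
    """Recursive summation with one stochastic rounding per addition."""
    acc = 0.0
    for v in values:
        acc = sr(acc + v)
    return acc

random.seed(2)
for n in (1_000, 4_000, 16_000):
    exact = 0.1 * n                      # 0.1 is off the toy grid
    errs = [sr_sum([0.1] * n) - exact for _ in range(100)]
    rms = math.sqrt(sum(e * e for e in errs) / len(errs))
    # rms / sqrt(n) stays roughly constant: the RMS error grows like sqrt(n),
    # not like n as in the worst-case deterministic bound.
    print(n, rms / math.sqrt(n))
```

Since each rounding is mean-independent of the past, the per-step error variances add, which is why the standard deviation (and hence the Bienaymé-Chebyshev bound) scales as √n.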