Power ISA™ Version 3.1 introduces a new family of matrix math instructions, collectively known as the Matrix-Multiply Assist (MMA) facility. The instructions in this facility implement numerical linear algebra operations on small matrices and are meant to accelerate computation-intensive kernels, such as matrix multiplication, convolution, and the discrete Fourier transform. These instructions have led to a power- and area-efficient implementation of a high-throughput math engine in the future POWER10 processor. Performance per core is four times that of the previous-generation POWER9 processor at constant frequency. We also advocate the use of compiler builtins as the preferred way of leveraging these instructions, which we illustrate through case studies covering matrix multiplication and convolution.
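To make "linear algebra operations on small matrices" concrete, the following is a minimal portable C sketch of the kind of building block such instructions provide: an outer-product (rank-1) update of a small accumulator tile, repeated k times to form a tile of a matrix product. It is a scalar model only and does not use the actual MMA instructions or compiler builtins. The 4x4 single-precision tile shape, the data layouts, and the names `TILE`, `rank1_update`, and `tile_gemm` are assumptions chosen for illustration.

```c
#include <stddef.h>

#define TILE 4  /* assumed accumulator shape: 4x4 single-precision (illustrative) */

/* Rank-1 update: acc += x * y^T.
 * Scalar model of one outer-product step; hypothetical helper, not an MMA builtin. */
static void rank1_update(float acc[TILE][TILE],
                         const float x[TILE], const float y[TILE])
{
    for (size_t i = 0; i < TILE; ++i)
        for (size_t j = 0; j < TILE; ++j)
            acc[i][j] += x[i] * y[j];
}

/* Computes the 4x4 tile C = A * B as a sum of k rank-1 updates.
 * Assumed layouts: a holds the 4 x k matrix A column by column,
 *                  b holds the k x 4 matrix B row by row. */
static void tile_gemm(float c[TILE][TILE],
                      const float *a, const float *b, size_t k)
{
    /* Clear the accumulator tile before accumulating. */
    for (size_t i = 0; i < TILE; ++i)
        for (size_t j = 0; j < TILE; ++j)
            c[i][j] = 0.0f;

    /* Column p of A times row p of B contributes one outer product. */
    for (size_t p = 0; p < k; ++p)
        rank1_update(c, a + p * TILE, b + p * TILE);
}
```

A larger matrix product can be assembled by calling `tile_gemm` once per output tile. In a real implementation the inner double loop would be replaced by a single hardware outer-product operation on vector registers, which is where the throughput gain would come from.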
There has been an overwhelming trend in recent years towards parallel computing. Hardware manufacturers are increasing the amount of parallelism on a single chip in several ways, including adding more processing cores and adding accelerators that execute the same instructions on many data items simultaneously. At the other end of the spectrum, as commodity hardware prices fall, it is becoming increasingly affordable to build large-scale, multi-node distributed machines. At the same time, as processor speeds begin to stagnate, software developers will be forced to exploit the parallelism in their applications in order to keep improving performance.