Convolution kernels are ubiquitous in deep learning workloads and are often a performance bottleneck. Recent research has demonstrated that direct convolution can outperform the traditional implementation based on tensor-to-matrix conversions. However, existing direct-convolution approaches still leave room for performance improvement. We present nDirect, a new direct convolution approach targeting the ARM multi-core CPUs commonly found in smartphones and HPC systems. nDirect is compatible with the data layout formats used by mainstream deep learning frameworks while introducing new optimizations for the computational kernel, data packing, and parallelization. We evaluate nDirect on representative convolution kernels across four distinct ARM multi-core CPU platforms, comparing it against state-of-the-art convolution optimization techniques. Experimental results show that nDirect delivers the best overall performance across evaluation scenarios and platforms.
CCS CONCEPTS: • Computing methodologies → Machine learning; • Software and its engineering → Compilers.