Maxim Naumov scite author profile

The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recommendation. To facilitate research and to advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: inference latency varies by 60% across three Intel server generations, batching and co-location of inferences can drastically improve latency-bounded throughput, and the diverse composition of recommendation models leads to different optimization strategies.

Preprint. Under submission.

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Liu, Gupta, Cho et al. 2020

139

AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods

Naumov, Arsaev, Castonguay et al. 2015

SIAM J. Sci. Comput.

111

The solution of large sparse linear systems arises in many applications, such as computational fluid dynamics and oil reservoir simulation. In realistic cases the matrices are often so large that they require large scale distributed parallel computing to obtain the solution of interest in a reasonable time. In this paper we discuss the design and implementation of the AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid (AMG) and preconditioned iterative methods. The AmgX library implements both classical and aggregation-based AMG methods with different selector and interpolation strategies, along with a variety of smoothers and preconditioners, including block-Jacobi, Gauss-Seidel, and incomplete-LU factorization. The library contains many of the standard and flexible preconditioned Krylov subspace iterative methods, which can be combined with any of the available multigrid methods or simpler preconditioners. The parallelism in the aggregation scheme exploits parallel graph matching techniques, while the smoothers and preconditioners often rely on parallel graph coloring algorithms. The AMG algorithm implemented in the AmgX library achieves 2-5× speedup on a single GPU against a competitive implementation on the CPU. As will be shown in the numerical experiments section, both setup and solve phases scale well across multiple nodes, sustaining this performance advantage.
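As a rough illustration of what a "preconditioned Krylov subspace iterative method" looks like, the following is a minimal pure-Python sketch of conjugate gradient with a Jacobi (diagonal) preconditioner, applied to a small 1D Poisson matrix. It is not AmgX code and does not use the AmgX API; the function name `jacobi_pcg`, its parameters, and the dense list-of-lists matrix are assumptions made for this example only. In a library such as the one described above, the diagonal scaling step would be replaced by a stronger preconditioner (for instance an AMG cycle) and the matrix would be sparse and distributed.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    A is a dense list of lists (illustrative only; real solvers use
    sparse storage). Returns (x, iterations_used).
    """
    n = len(b)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Jacobi preconditioner: apply M^{-1} = diag(A)^{-1} to a vector.
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # initial search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero if b = 0

    for it in range(max_iter):
        # Stop once the relative residual norm is below the tolerance.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # 1D Poisson (tridiagonal) test matrix, a standard small SPD example.
    n = 5
    A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
          for j in range(n)] for i in range(n)]
    b = [1.0] * n
    x, iters = jacobi_pcg(A, b)
    print(x, iters)
```

The preconditioner only enters through the line that computes `z` from `r`, which is why Krylov solvers of this kind can be combined with different preconditioners without changing the outer iteration.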

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Park, Naumov, Basu et al. 2018

Preprint

The application of deep learning techniques resulted in remarkable improvement of machine learning models. In this paper we provide detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high-performance optimizations targeting existing systems, point out their limitations and make suggestions for the future general-purpose/accelerated inference hardware. Also, we highlight the need for better co-design of algorithms, numerics and computing platforms to address the challenges of workloads often run in data centers.

Atomistic Simulation of Realistically Sized Nanodevices Using NEMO 3-D—Part I: Models and Benchmarks

The Architectural Implications of Facebook's DNN-Based Personalized Recommendation
