2018
DOI: 10.48550/arxiv.1811.09886
Preprint

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Abstract: The application of deep learning techniques has resulted in remarkable improvements in machine learning models. In this paper we provide detailed characterizations of deep learning models used in many Facebook social network services. We present the computational characteristics of our models, describe high-performance optimizations targeting existing systems, point out their limitations, and make suggestions for future general-purpose/accelerated inference hardware. Also, we highlight the need for better co-design …

Cited by 41 publications (58 citation statements)
References 47 publications
“…Simulations: N-body, ray tracing, and Monte Carlo [4,97,112,107]; and 7. Machine learning: various supervised and unsupervised learning algorithms are implemented using GEMM kernels. Deep learning utilizes GEMM kernels for convolution layers [78,66,22,75,102,8,89,103,108]. This thesis's motivation lies in improving the performance of SpGEMM kernels, which will have a significant impact on many important applications.…”
Section: Applications (mentioning)
confidence: 99%
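The claim that convolution layers reduce to GEMM kernels is worth making concrete. Below is a minimal NumPy sketch of the standard im2col lowering, in which one convolution becomes a single matrix multiply; the function names and shapes are illustrative, not taken from the cited works.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix
    so that a valid convolution becomes one matrix multiply."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    idx = 0
    for ci in range(c):           # row order matches the flattened filter layout
        for i in range(kh):
            for j in range(kw):
                cols[idx] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                idx += 1
    return cols

def conv2d_as_gemm(x, weights):
    """weights: (num_filters, C, kh, kw); returns (num_filters, out_h, out_w)."""
    f, c, kh, kw = weights.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    w_mat = weights.reshape(f, c * kh * kw)   # GEMM operand A
    x_mat = im2col(x, kh, kw)                 # GEMM operand B
    return (w_mat @ x_mat).reshape(f, out_h, out_w)

x = np.random.rand(3, 8, 8)      # one 3-channel 8x8 input
w = np.random.rand(4, 3, 3, 3)   # four 3x3 filters
print(conv2d_as_gemm(x, w).shape)  # (4, 6, 6)
```

Because the heavy lifting is a single dense matrix product, the layer inherits whatever GEMM performance the underlying BLAS library provides, which is why GEMM (and its sparse variant, SpGEMM) is the optimization target.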
“…While DNNs have demonstrated their effectiveness in various internet application domains, the cost of using DNNs for web-scale real-time online inference has become the major burden preventing most companies from adopting these techniques [11,17]. On the one hand, the time consumption (e.g., latency) of the online service is critical for user experience [5] and can influence the long-term retention rate [4]. On the other hand, the resource consumption (e.g., hardware and energy usage) of supporting DNNs requires significant serving-infrastructure investment (e.g., high-performance clusters) with higher power consumption, which sometimes pushes system design, implementation, and operation over budget [29].…”
Section: Introduction (mentioning)
confidence: 99%
“…With the number of categories as large as tens of millions for each feature, embedding tables can take up over 99.9% of the total memory; that is, the memory footprint can be multiple gigabytes or even terabytes [6,7,8]. In practice, deploying these large models often requires the model to be decomposed and distributed across different machines due to memory capacity restrictions [9].…”
Section: Introduction (mentioning)
confidence: 99%
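A quick back-of-envelope calculation shows how embedding tables come to dominate a recommendation model's footprint. The sketch below uses hypothetical table sizes (not figures from the cited papers) to reproduce the "over 99.9%" pattern.

```python
# Footprint of a recommendation model's embedding tables versus its
# MLP weights, with illustrative (hypothetical) sizes and fp32 parameters.
BYTES_PER_PARAM = 4  # fp32

# Hypothetical sparse features: (number of categories, embedding dimension)
tables = {
    "user_id":  (100_000_000, 64),
    "item_id":  (10_000_000,  64),
    "category": (100_000,     64),
}

embedding_bytes = sum(rows * dim for rows, dim in tables.values()) * BYTES_PER_PARAM
mlp_params = 5_000_000  # the dense layers are comparatively tiny
mlp_bytes = mlp_params * BYTES_PER_PARAM

total = embedding_bytes + mlp_bytes
print(f"embeddings: {embedding_bytes / 2**30:.1f} GiB "
      f"({100 * embedding_bytes / total:.2f}% of model)")
print(f"MLP:        {mlp_bytes / 2**30:.3f} GiB")
```

With these numbers the embeddings alone exceed 26 GiB and account for over 99.9% of the parameters, which is why such models must be sharded across machines once they outgrow a single node's memory.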
“…7 shows we only need around 100K samples for the Criteo dataset out of 4.5M samples. Below we discuss a few considerations in relation to LMA applied to the DLRM model.…”
(mentioning)
confidence: 99%