2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) 2021
DOI: 10.1109/ispass51385.2021.00033

Understanding Capacity-Driven Scale-Out Neural Recommendation Inference

Abstract: Deep learning recommendation models have grown to the terabyte scale. Traditional serving schemes that load entire models onto a single server are unable to support this scale. One approach to supporting it is distributed serving, or distributed inference, which divides the memory requirements of a single large model across multiple servers. This work is a first step for the systems research community toward developing novel model-serving solutions, given the huge system design space. Large-scale deep recommen…

Cited by 20 publications (25 citation statements)
References 20 publications
“…Baselines: To evaluate the efficacy of RecShard, we compare the performance of EMB operators under RecShard's throughput-optimized sharding strategy with sharding schemes from prior work on production DLRM training systems [1,26,31]. State-of-the-art sharding schemes typically follow a two-step approach.…”
Section: Experimental Methodology (mentioning)
confidence: 99%
“…• Size [1,26]: An EMB's cost is the product of its hash size and its embedding dimension (latent vector length). • Lookup [1,26]: An EMB's cost is the product of its average pooling factor and its embedding dimension.…”
Section: Experimental Methodology (mentioning)
confidence: 99%
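
The two per-table cost heuristics quoted in the statement above (Size and Lookup) can be illustrated with a minimal sketch. The class `EmbeddingTable`, the function names, and the example table values are hypothetical and chosen only for illustration; they are not taken from the cited papers or from any production system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingTable:
    """Hypothetical description of one embedding table (EMB)."""
    name: str
    hash_size: int             # number of rows in the table
    embedding_dim: int         # latent vector length
    avg_pooling_factor: float  # average number of rows looked up per sample


def size_cost(table: EmbeddingTable) -> int:
    """Size heuristic: hash size times embedding dimension."""
    return table.hash_size * table.embedding_dim


def lookup_cost(table: EmbeddingTable) -> float:
    """Lookup heuristic: average pooling factor times embedding dimension."""
    return table.avg_pooling_factor * table.embedding_dim


# Illustrative values only (assumed, not measured).
tables = [
    EmbeddingTable("user_id", hash_size=1_000_000, embedding_dim=64,
                   avg_pooling_factor=1.0),
    EmbeddingTable("clicked_items", hash_size=50_000, embedding_dim=64,
                   avg_pooling_factor=40.0),
]

for t in tables:
    print(t.name, size_cost(t), lookup_cost(t))
```

With these assumed values the two heuristics rank the tables in opposite order: the large, rarely pooled table dominates under the Size cost, while the small, heavily pooled table dominates under the Lookup cost. This suggests why the choice of cost function can change how tables are assigned to shards.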