Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
DOI: 10.1145/3394486.3403059

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Abstract: Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference…
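The compression idea the abstract alludes to, composing each category's vector from smaller tables indexed by complementary partitions of the category set, can be illustrated with a short sketch. The following NumPy snippet is a minimal illustration of a quotient-remainder style composition, not the authors' implementation; the table sizes, embedding dimension, divisor, and the element-wise-product combiner are assumptions chosen for brevity.

```python
import numpy as np

# Sketch: replace one table of shape (num_categories, dim) with two small
# tables indexed by complementary partitions of the category set
# (quotient and remainder of the category index). Every category maps to a
# unique (quotient, remainder) pair, so composed vectors stay distinct while
# memory drops from num_categories * dim to (num_categories/m + m) * dim.

num_categories = 10_000_000   # hypothetical feature cardinality
dim = 16                      # hypothetical embedding dimension
m = 4_000                     # divisor defining the two partitions

rng = np.random.default_rng(0)
quotient_table = rng.normal(size=((num_categories + m - 1) // m, dim)).astype(np.float32)
remainder_table = rng.normal(size=(m, dim)).astype(np.float32)

def compositional_embedding(category: int) -> np.ndarray:
    """Compose a per-category vector from the two partition tables."""
    q, r = divmod(category, m)
    # Element-wise product is one possible combiner; concatenation or
    # summation are alternatives.
    return quotient_table[q] * remainder_table[r]

vec = compositional_embedding(1_234_567)
print(vec.shape)  # (16,), from 6,500 stored rows instead of 10,000,000
```

With these illustrative numbers the two tables hold roughly 6,500 rows in place of ten million, which is the kind of memory reduction the paper targets.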


Cited by 46 publications (26 citation statements)
References 21 publications
“…Our test set results of 0.4442 and 0.4454 for the lowest loss and most efficient embedding cardinality search architectures respectively are significantly in excess of the ∼0.447 (estimated from graphs by counting pixels) reported in figure 5 of [26] as their DLRM baseline. Our latter efficient result also uses ∼12× fewer parameters than the 5.4 × 10⁸ reported for that baseline.…”
Section: Comparisons To Prior Work
confidence: 60%
“…Our best result for embedding cardinality search compresses the total size of embedding tables 15.14× with a relative 0.0012 increase in loss, demonstrating the promise of our approach (see sections VI-C3 and VIII-C). Our approach discovered recommendation models that beat the state-of-the-art in terms of logloss with significantly fewer parameters (0.4442 vs. 0.447 of [26]; see VIII-D). Moreover, our approach discovered this model using 52× less computational effort (see section VIII-D).…”
Section: Introduction
confidence: 99%
“…They ensure that the most frequent ones will be assigned a unique embedding (zero collisions for those) and the rest will be hashed to shared embeddings using two hash functions (double hashing). In a recent work, Shi et al [21] create a unique embedding for each category by composing shared entries from multiple smaller embedding tables. Again in the recommendation domain, Kang et al [8] propose DHE that replaces one-hot encodings with dense vectors from multiple hash functions.…”
Section: Related Work
confidence: 99%
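The scheme described in the preceding quote, dedicated embeddings for the most frequent categories and shared, double-hashed embeddings for the tail, can be sketched as follows. This is an illustrative reconstruction rather than code from the cited papers; the hash construction, table sizes, and the averaging of the two hashed rows are assumptions.

```python
import hashlib
import numpy as np

# Sketch: frequent categories get their own collision-free rows; infrequent
# ("tail") categories fall back to a small shared table addressed by two
# hash functions (double hashing), so a collision on one hash rarely
# coincides with a collision on the other.

dim = 16
num_frequent = 100_000   # hypothetical: top categories kept collision-free
shared_rows = 10_000     # hypothetical shared table size for the tail

rng = np.random.default_rng(0)
frequent_table = rng.normal(size=(num_frequent, dim)).astype(np.float32)
shared_table = rng.normal(size=(shared_rows, dim)).astype(np.float32)

def _hash(category: int, salt: str) -> int:
    """Deterministic hash of a category id into the shared table."""
    digest = hashlib.sha256(f"{salt}:{category}".encode()).hexdigest()
    return int(digest, 16) % shared_rows

def lookup(category: int, rank: int) -> np.ndarray:
    """rank = frequency rank of the category (0 = most frequent)."""
    if rank < num_frequent:
        return frequent_table[rank]          # zero-collision path
    h1 = _hash(category, "a")
    h2 = _hash(category, "b")
    # Combining the two hashed rows by averaging is an arbitrary choice here.
    return 0.5 * (shared_table[h1] + shared_table[h2])
```

The compositional approach of Shi et al. quoted above differs in that it keeps every category's representation unique by construction, rather than accepting controlled collisions for the tail.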
“…To improve content personalization, recommendation models are growing rapidly in size and complexity [39,58,59,60]. Tackling the growing model sizes, researchers have proposed techniques to compress embedding tables while preserving accuracy [12,14,46,52]. Alternatively, one can decompose large monolithic models into multi-stage pipelines.…”
Section: Related Work
confidence: 99%