2022
DOI: 10.1109/tkde.2020.3038109
CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs

Cited by 9 publications (3 citation statements)
References 42 publications
“…Figure 3 illustrates the skewed distribution of embedding update frequency on some popular workloads, including click-through rate prediction (i.e., Criteo), citation network (i.e., ogbn-mag), and product co-purchasing network (i.e., Amazon). […] Existing research provides evidence that parameter updates from various embedding models exhibit a universal skewed distribution [35], such as recommendation models [8,24,52], LDA topic models [20,25,47,48], and graph learning models [33,40].…”
Section: Problems and Opportunities
confidence: 99%
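The skew described in this citation statement is easy to quantify on a stream of sparse training batches. The sketch below is a minimal illustration, not code from HET or CuWide: it counts how often each embedding row would be updated and reports the share of updates absorbed by the most frequent 1% of rows, with synthetic Zipf-distributed feature IDs standing in for a real workload such as Criteo.

```python
import numpy as np

# Hypothetical illustration: measure how skewed embedding updates are.
# Zipf-distributed feature IDs stand in for a real sparse workload
# (e.g., click-through-rate data); one "update" per sampled ID.
rng = np.random.default_rng(0)
num_rows = 100_000                      # size of the embedding table
ids = rng.zipf(a=1.3, size=1_000_000)
ids = ids[ids <= num_rows] - 1          # clip to the table and 0-index

updates = np.bincount(ids, minlength=num_rows)   # updates per embedding row
top1pct = int(0.01 * num_rows)
hot_share = np.sort(updates)[::-1][:top1pct].sum() / updates.sum()

print(f"top 1% of rows receive {hot_share:.1%} of all updates")
```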
“…Similar to TensorFlow [7], we use a static computation graph abstraction to organize all the operations in HET. All operators are implemented as GPU kernels and scheduled onto a GPU stream [35]. These operators are launched and executed asynchronously to avoid blocking CPU execution.…”
Section: Asynchronous Communication Invocation
confidence: 99%
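As a concrete illustration of launching GPU operators asynchronously on a stream, the hedged sketch below uses PyTorch's CUDA stream API rather than HET's internal scheduler; the operator choices and tensor sizes are invented for the example.

```python
import torch

# Minimal sketch of asynchronous operator execution on a GPU stream.
# Kernel launches are enqueued on the stream and return immediately,
# so the CPU keeps scheduling work instead of blocking on each op.
assert torch.cuda.is_available()

stream = torch.cuda.Stream()
x = torch.randn(4096, 4096, device="cuda")
w = torch.randn(4096, 4096, device="cuda")

with torch.cuda.stream(stream):
    # Both launches are asynchronous with respect to the host thread.
    y = x @ w
    y = torch.relu(y)

# The CPU can do unrelated work here while the GPU drains the stream.
cpu_side_work = sum(range(1000))

stream.synchronize()   # block only when the result is actually needed
print(y.norm().item(), cpu_side_work)
```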
“…Then a ranking model predicts a score for each candidate item and selects the top items based on the estimated scores. This two-step procedure is widely adopted in large-scale industrial recommender systems owing to its scalability and fast inference performance [1,2,3,4,5,6,7,8,9]. In this paper, we focus on the candidate generation stage [10], which is usually referred to as top-N recommendation [11] in the academic literature.…”
Section: Introduction
confidence: 99%
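To make the two-step procedure in the citation statement above concrete, here is a minimal, hypothetical sketch rather than the cited systems' code: candidate generation retrieves the top-N items by embedding dot product, and a stand-in ranking function re-scores only those candidates.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, dim, N = 10_000, 64, 100

item_emb = rng.normal(size=(num_items, dim))   # item embeddings (assumed given)
user_emb = rng.normal(size=dim)                # one user's embedding

# Step 1: candidate generation -- cheap dot-product retrieval of top-N items.
scores = item_emb @ user_emb
candidates = np.argpartition(-scores, N)[:N]

# Step 2: ranking -- a (hypothetical) heavier model re-scores only the candidates.
def ranking_model(user, items):
    # placeholder for a learned ranker; here just a re-weighted dot product
    return items @ (user * 0.5)

order = np.argsort(-ranking_model(user_emb, item_emb[candidates]))
ranked = candidates[order]
print("final top-10 items:", ranked[:10])
```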