Privileged Features Distillation at Taobao Recommendations

Chen, Xu; Li, Quan; Ge, Junfeng; Gao, Jinyang; Yang, Xiaoyong; Pei, Changhua; Sun, Fei; Wu, Jian; Sun, Hanxiao; Ou, Wenwu

doi:10.1145/3394486.3403309

Cited by 40 publications

(38 citation statements)

References 22 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…KD has been developed for various applications apart from model compression, e.g., intelligent label smoothing (Yuan et al 2020), self-distillation (Zhang and Sabuncu 2020). Still, most works in recommender systems adopt KD in a traditional way for model reduction, where teacher and student are differently sized models targeting the same task (Tang and Wang 2018;Xu et al 2020;Zhu et al 2020). Distinct from theirs or other works using KD in machine learning community, this paper serves as the first attempt to leverage KD to transfer knowledge across different ranking tasks.…”

Section: Preliminaries and Related Workmentioning

confidence: 99%

“…However, such asynchronous training procedure is not favorable for industrial applications such as online advertising. Instead, because of simplicity and easy maintenance, synchronous training procedure where teacher and student models are trained in an end-toend manner is more desirable as done in (Xu et al 2020;Anil et al 2018;Zhou et al 2018). In our framework, there are two sets of parameters for optimization, namely, parameters in MTL backbone for prediction (denoted as Θ) and parameters for calibration including P A , P B , Q A and Q B (denoted as Ω).…”

Section: Model Trainingmentioning

confidence: 99%

See 1 more Smart Citation

Cross-Task Knowledge Distillation in Multi-Task Recommendation

Yang¹,

Pan²,

Gao³

et al. 2022

Preprint

View full text Add to dashboard Cite

Multi-task learning has been widely used in real-world recommenders to predict different types of user feedback. Most prior works focus on designing network architectures for bottom layers as a means to share the knowledge about input features representations. However, since they adopt task-specific binary labels as supervised signals for training, the knowledge about how to accurately rank items is not fully shared across tasks. In this paper, we aim to enhance knowledge transfer for multitask personalized recommendat optimization objectives. We propose a Cross-Task Knowledge Distillation (CrossDistil) framework in recommendation, which consists of three procedures. 1) Task Augmentation: We introduce auxiliary tasks with quadruplet loss functions to capture cross-task finegrained ranking information, which could avoid task conflicts by preserving the cross-task consistent knowledge; 2) Knowledge Distillation: We design a knowledge distillation approach based on augmented tasks for sharing ranking knowledge, where tasks' predictions are aligned with a calibration process; 3) Model Training: Teacher and student models are trained in an end-to-end manner, with a novel error correction mechanism to speed up model training and improve knowledge quality. Comprehensive experiments on a public dataset and our production dataset are carried out to verify the effectiveness of CrossDistil as well as the necessity of its key components.

show abstract

Section: Preliminaries and Related Workmentioning

confidence: 99%

Section: Model Trainingmentioning

confidence: 99%

Cross-Task Knowledge Distillation in Multi-Task Recommendation

Yang¹,

Pan²,

Gao³

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…[17] considers other users' behaviors after the last historical behavior (but before the current time) of the target user as future information, which is not real future data. PFD [33] is the most related work, which adopts a KD block to model privileged features (i.e., important features of the current items that cannot be used in offline). However, PFD is not purposefully verified in future information.…”

Section: Related Workmentioning

confidence: 99%

“…However, there is no existing work that considers future information in general recommendation. Inspired by PFD [33], we extend the idea of feature-based knowledge distillation to future encoding, replacing the privileged features with our future features. For fair comparisons, we use the same common and future features as AFE.…”

Section: Competitorsmentioning

confidence: 99%

A Peep into the Future: Adversarial Future Encoding in Recommendation

Xie

Zhang

Wang

et al. 2022

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

View full text Add to dashboard Cite

Personalized recommendation often relies on user historical behaviors to provide items for users. It is intuitive that future information also contains essential messages as supplements to user historical behaviors. However, we cannot directly encode future information into models, since we are unable to get future information in online serving. In this work, we propose a novel adversarial future encoding (AFE) framework to make full use of informative future features in different types of recommendation models. Specifically, AFE contains a future-aware discriminator and a generator. The future-aware discriminator takes both common features and future features as inputs, working as a recommendation prophet to judge user-item pairs. In contrast, the generator is considered as a challenger, which generates items with only common features, aiming to confuse the future-aware prophet. The future-aware discriminator can inspire the generator (to be deployed online) to produce better results. We further conduct a multi-factor optimization to enable a fast and stable model convergence via the direct learning and knowledge distillation losses. Moreover, we have adopted AFE on both a list-wise RL-based ranking model and a point-wise ranking model to verify its universality. In experiments, we conduct sufficient evaluations on two large-scale datasets, achieving significant improvements on both offline and online evaluations. Currently, we have deployed AFE on a real-world system, affecting millions of users. The source code is in https://github.com/modriczhang/AFE.

show abstract

“…the physical coordinates in the ad the user clicked). This information, which some other authors have referred to as privileged information (Xu et al, 2020), can't be used as features in the prediction model. However, as we discuss later, it can be used in training the prediction models.…”

Section: Post-ranking Signalsmentioning

confidence: 99%

Challenges and approaches to privacy preserving post-click conversion prediction

O’Brien¹,

Thiagarajan²,

Das³

et al. 2022

Preprint

View full text Add to dashboard Cite

Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e. the probability a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are often trained by observing individual user behavior, but, increasingly, regulatory and technical constraints are requiring privacy-preserving approaches. For example, major platforms are moving to restrict tracking individual user events across multiple applications, and governments around the world have shown steadily more interest in regulating the use of personal data. Instead of receiving data about individual user behavior, advertisers may receive privacy-preserving feedback, such as the number of installs of an advertised app that resulted from a group of users. In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective. We provide an overview of the challenges and constraints when learning conversion models in this setting. We introduce a novel approach for training these models that makes use of post-ranking signals. We show using offline experiments on real world data that it outperforms a model relying on opt-in data alone, and significantly reduces model degradation when no individual labels are available. Finally, we discuss future directions for research in this evolving area.

show abstract

Privileged Features Distillation at Taobao Recommendations

Cited by 40 publications

References 22 publications

Cross-Task Knowledge Distillation in Multi-Task Recommendation

Cross-Task Knowledge Distillation in Multi-Task Recommendation

A Peep into the Future: Adversarial Future Encoding in Recommendation

Challenges and approaches to privacy preserving post-click conversion prediction

Contact Info

Product

Resources

About