Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020
DOI: 10.1145/3394486.3403309
Privileged Features Distillation at Taobao Recommendations

Cited by 40 publications (38 citation statements) | References 22 publications
“…KD has been developed for various applications apart from model compression, e.g., intelligent label smoothing (Yuan et al. 2020) and self-distillation (Zhang and Sabuncu 2020). Still, most works in recommender systems adopt KD in the traditional way for model reduction, where the teacher and student are differently sized models targeting the same task (Tang and Wang 2018; Xu et al. 2020; Zhu et al. 2020). Distinct from these and other works using KD in the machine learning community, this paper serves as the first attempt to leverage KD to transfer knowledge across different ranking tasks.…”
Section: Preliminaries and Related Work
confidence: 99%
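The model-reduction setting mentioned in this statement can be made concrete with the classic soft-label formulation of KD. The sketch below is illustrative only; PyTorch, binary click labels, and the alpha weighting are assumptions rather than details from the cited works. The student is fit both to the ground-truth labels and to the teacher's detached predictions.

```python
# Minimal sketch of soft-label knowledge distillation for a binary ranking task.
# Tensors, the alpha weighting, and the binary objective are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    # Hard-label loss against the ground-truth labels (floats in [0, 1]).
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    # Soft-label loss: the teacher's predictions act as fixed targets,
    # so no gradient flows back into the teacher.
    soft_targets = torch.sigmoid(teacher_logits).detach()
    soft = F.binary_cross_entropy_with_logits(student_logits, soft_targets)
    return (1.0 - alpha) * hard + alpha * soft
```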
“…However, such an asynchronous training procedure is not favorable for industrial applications such as online advertising. Instead, because of its simplicity and easy maintenance, a synchronous training procedure, where the teacher and student models are trained in an end-to-end manner, is more desirable, as done in (Xu et al. 2020; Anil et al. 2018; Zhou et al. 2018). In our framework, there are two sets of parameters for optimization, namely, parameters in the MTL backbone for prediction (denoted as Θ) and parameters for calibration, including P_A, P_B, Q_A and Q_B (denoted as Ω).…”
Section: Model Training
confidence: 99%
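A minimal sketch of such synchronous, end-to-end optimization of both parameter sets is given below. The backbone and calibration modules, their dimensions, and the learning rates are placeholders standing in for Θ and Ω; the citing paper's actual MTL backbone and calibration functions (P_A, P_B, Q_A, Q_B) are not reproduced here.

```python
# Hedged sketch: optimize the prediction backbone (Theta) and the calibration
# parameters (Omega) jointly in one synchronous training step.
import torch
import torch.nn.functional as F

backbone = torch.nn.Sequential(              # stand-in for Theta
    torch.nn.Linear(32, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
)
calibration = torch.nn.Linear(1, 1)          # stand-in for Omega

# One optimizer updates both parameter groups in the same step, i.e. end-to-end.
optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-3},
    {"params": calibration.parameters(), "lr": 1e-2},
])

def train_step(features, labels):
    raw = backbone(features)                    # uncalibrated logit
    calibrated = calibration(raw).squeeze(-1)   # calibrated logit
    loss = F.binary_cross_entropy_with_logits(calibrated, labels)
    optimizer.zero_grad()
    loss.backward()                             # gradients reach Theta and Omega together
    optimizer.step()
    return loss.item()
```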
“…[17] considers other users' behaviors after the last historical behavior (but before the current time) of the target user as future information, which is not real future data. PFD [33] is the most closely related work; it adopts a KD block to model privileged features (i.e., important features of the current items that cannot be used during online serving). However, PFD is not specifically verified on future information.…”
Section: Related Work
confidence: 99%
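To make the PFD setup concrete: the teacher consumes both the regular and the privileged features, the student consumes only the features available at serving time, and a distillation term pulls the student toward the teacher. The sketch below assumes PyTorch, linear models, and a binary objective purely for illustration; it is not the production architecture from the Taobao paper.

```python
# Hedged sketch of the privileged-features-distillation idea.
# Feature dimensions and linear models are illustrative placeholders.
import torch
import torch.nn.functional as F

REGULAR_DIM, PRIVILEGED_DIM = 8, 4
teacher = torch.nn.Linear(REGULAR_DIM + PRIVILEGED_DIM, 1)  # sees both feature sets
student = torch.nn.Linear(REGULAR_DIM, 1)                   # sees serving-time features only

def pfd_losses(x_regular, x_privileged, labels, lam=0.5):
    t_logits = teacher(torch.cat([x_regular, x_privileged], dim=-1)).squeeze(-1)
    s_logits = student(x_regular).squeeze(-1)
    loss_teacher = F.binary_cross_entropy_with_logits(t_logits, labels)
    loss_student = F.binary_cross_entropy_with_logits(s_logits, labels)
    # Distillation: the student mimics the teacher's soft predictions;
    # detach() keeps this extra term from updating the teacher.
    loss_distill = F.binary_cross_entropy_with_logits(
        s_logits, torch.sigmoid(t_logits).detach()
    )
    return loss_teacher + loss_student + lam * loss_distill

# Only the student is deployed online, since the privileged features
# (e.g. post-click signals) are unavailable at serving time.
```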
“…However, there is no existing work that considers future information in general recommendation. Inspired by PFD [33], we extend the idea of feature-based knowledge distillation to future encoding, replacing the privileged features with our future features. For fair comparisons, we use the same common and future features as AFE.…”
Section: Competitors
confidence: 99%
“…the physical coordinates in the ad the user clicked). This information, which some other authors have referred to as privileged information (Xu et al, 2020), can't be used as features in the prediction model. However, as we discuss later, it can be used in training the prediction models.…”
Section: Post-ranking Signals
confidence: 99%