2022
DOI: 10.48550/arxiv.2201.02327
Preprint

On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Abstract: Learning objectives of recommender models remain largely unexplored. Most methods routinely adopt either pointwise (e.g., binary cross-entropy) or pairwise (e.g., BPR) loss to train the model parameters, while rarely paying attention to softmax loss due to its high computational cost. Sampled softmax loss has emerged as an efficient substitute for softmax loss. Its special case, InfoNCE loss, has been widely used in self-supervised learning and has exhibited remarkable performance for contrastive learning. Nonetheless, l…
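The sampled softmax loss the abstract describes can be made concrete with a short sketch. Below is a minimal PyTorch version, assuming a uniform sampling distribution Q and illustrative tensor shapes; it is a sketch under those assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(user_emb, item_emb, pos_items, num_negatives):
    """Sampled softmax: score each positive against a small set of sampled
    negatives instead of the full item catalogue.

    user_emb:  (B, d) batch of user representations
    item_emb:  (N, d) full item embedding table
    pos_items: (B,)   positive item ids
    """
    num_items = item_emb.size(0)

    # Sampling distribution Q: uniform here (an assumption); log-uniform
    # over popularity-sorted item ids is another common choice.
    q_probs = torch.full((num_items,), 1.0 / num_items)
    neg_items = torch.multinomial(q_probs, num_negatives, replacement=True)

    pos_logits = (user_emb * item_emb[pos_items]).sum(-1, keepdim=True)  # (B, 1)
    neg_logits = user_emb @ item_emb[neg_items].T                        # (B, k)

    # logQ correction: subtracting log Q(i) from each candidate's logit
    # keeps the sampled loss an (asymptotically) unbiased softmax estimate.
    pos_logits = pos_logits - torch.log(q_probs[pos_items]).unsqueeze(1)
    neg_logits = neg_logits - torch.log(q_probs[neg_items]).unsqueeze(0)

    logits = torch.cat([pos_logits, neg_logits], dim=1)  # (B, 1 + k)
    labels = torch.zeros(len(logits), dtype=torch.long)  # positive at column 0
    return F.cross_entropy(logits, labels)

# Toy usage: 4 users, 1000 items, 64-dim embeddings, 20 sampled negatives.
users = torch.randn(4, 64)
items = torch.randn(1000, 64)
loss = sampled_softmax_loss(users, items, torch.tensor([3, 17, 256, 999]), 20)
```

With uniform Q the logQ correction is a constant shift on every logit and cancels inside the softmax; it is kept here so the same code remains correct for non-uniform Q.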

Cited by 5 publications (17 citation statements)
References 20 publications
“…In prior works [2][3][4][50], 𝑄 is often a generic sampling distribution such as log-uniform or uniform, both of which are shown to perform relatively well in general RS learning [44]. Specifically, using log-uniform sampling on the item set sorted by popularity gives the popular items a higher probability of being selected as negative samples.…”
Section: Sampled Softmax Loss (mentioning)
confidence: 99%
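To make the log-uniform choice of 𝑄 in the statement above concrete, here is a hedged sketch assuming items are indexed by descending popularity (rank 0 = most popular). The formula is the standard log-uniform (Zipfian) candidate-sampling distribution, not code from the cited works.

```python
import numpy as np

def log_uniform_probs(num_items: int) -> np.ndarray:
    """Log-uniform sampling distribution over popularity-sorted items:
    P(r) = log(1 + 1/(r+1)) / log(num_items + 1), which telescopes to a
    proper distribution; low ranks (popular items) get most of the mass."""
    ranks = np.arange(num_items)
    probs = np.log1p(1.0 / (ranks + 1)) / np.log(num_items + 1)
    return probs / probs.sum()  # guard against floating-point drift

# Popular (low-rank) items are drawn far more often as negatives:
probs = log_uniform_probs(10_000)
negatives = np.random.choice(10_000, size=256, p=probs)
```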
“…First, we raise doubts about the effectiveness of uniformly sampled softmax in multi-interest scenarios. While uniformly sampled softmax has been shown to be effective in training general recommendation systems [44], it falls short in multi-interest recommendation systems, such as ComiRec [2] and PIMIRec [4]. As illustrated in Figure 1 (a), it has significantly worse performance compared to full softmax within a reasonable sample size range (e.g., below a thousand).…”
Section: Introduction (mentioning)
confidence: 99%
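The full softmax baseline that this statement compares against is plain cross-entropy over the whole item catalogue; a minimal sketch for contrast (tensor names are assumptions):

```python
import torch
import torch.nn.functional as F

def full_softmax_loss(user_emb: torch.Tensor,
                      item_emb: torch.Tensor,
                      pos_items: torch.Tensor) -> torch.Tensor:
    """Exact softmax over all N items: O(B * N) scoring per batch, the
    cost that sampled variants are designed to avoid."""
    logits = user_emb @ item_emb.T  # (B, N)
    return F.cross_entropy(logits, pos_items)
```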
“…There are multiple choices of loss function for training a recommendation model, including pointwise losses (e.g., BCE [12,26], MSE [9,17]), pairwise losses (e.g., BPR [25]), and softmax loss [34]. Recent work [34] finds that softmax loss can mitigate popularity bias, achieves great training stability, and aligns well with the ranking metric. It usually achieves better performance than the alternatives and has thus attracted a surge of interest in recommendation.…”
Section: Preliminaries (mentioning)
confidence: 99%
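For comparison with the softmax loss discussed in the statement above, a minimal sketch of the pairwise BPR loss [25] it mentions (the score tensors are assumed inputs, e.g., dot products of user and item embeddings):

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Bayesian Personalized Ranking: push each positive's score above a
    sampled negative's via -log sigmoid(s_pos - s_neg)."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```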
“…1 a). Here we follow [34] and split items into ten groups by item popularity. A larger group ID indicates that the group contains more popular items.…”
Section: Empirical Analysis (mentioning)
confidence: 99%
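A hedged sketch of the ten-group popularity split described here, assuming a per-item interaction-count array and equal-size bins; the exact binning rule in [34] may differ (e.g., equal-frequency bins):

```python
import numpy as np

def popularity_groups(item_counts: np.ndarray, num_groups: int = 10) -> np.ndarray:
    """Assign each item a group id in [0, num_groups); larger ids hold
    more popular items, matching the convention in the statement."""
    order = np.argsort(item_counts)  # item ids in ascending popularity
    group = np.empty_like(order)
    # equal-size bins over the popularity ranking (an assumption)
    group[order] = (np.arange(len(order)) * num_groups) // len(order)
    return group
```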