Proceedings of the 13th ACM Conference on Recommender Systems 2019
DOI: 10.1145/3298689.3347034

Relaxed softmax for PU learning

Abstract: In recent years, the softmax model and its fast approximations have become the de facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation, two fields that fall into the framework of learning from Positive and Unlabeled data. In this paper, we stress the different drawbacks of the current family of softmax losses and sampling schemes when applied in a Positive and Unlabeled learning setup. We propose both a Relaxe…
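To make the baseline family the abstract refers to concrete, below is a minimal NumPy sketch contrasting the exact softmax negative log-likelihood (which scores every item in the catalogue) with a sampled-softmax approximation (one common sampling scheme). This is a generic illustration, not the paper's Relaxed Softmax; all names (full_softmax_nll, sampled_softmax_nll, item_emb, etc.) are hypothetical.

```python
# Generic full vs. sampled softmax NLL for one (context, positive item) pair.
# Not the paper's method; a sketch of the standard baselines it critiques.
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 10_000, 32
item_emb = rng.normal(size=(n_items, dim))   # item embeddings
user_emb = rng.normal(size=dim)              # context / user embedding
pos_item = 42                                # the single observed positive

def full_softmax_nll(user, items, pos):
    """Exact NLL: scores every item, so the cost scales with the catalogue size."""
    logits = items @ user
    log_z = np.logaddexp.reduce(logits)      # log-partition over the whole catalogue
    return log_z - logits[pos]

def sampled_softmax_nll(user, items, pos, n_neg=50):
    """Approximate NLL using uniformly sampled negatives instead of all items."""
    neg = rng.integers(0, items.shape[0], size=n_neg)
    logits = items[np.concatenate(([pos], neg))] @ user
    return np.logaddexp.reduce(logits) - logits[0]

print(full_softmax_nll(user_emb, item_emb, pos_item))
print(sampled_softmax_nll(user_emb, item_emb, pos_item))
```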


Cited by 9 publications (5 citation statements). References 26 publications (28 reference statements).

“…predicting missing elements of the MovieLens dataset (Harper and Konstan 2015)). Maximum likelihood estimation also suffers from a computational cost that scales in O(P) but the problem has a different mathematical form to policy learning and methods developed in the maximum likelihood context (Tanielian and Vasile 2019; Gutmann and Hyvärinen 2010; Rendle et al. 2009) cannot be applied to policy learning.…”
Section: Related Work
confidence: 99%
“…Naturally, there is no necessity for OVR classification for DNNs, since they already support multi-class classification by design. However, there are common situations where OVR becomes relevant, e.g., if faced with only positively labeled samples and all remaining samples with potentially unknown sources are assigned to a single negative class [16,31,66] or if the goal, as in this paper, is to filter normal samples from abnormal ones [6]. As motivated previously, vanilla MLPs are unsuitable in this case due to their infinite open space risk and consequential robustness deficiencies toward outliers [6].…”
Section: Related Work
confidence: 99%
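The PU setup this excerpt describes can be made concrete with a small sketch: only a subset of samples carries a positive label, and every remaining, unlabeled sample is assigned to a single negative class before fitting a binary (one-vs-rest style) classifier. This is an illustration under those assumptions, not code from the cited paper; all variable names are hypothetical.

```python
# PU-style labeling: known positives -> 1, all unlabeled samples -> treated as 0.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1_000, 8))                          # feature matrix
positive_idx = rng.choice(1_000, size=100, replace=False)  # the labeled positives

y = np.zeros(1_000, dtype=int)
y[positive_idx] = 1                                      # everything else kept as "negative"

clf = LogisticRegression(max_iter=1_000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]                      # ranking score over all samples
```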
“…For quantifying the distance between the true and estimated distributions, Kullback-Leibler divergence (KLD) [35] is often utilised. Tanielian et al. [56] recently proposed a DE approach based on the maximum likelihood estimation (MLE) of softmax density functions. However, their MLE formulation leads to an intractable log-partition $\log \sum_{i \in \mathcal{I}} \exp(f_u(i))$ in the log-likelihood function, which makes SGD-based optimisation difficult for large-scale settings.…”
Section: Paradigms of Personalised Ranking
confidence: 99%
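For orientation, the objective being discussed has the standard per-observation softmax log-likelihood form below. This is a generic rendering based on the notation in the excerpt, not a quotation from either paper: $f_u(i)$ is the model score of item $i$ in context $u$, $i^+$ the observed positive, and $\mathcal{I}$ the full item catalogue.

```latex
% Per-observation softmax log-likelihood; the second term is the log-partition,
% a sum over the entire catalogue \mathcal{I}, which is what makes exact
% gradient updates expensive when \mathcal{I} is large.
\log p_u(i^+) = f_u(i^+) - \log \sum_{i \in \mathcal{I}} \exp\bigl(f_u(i)\bigr)
```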