2019
DOI: 10.48550/arxiv.1902.09229
Preprint

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

Sanjeev Arora,
Hrishikesh Khandeparkar,
Mikhail Khodak
et al.

Abstract: Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term cont…
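
The abstract describes the objective only informally. As a reading aid, here is a minimal sketch of the kind of loss it refers to, with one positive and k negative samples scored by inner products; this is an illustrative PyTorch snippet with a placeholder encoder f, not code from the paper.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(f, x, x_pos, x_negs):
        # f      : encoder mapping inputs to d-dimensional representations
        # x      : anchor batch,               shape (B, ...)
        # x_pos  : semantically similar batch, shape (B, ...)
        # x_negs : negative-sample batch,      shape (B, k, ...)
        z = f(x)                                          # (B, d)
        z_pos = f(x_pos)                                  # (B, d)
        B, k = x_negs.shape[0], x_negs.shape[1]
        z_neg = f(x_negs.flatten(0, 1)).view(B, k, -1)    # (B, k, d)

        pos_score = (z * z_pos).sum(dim=-1, keepdim=True)  # (B, 1)
        neg_score = torch.einsum("bd,bkd->bk", z, z_neg)   # (B, k)

        # Cross-entropy over {positive, negatives}: minimizing it pushes the
        # inner product with the positive above those with the negatives.
        logits = torch.cat([pos_score, neg_score], dim=1)  # (B, 1 + k)
        labels = torch.zeros(B, dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)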

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

1
140
1

Year Published

2020
2020
2024
2024

Publication Types

Select...
4
3
1

Relationship

0
8

Authors

Journals


Cited by 96 publications (150 citation statements)
References 8 publications (11 reference statements)

“…Implicit to many applications is the assumption that the anchor, positive, and negative samples have the same marginal distribution P_mar. This property also holds for the recently proposed latent "class" modeling framework of (Arora et al., 2019) for contrastive unsupervised representation learning, which has been adopted by several works. Let P(P_mar) denote the set of joint distributions P having the form shown above with a common marginal distribution P_mar for the anchor, positive, and negative samples.…”
Section: Unsupervised Contrastive Learning
confidence: 73%
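
The "form shown above" is not reproduced in this snippet. For orientation only (a paraphrase of the latent-class model of Arora et al., 2019, not a quotation from the citing paper), the joint distribution and the common marginal can be written as:

    \[
    P(x, x^{+}, x^{-})
      = \mathbb{E}_{c,\, c^{-} \sim \rho^{2}}
        \big[ D_{c}(x)\, D_{c}(x^{+})\, D_{c^{-}}(x^{-}) \big],
    \qquad
    P_{\mathrm{mar}}(x) = \mathbb{E}_{c \sim \rho}\big[ D_{c}(x) \big],
    \]

where classes c are drawn from a distribution rho and D_c is the data distribution within class c, so the anchor, positive, and negative samples all share the marginal P_mar.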
“…On the other hand, the choice of negative samples, possibly conditioned on the given similar pair, remains an open design choice. It is well-known that this choice can theoretically (Arora et al., 2019) as well as empirically (Tschannen et al., 2019; Jin et al., 2018) affect the performance of contrastive learning.…”
Section: Introduction
confidence: 99%
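
As a concrete illustration of that design choice (hypothetical helper functions, not taken from any of the cited papers), negatives drawn independently from the marginal versus negatives conditioned on the anchor could be sketched as:

    import torch

    def uniform_negative_indices(batch_size, k, device=None):
        # Negatives chosen uniformly among the *other* anchors in the batch,
        # i.e. (approximately) i.i.d. from the same marginal distribution.
        idx = torch.randint(batch_size - 1, (batch_size, k), device=device)
        arange = torch.arange(batch_size, device=device).unsqueeze(1)
        return idx + (idx >= arange).long()   # skip each anchor's own index

    def hard_negative_indices(z_anchor, z_candidates, k):
        # Negatives conditioned on the anchor: the k candidates whose
        # representations have the largest inner product with it
        # (candidates are assumed not to contain the anchors themselves).
        scores = z_anchor @ z_candidates.T      # (B, N)
        return scores.topk(k, dim=1).indices    # (B, k)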
“…Theoretical works on self-supervised learning. A recent line of theoretical works has studied self-supervised learning (Arora et al., 2019; Tosh et al., 2021; HaoChen et al., 2021). In particular, it is shown that under conditional independence given the label and/or additional latent variables, representations learned by reconstruction-based self-supervised learning algorithms can achieve small errors in the downstream linear classification task (Arora et al., 2019; Tosh et al., 2021).…”
Section: Additional Related Work
confidence: 99%
“…A recent line of theoretical works has studied self-supervised learning (Arora et al., 2019; Tosh et al., 2021; HaoChen et al., 2021). In particular, it is shown that under conditional independence given the label and/or additional latent variables, representations learned by reconstruction-based self-supervised learning algorithms can achieve small errors in the downstream linear classification task (Arora et al., 2019; Tosh et al., 2021). More closely related to our work is the recent result of HaoChen et al. (2021) that analyzed contrastive learning without assuming conditional independence of positive pairs.…”
Section: Additional Related Work
confidence: 99%
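
For reference, the conditional-independence assumption referred to in these statements can be stated compactly (a paraphrase; y denotes the downstream label, possibly together with additional latent variables):

    \[
    P\big(x, x^{+} \mid y\big) \;=\; P\big(x \mid y\big)\, P\big(x^{+} \mid y\big),
    \]

i.e., the two members of a positive pair are independent given y; the cited results show that, under this assumption, the learned representations admit low-error linear classifiers for y.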
“…'Bootstrap your Own Latent' (BYOL) by Grill et al. (2020) presents a new approach to self-supervision that is simpler and does not require negative samples for the loss function, which has often been the downfall of SimCLR (Arora et al., 2019). It uses two neural networks working in tandem to generate representations.…”
Section: Related Work
confidence: 99%
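
For orientation, the "two neural networks working in tandem" can be sketched roughly as follows: an online network with a predictor head is trained to match a slowly updated target network, with no negative samples in the loss. This is an illustrative PyTorch-style sketch of the BYOL recipe, not code from Grill et al. (2020); it omits the projector heads and the symmetrized loss of the original method.

    import copy
    import torch
    import torch.nn.functional as F

    class BYOLSketch(torch.nn.Module):
        # Online encoder + predictor, plus a target encoder maintained as an
        # exponential moving average (EMA) of the online encoder.
        def __init__(self, encoder, dim, tau=0.996):
            super().__init__()
            self.online = encoder
            self.predictor = torch.nn.Sequential(
                torch.nn.Linear(dim, dim), torch.nn.ReLU(), torch.nn.Linear(dim, dim)
            )
            self.target = copy.deepcopy(encoder)
            for p in self.target.parameters():
                p.requires_grad_(False)        # no gradients into the target
            self.tau = tau

        def loss(self, view1, view2):
            # Similarity loss between two augmented views; note the absence
            # of any negative samples.
            p1 = F.normalize(self.predictor(self.online(view1)), dim=-1)
            with torch.no_grad():
                t2 = F.normalize(self.target(view2), dim=-1)
            return (2 - 2 * (p1 * t2).sum(dim=-1)).mean()

        @torch.no_grad()
        def update_target(self):
            # EMA update of the target network after each optimizer step.
            for p_t, p_o in zip(self.target.parameters(), self.online.parameters()):
                p_t.mul_(self.tau).add_(p_o, alpha=1 - self.tau)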