2019
DOI: 10.48550/arxiv.1902.09229
Preprint

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

Sanjeev Arora,
Hrishikesh Khandeparkar,
Mikhail Khodak
et al.

Abstract: Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term cont…
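
The abstract describes the objective only informally. As a reading aid, here is a minimal sketch of the kind of loss it refers to, with one positive and k negative samples scored by inner products; this is an illustrative PyTorch snippet with a placeholder encoder f, not code from the paper.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(f, x, x_pos, x_negs):
        # f      : encoder mapping inputs to d-dimensional representations
        # x      : anchor batch,               shape (B, ...)
        # x_pos  : semantically similar batch, shape (B, ...)
        # x_negs : negative-sample batch,      shape (B, k, ...)
        z = f(x)                                          # (B, d)
        z_pos = f(x_pos)                                  # (B, d)
        B, k = x_negs.shape[0], x_negs.shape[1]
        z_neg = f(x_negs.flatten(0, 1)).view(B, k, -1)    # (B, k, d)

        pos_score = (z * z_pos).sum(dim=-1, keepdim=True)  # (B, 1)
        neg_score = torch.einsum("bd,bkd->bk", z, z_neg)   # (B, k)

        # Cross-entropy over {positive, negatives}: minimizing it pushes the
        # inner product with the positive above those with the negatives.
        logits = torch.cat([pos_score, neg_score], dim=1)  # (B, 1 + k)
        labels = torch.zeros(B, dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)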

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

1
140
1

Year Published

2020
2020
2024
2024

Publication Types

Select...
4
3
1

Relationship

0
8

Authors

Journals


Cited by 96 publications (150 citation statements)
References 8 publications (11 reference statements)

“…Implicit to many applications is the assumption that the anchor, positive, and negative samples have the same marginal distribution P_mar. This property also holds for the recently proposed latent "class" modeling framework of (Arora et al., 2019) for contrastive unsupervised representation learning, which has been adopted by several works. Let P(P_mar) denote the set of joint distributions P having the form shown above with a common marginal distribution P_mar for the anchor, positive, and negative samples.…”
Section: Unsupervised Contrastive Learning
confidence: 73%
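
The "form shown above" is not reproduced in this snippet. For orientation only (a paraphrase of the latent-class model of Arora et al., 2019, not a quotation from the citing paper), the joint distribution and the common marginal can be written as:

    \[
    P(x, x^{+}, x^{-})
      = \mathbb{E}_{c,\, c^{-} \sim \rho^{2}}
        \big[ D_{c}(x)\, D_{c}(x^{+})\, D_{c^{-}}(x^{-}) \big],
    \qquad
    P_{\mathrm{mar}}(x) = \mathbb{E}_{c \sim \rho}\big[ D_{c}(x) \big],
    \]

where classes c are drawn from a distribution rho and D_c is the data distribution within class c, so the anchor, positive, and negative samples all share the marginal P_mar.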
“…On the other hand, the choice of negative samples, possibly conditioned on the given similar pair, remains an open design choice. It is well-known that this choice can theoretically (Arora et al., 2019) as well as empirically (Tschannen et al., 2019; Jin et al., 2018) affect the performance of contrastive learning.…”
Section: Introduction
confidence: 99%
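
As a concrete illustration of that design choice (hypothetical helper functions, not taken from any of the cited papers), negatives drawn independently from the marginal versus negatives conditioned on the anchor could be sketched as:

    import torch

    def uniform_negative_indices(batch_size, k, device=None):
        # Negatives chosen uniformly among the *other* anchors in the batch,
        # i.e. (approximately) i.i.d. from the same marginal distribution.
        idx = torch.randint(batch_size - 1, (batch_size, k), device=device)
        arange = torch.arange(batch_size, device=device).unsqueeze(1)
        return idx + (idx >= arange).long()   # skip each anchor's own index

    def hard_negative_indices(z_anchor, z_candidates, k):
        # Negatives conditioned on the anchor: the k candidates whose
        # representations have the largest inner product with it
        # (candidates are assumed not to contain the anchors themselves).
        scores = z_anchor @ z_candidates.T      # (B, N)
        return scores.topk(k, dim=1).indices    # (B, k)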
“…Theoretical works on self-supervised learning. A recent line of theoretical works has studied self-supervised learning (Arora et al., 2019; Tosh et al., 2021; HaoChen et al., 2021). In particular, it is shown that under conditional independence given the label and/or additional latent variables, representations learned by reconstruction-based self-supervised learning algorithms can achieve small errors in the downstream linear classification task (Arora et al., 2019; Tosh et al., 2021).…”
Section: Additional Related Work
confidence: 99%
“…A recent line of theoretical works has studied self-supervised learning (Arora et al., 2019; Tosh et al., 2021; HaoChen et al., 2021). In particular, it is shown that under conditional independence given the label and/or additional latent variables, representations learned by reconstruction-based self-supervised learning algorithms can achieve small errors in the downstream linear classification task (Arora et al., 2019; Tosh et al., 2021). More closely related to our work is the recent result of HaoChen et al. (2021) that analyzed contrastive learning without assuming conditional independence of positive pairs.…”
Section: Additional Related Work
confidence: 99%
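
For reference, the conditional-independence assumption referred to in these statements can be stated compactly (a paraphrase; y denotes the downstream label, possibly together with additional latent variables):

    \[
    P\big(x, x^{+} \mid y\big) \;=\; P\big(x \mid y\big)\, P\big(x^{+} \mid y\big),
    \]

i.e., the two members of a positive pair are independent given y; the cited results show that, under this assumption, the learned representations admit low-error linear classifiers for y.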
“…'Bootstrap your Own Latent' (BYOL) by Grill et al. (2020) presents a new approach to self-supervision that is simpler and does not require negative samples for the loss function, which has often been the downfall of SimCLR (Arora et al., 2019). It uses two neural networks working in tandem to generate representations.…”
Section: Related Work
confidence: 99%
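
For orientation, the "two neural networks working in tandem" can be sketched roughly as follows: an online network with a predictor head is trained to match a slowly updated target network, with no negative samples in the loss. This is an illustrative PyTorch-style sketch of the BYOL recipe, not code from Grill et al. (2020); it omits the projector heads and the symmetrized loss of the original method.

    import copy
    import torch
    import torch.nn.functional as F

    class BYOLSketch(torch.nn.Module):
        # Online encoder + predictor, plus a target encoder maintained as an
        # exponential moving average (EMA) of the online encoder.
        def __init__(self, encoder, dim, tau=0.996):
            super().__init__()
            self.online = encoder
            self.predictor = torch.nn.Sequential(
                torch.nn.Linear(dim, dim), torch.nn.ReLU(), torch.nn.Linear(dim, dim)
            )
            self.target = copy.deepcopy(encoder)
            for p in self.target.parameters():
                p.requires_grad_(False)        # no gradients into the target
            self.tau = tau

        def loss(self, view1, view2):
            # Similarity loss between two augmented views; note the absence
            # of any negative samples.
            p1 = F.normalize(self.predictor(self.online(view1)), dim=-1)
            with torch.no_grad():
                t2 = F.normalize(self.target(view2), dim=-1)
            return (2 - 2 * (p1 * t2).sum(dim=-1)).mean()

        @torch.no_grad()
        def update_target(self):
            # EMA update of the target network after each optimizer step.
            for p_t, p_o in zip(self.target.parameters(), self.online.parameters()):
                p_t.mul_(self.tau).add_(p_o, alpha=1 - self.tau)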