2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01404
Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

Cited by 22 publications (13 citation statements)
References 17 publications
“…Prior works tend to believe that asymmetric designs are necessary for avoiding complete feature collapse (Zhang et al, 2022a), while we show that a fully symmetric architecture, dubbed SymSimSiam (Symmetric Simple Siamese network), can also avoid complete collapse. Specifically, we simply align the positive pair (x, x + ) with a symmetric alignment loss,…”
Section: Asymmetry Is the Key To Alleviate Dimensional Collapse
confidence: 68%
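The symmetric alignment loss described in the statement above can be sketched as follows. This is an illustrative reconstruction only: the actual SymSimSiam objective may include projector heads and other details not present in the quoted excerpt.

```python
import numpy as np

def symmetric_alignment_loss(z1, z2):
    # Normalize each embedding, then take the mean negative cosine
    # similarity of the positive pair over the batch. The loss is
    # symmetric in z1 and z2: no predictor head, no stop-gradient.
    z1 = z1 / np.linalg.norm(z1, axis=-1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=-1, keepdims=True)
    return -np.mean(np.sum(z1 * z2, axis=-1))
```

For identical views the loss reaches its minimum of -1, and swapping the two arguments leaves the value unchanged, which is the symmetry the quoted work emphasizes.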
“…Some existing works are proposed to understand some specific non-contrastive techniques, mostly focusing on the predictor head proposed by BYOL (Grill et al, 2020). From an empirical side, Chen & He (2021) think that the predictor helps approximate the expectation over augmentations, and Zhang et al (2022a) take a center-residual decomposition of representations for analyzing the collapse. From a theoretical perspective, Tian et al (2021) analyze the dynamics of predictor weights under simple linear networks, and Wen & Li (2022) obtain optimization guarantees for two-layer nonlinear networks.…”
Section: Introduction
confidence: 99%
“…This effect can be present a) when learning focuses only on few features and/or b) when the covariance structure in the data is insufficiently extracted. Explaining away can be caused by saturation of the InfoNCE objective [2, 11, 12]. To ameliorate these drawbacks, CLOOB [2] has introduced the InfoLOOB objective together with Hopfield networks as a promising method for contrastive learning.…”
Section: Methods
confidence: 99%
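The saturation issue mentioned above can be illustrated by contrasting InfoNCE with a leave-one-out variant in the spirit of InfoLOOB. The per-anchor scalar form below is a simplification for illustration; function names and arguments are assumptions, not the CLOOB implementation.

```python
import numpy as np

def info_nce(pos, negs, tau=0.1):
    # Standard InfoNCE for one anchor: the positive similarity also
    # appears in the denominator, so once the positive dominates all
    # negatives the loss approaches 0 and saturates.
    logits = np.concatenate(([pos], np.asarray(negs))) / tau
    return -pos / tau + np.log(np.sum(np.exp(logits)))

def info_loob(pos, negs, tau=0.1):
    # InfoLOOB ("leave one out bound"): the positive is excluded from
    # the denominator, so the objective is not bounded below by 0 and
    # keeps pushing even when the positive already dominates.
    return -pos / tau + np.log(np.sum(np.exp(np.asarray(negs) / tau)))
```

With a strongly aligned positive (similarity 1.0) and dissimilar negatives (-1.0), InfoNCE is already near its floor of 0 while InfoLOOB is still far from saturated, which is the motivation the quoted statement attributes to CLOOB.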
“…They consider temperature as a measure of embedding confidence and propose temperature as uncertainty. Zhang et al [54] adopt dual temperature in a contrastive InfoNCE for realizing independent control of two hardness-aware sensitiveness. Previous temperature analysis works mainly focus on the penalty's unevenness of negative samples within an anchor or the sum of penalties of different anchors within a training batch.…”
Section: Related Work
confidence: 99%
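A minimal sketch of the dual-temperature idea for a single anchor: one temperature shapes the intra-anchor softmax (hardness-awareness), while a coefficient computed at a second temperature, treated as a constant (stop-gradient in an autograd framework), rescales the loss magnitude. The specific coefficient below is an assumption for illustration; the exact form used by Zhang et al. may differ.

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def dual_temperature_loss(pos, negs, tau_intra=0.1, tau_inter=1.0):
    # tau_intra controls the softmax over this anchor's samples
    # (intra-anchor hardness-awareness); tau_inter enters only through
    # a rescaling coefficient that would be wrapped in stop-gradient
    # in a real implementation, giving independent control of the two.
    logits = np.concatenate(([pos], np.asarray(negs)))
    p_intra = softmax(logits / tau_intra)[0]
    p_inter = softmax(logits / tau_inter)[0]
    coeff = (1.0 - p_inter) / (1.0 - p_intra)  # stop-gradient in practice
    return -coeff * np.log(p_intra)
```

The point of the construction is separation of concerns: changing `tau_inter` rescales the loss (and hence the gradient magnitude) without changing which negatives the intra-anchor softmax treats as hard.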
“…When the temperature is fixed, the gradient's magnitude with respect to a positive sample is equal to the sum of gradients with respect to all negative samples. Prior works of temperature analysis mainly focus on the penalty's unevenness of negative samples within an anchor [48], or the sum of penalties of different anchors within a training batch [54]. Differently, we pay attention to the proportion of penalties between the positive sample and negative samples.…”
Section: Adaptive Contrastive
confidence: 99%
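The gradient balance stated above can be checked directly: for a single-anchor InfoNCE at fixed temperature, the closed-form gradients with respect to the similarity logits are -(1 - p_pos)/tau for the positive and p_i/tau for each negative, so the positive gradient's magnitude equals the sum over negatives. A numpy sketch (the per-anchor scalar form is assumed for illustration):

```python
import numpy as np

def infonce_grads(pos, negs, tau=0.1):
    # Gradients of L = -log(exp(pos/tau) / sum_j exp(s_j/tau)) with
    # respect to the similarity logits:
    #   dL/ds_pos   = -(1 - p_pos) / tau
    #   dL/ds_neg_i = p_i / tau
    logits = np.concatenate(([pos], np.asarray(negs))) / tau
    p = np.exp(logits - logits.max())
    p = p / p.sum()
    grad_pos = -(1.0 - p[0]) / tau
    grad_negs = p[1:] / tau
    return grad_pos, grad_negs
```

Since the softmax probabilities sum to 1, |grad_pos| = (1 - p_pos)/tau = sum_i p_i/tau = sum(grad_negs) holds exactly, which is the fixed-temperature balance the quoted statement starts from.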