2021 | Preprint | DOI: 10.48550/arxiv.2110.05025

Self-supervised Learning is More Robust to Dataset Imbalance

Abstract: Self-supervised learning (SSL) is a scalable way to learn general visual representations since it learns without labels. However, large-scale unlabeled datasets in the wild often have long-tailed label distributions, where little is known about the behavior of SSL. In this work, we systematically investigate self-supervised learning under dataset imbalance. First, we find via extensive experiments that off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations […]
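To make the setup concrete, here is a minimal sketch (not taken from the paper) of how one might subsample a labeled dataset into a long-tailed pretraining set with a target imbalance ratio: class sizes decay exponentially from the head class down to the tail class, and the labels are then discarded so that SSL never sees them. The function name, the exponential decay profile, and the parameter values are illustrative assumptions.

```python
import numpy as np

def long_tailed_indices(labels, n_max, rho, seed=0):
    """Pick indices so class sizes decay exponentially from n_max down to ~n_max * rho.

    `labels` are used only to build the imbalanced pretraining split; the
    self-supervised learner itself is trained on the samples without labels.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    k = len(classes)
    # head class keeps n_max samples, tail class keeps roughly n_max * rho samples
    sizes = np.maximum(1, (n_max * rho ** (np.arange(k) / max(k - 1, 1))).astype(int))
    keep = []
    for c, n_c in zip(classes, sizes):
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=min(n_c, len(idx)), replace=False))
    return np.concatenate(keep)

# e.g. idx = long_tailed_indices(train_labels, n_max=5000, rho=0.01)
```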

Cited by 20 publications (33 citation statements) | References 45 publications
“…The effect of unlabeled dataset characteristics, such as class imbalance, on downstream performance was studied in [43,74]. The authors demonstrate that self-supervised approaches are indeed more robust to source dataset imbalances (including long-tailed distributions), thereby adding further value to the use of self-supervision for initialization…”
Section: Analysis Methods for Understanding Self-supervised Approaches
Citation type: mentioning | Confidence: 99%
“…Source dataset imbalance. As in [43,74], we first define the imbalance ratio $\rho$ as the ratio of the size of the rarest class to that of the most frequent class. Therefore, $\rho = \frac{\text{num\_windows of the rarest class}}{\text{num\_windows of the most frequent class}} \le 1$…”
Section: Effect of the Dataset Imbalance on Performance
Citation type: mentioning | Confidence: 99%
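The quoted definition boils down to a one-liner; the sketch below (an illustration, not code from either paper) computes the imbalance ratio from per-class sample counts, so it equals 1 for a balanced dataset and approaches 0 as the tail becomes rarer.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Imbalance ratio rho: rarest-class count over most-frequent-class count (<= 1)."""
    counts = Counter(labels)  # samples (e.g. windows) per class
    return min(counts.values()) / max(counts.values())

# imbalance_ratio([0, 0, 0, 0, 1, 1, 2]) -> 0.25
```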
“…On the theoretical front, there have been analyses of both masked predictions [Lee et al, 2020] and contrastive methods [Arora et al, 2019, Tosh et al, 2020a,b, Wang and Isola, 2020, HaoChen et al, 2021, Wen and Li, 2021], though with a focus on characterizing the quality of the learned features for downstream tasks [Wei et al, 2021]. These approaches usually rely on quite strong assumptions to tie the self-supervised learning objective to the downstream tasks of interest…”
Section: Related Work
Citation type: mentioning | Confidence: 99%
“…There exist a few attempts (Liu et al, 2021; Jiang et al, 2021b; Zheng et al, 2021) towards self-supervised long-tailed learning, which can be divided into two categories: loss-based and model-based methods. A classical solution in the first category, i.e., the focal loss (Lin et al, 2017), relies on the individual sample difficulty to rebalance the learning…”
Section: Related Work
Citation type: mentioning | Confidence: 99%
“…Existing works for self-supervised long-tailed learning mainly take either a loss perspective or a model perspective. The former relies on loss reweighting, e.g., the focal loss for hard example mining (Lin et al, 2017) or SAM, which exploits the sharpness of the loss surface (Liu et al, 2021), to draw more attention to tail samples during training. However, the effectiveness of these methods is sensitive to, and limited by, the accuracy of tail sample discovery…”
Section: Introduction
Citation type: mentioning | Confidence: 99%
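As a concrete illustration of the loss-reweighting idea mentioned above, here is a hedged sketch of the binary focal loss of Lin et al. (2017): the cross-entropy term is scaled by (1 - p_t)^gamma, so easy, well-classified samples are down-weighted and hard samples, which often come from tail classes, contribute most of the gradient. The PyTorch formulation and the default gamma and alpha values are standard choices, not details taken from the cited works.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by (1 - p_t) ** gamma."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```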