2020
DOI: 10.1007/978-3-030-58621-8_45

Contrastive Multiview Coding

Abstract: Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We hypothesize that a powerful representation is one that models view-invariant factors. Based on this hypothesis, we investigate a contrastive coding scheme…


Cited by 1,288 publications (1,044 citation statements)
References 66 publications
“…To encourage the encoder to learn a richer embedding, and to mitigate the need to train separate critics for each position of k, we modify the bilinear critic used in Oord et al (2018), and instead use a parameterless dot-product critic for f (Chen et al, 2020). Rather than using a memory bank (Wu et al, 2018; Tian et al, 2019; He et al, 2019), we draw “fake” samples from p(z) and p(c) using other z_{t+k} and c_t from other samples in the same batch (Chen et al, 2020). That is, the diagonal of the output of the dot-product critic is the “correct pairing” of z_{t+k} and c_t at a given t and k, and the softmax is computed using all off-diagonal entries to draw the N−1 “fake” samples from the noise distribution p(z′).…”
Section: Methods
mentioning confidence: 99%
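To make the in-batch contrastive objective in the excerpt above concrete, here is a minimal PyTorch sketch of an InfoNCE loss with a parameterless dot-product critic and in-batch negatives; the function name, shapes, and temperature are illustrative assumptions, not the cited authors' code.

```python
import torch
import torch.nn.functional as F

def in_batch_infonce(z, c, temperature=1.0):
    """InfoNCE with a parameterless dot-product critic and in-batch negatives.

    z: (N, D) target embeddings (e.g., z_{t+k} for one offset k)
    c: (N, D) context embeddings (e.g., c_t)
    Row i of the N x N score matrix scores c_i against every z_j; the
    diagonal holds the "correct pairings", and the off-diagonal entries
    serve as the N-1 "fake" samples drawn from the rest of the batch.
    """
    scores = (c @ z.t()) / temperature       # (N, N) dot-product critic
    targets = torch.arange(z.size(0))        # positives lie on the diagonal
    return F.cross_entropy(scores, targets)  # row-wise softmax classification

# Usage with random stand-in embeddings (batch of 32, dimension 128).
z, c = torch.randn(32, 128), torch.randn(32, 128)
print(in_batch_infonce(z, c).item())
```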
“…It is possible to use a variational approach to estimate a bound on the mutual information between continuous, high-dimensional quantities (Donsker & Varadhan, 1983; Nguyen et al, 2010; Alemi et al, 2016; Belghazi et al, 2018; Oord et al, 2018; Poole et al, 2019). Recent works capture this intuition to yield self-supervised embeddings in the modalities of imaging (Oord et al, 2018; Hjelm et al, 2018; Bachman et al, 2019; Tian et al, 2019; Hénaff et al, 2019; Löwe et al, 2019; He et al, 2019; Chen et al, 2020; Tian et al, 2020; Wang & Isola, 2020), text (Rivière et al, 2020; Oord et al, 2018; Kong et al, 2019), and audio (Löwe et al, 2019; Oord et al, 2018), with high empirical downstream performance.…”
Section: Background and Related Work
mentioning confidence: 99%
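Several of the works cited in this excerpt build on the InfoNCE objective as their variational bound. Restated in illustrative notation (not quoted from any of the cited papers), maximizing the objective below over a critic f lower-bounds the mutual information between z and c:

```latex
% InfoNCE lower bound on mutual information (Oord et al, 2018),
% restated for a batch of N samples and a critic f:
I(z; c) \;\geq\; \log N - \mathcal{L}_N,
\qquad
\mathcal{L}_N = -\,\mathbb{E}\left[ \log
  \frac{f(z_i, c_i)}{\sum_{j=1}^{N} f(z_j, c_i)} \right]
```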
“…Different contrastive methods differ from each other in terms of the approach to resolving this intractability, the definitions of what different views are, and the exact implementations of the contrastive loss form. Such methods include instance recognition (IR) (40), contrastive multiview coding (CMC) (39), momentum contrast (MoCo) (42), simple contrastive learning of representation (SimCLR) (43), and local aggregation (LA) (41). For example, the IR method involves maintaining running averages of embeddings for all inputs (called the “memory bank”) across the training time and replacing the embeddings with their corresponding running-average counterparts.…”
Section: Unsupervised Learning Algorithms
mentioning confidence: 99%
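The memory-bank mechanism described in this excerpt can be sketched as follows. This is an illustrative PyTorch outline, with the momentum value, tensor shapes, and class name assumed rather than taken from the cited papers.

```python
import torch
import torch.nn.functional as F

class MemoryBank:
    """Running-average ("memory bank") embedding store, as in instance
    recognition: one slowly updated embedding per training example, used
    in place of freshly computed embeddings when forming contrastive pairs.
    """
    def __init__(self, num_samples, dim, momentum=0.5):
        # Random unit-norm initialization; momentum=0.5 is an assumed value.
        self.bank = F.normalize(torch.randn(num_samples, dim), dim=1)
        self.momentum = momentum

    def update(self, indices, embeddings):
        # Exponential moving average, then re-project onto the unit sphere.
        mixed = self.momentum * self.bank[indices] \
                + (1 - self.momentum) * embeddings
        self.bank[indices] = F.normalize(mixed, dim=1)

    def lookup(self, indices):
        return self.bank[indices]

# Usage: refresh the stored averages for examples 0 and 3 after a step.
bank = MemoryBank(num_samples=10, dim=128)
bank.update(torch.tensor([0, 3]), torch.randn(2, 128))
```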
“…the best performance of NPID over the validation set was with k = 25). We follow the unsupervised as well as self-supervised representation learning literature, 15–18,24 where cosine similarity has been used as a metric to describe the distance between two features on a unit sphere space.…”
Section: Methods
mentioning confidence: 99%
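As a small illustration of the metric mentioned in this excerpt (not code from the cited work), cosine similarity between L2-normalized features reduces to a plain dot product on the unit sphere:

```python
import torch
import torch.nn.functional as F

def cosine_similarity(a, b):
    """Cosine similarity between two batches of features, each (N, D).

    After L2 normalization both feature sets lie on the unit sphere,
    so the cosine of the angle between paired rows is a dot product.
    """
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    return (a * b).sum(dim=1)  # (N,) similarities in [-1, 1]
```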