2022
DOI: 10.1109/tcds.2021.3086267

A Sensorimotor Perspective on Contrastive Multiview Visual Representation Learning

citations
Cited by 3 publications
(4 citation statements)
references
References 66 publications
“…b) Visual data-augmentations: Most contrastive learning approaches use unrealistic data augmentations like color distortion, crop, rotation, resize, blur [8], Jigsaw [18], selection of color channels [19]. In practice, it has been argued that these augmentations are related to a subset of more realistic augmentations [20]: crop can relate to occlusion, resize to object depth motion, color changes to illumination changes etc. Whether unrealistic augmentations and their realistic counterpart truly result in similar representations remains unclear.…”
Section: Related Work (mentioning)
confidence: 99%
“…In contrastive ones, models usually learn to embed multiple views of an image into similar representations [4]. The generation process used to obtain these views can be related to some form of action [18], such as cropping which can be linked to head movement and eye saccades. However, these methods include the actions neither to build nor to learn the representations.…”
Section: Related Work (mentioning)
confidence: 99%
“…They propose to use a pretext task during learning, usually making close the representations of inputs considered similar, to improve performance of a predefined task or in the context of unsupervised learning. In computer vision, some of the similar inputs generation processes can be interpreted as the resulting from movements [18]. In reinforcement learning, temporal prediction of consequences of action is often used as a pretext task.…”
Section: Introduction (mentioning)
confidence: 99%
“…b) Visual data-augmentations: Most contrastive learning approaches use unrealistic data augmentations like color distortion, crop, rotation, resize, blur [7], [17], Jigsaw [18], selection of color channels [19]. In practice, it has been argued that these augmentations are related to a subset of more realistic augmentations [20]: crop can relate to occlusion, resize to object depth motion, color changes to illumination changes etc. Whether unrealistic augmentations and their realistic counterpart truly result in similar representations remains unclear.…”
Section: Related Work (mentioning)
confidence: 99%