2016
DOI: 10.1007/978-3-319-46466-4_5
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Abstract: In this paper we study the problem of image representation learning without human annotation. Following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and can later be repurposed to solve object classification and detection. To maintain compatibility across tasks we introduce the context-free network (CFN), a siamese-ennead CNN. The CFN takes image tiles as input an…
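The pretext task described above can be sketched as follows. This is a minimal, illustrative data-preparation step only, not the authors' implementation: the function name and the tiny permutation subset are assumptions, and the paper's actual method selects a much larger, maximally distant permutation set.

```python
import numpy as np

# Illustrative sketch of the Jigsaw pretext-task data preparation:
# partition an image into a 3x3 grid of tiles, shuffle the tiles with a
# permutation drawn from a small fixed subset, and use the permutation's
# index as the self-supervised classification label.

PERMUTATION_SUBSET = [
    (0, 1, 2, 3, 4, 5, 6, 7, 8),   # identity
    (8, 7, 6, 5, 4, 3, 2, 1, 0),   # reversal
    (2, 0, 1, 5, 3, 4, 8, 6, 7),   # row-wise rotation
]

def make_jigsaw_example(image, rng):
    """Split `image` (H x W x C, with H and W divisible by 3) into 9 tiles,
    permute them, and return (stacked_tiles, permutation_index)."""
    h, w = image.shape[0] // 3, image.shape[1] // 3
    tiles = [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
             for r in range(3) for c in range(3)]
    label = int(rng.integers(len(PERMUTATION_SUBSET)))
    shuffled = [tiles[i] for i in PERMUTATION_SUBSET[label]]
    return np.stack(shuffled), label

rng = np.random.default_rng(0)
img = rng.random((96, 96, 3))
tiles, label = make_jigsaw_example(img, rng)
print(tiles.shape)  # (9, 32, 32, 3)
```

A network such as the CFN would then receive the nine shuffled tiles as parallel inputs and predict `label`, i.e. which permutation was applied.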

Cited by 2,257 publications (2,101 citation statements)
References 28 publications
“…Interestingly, when finer grid cells are used (e.g., 4 × 4), we do not observe any improvement in recognition performance. Moreover, the first-layer filters learned by our method and the ones learned by Noroozi and Favaro [32] seem somewhat similar (see Figure 5) despite the different number of permutations learned. This fact suggests that a 3 × 3 grid partition and a well-chosen subset of permutations are enough to learn filters that produce state-of-the-art results for self-supervised representation learning.…”
Section: Self-supervised Representation Learning
Confidence: 62%
“…This fact suggests that a 3 × 3 grid partition and a well-chosen subset of permutations are enough to learn filters that produce state-of-the-art results for self-supervised representation learning. However, DeepPermNet is a more generic method than the one proposed by Noroozi and Favaro [32], and our method can be used to solve many different computer vision tasks, as shown in our experiments.…”
Section: Self-supervised Representation Learning
Confidence: 89%