2022
DOI: 10.1609/aaai.v36i1.19894
A Random CNN Sees Objects: One Inductive Bias of CNN and Its Applications

Abstract: This paper starts by revealing a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, named Tobias ("The object is at sight") in this paper. This empirical inductive bias is further analyzed and successfully applied to self-supervised learning (SSL). A CNN is encouraged to learn representations that focus on the foreground object, by transforming every image into various versions with …
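The abstract's core claim invites a quick experiment: pass an image through an untrained CNN, collapse its deepest feature map into a single saliency map, and threshold it to get a rough foreground mask. Below is a minimal sketch of that idea, assuming a torchvision ResNet-50 backbone, channel-sum aggregation, and a mean threshold; these specific choices are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: object localization with a randomly initialized CNN, in the spirit
# of the "Tobias" observation. Assumptions (not the paper's exact recipe):
# ResNet-50 backbone, channel-sum aggregation, mean-thresholded mask.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

def random_cnn_saliency(image: Image.Image) -> torch.Tensor:
    # weights=None gives a randomly initialized network: no learning at all.
    backbone = models.resnet50(weights=None).eval()
    # Keep conv1 ... layer4; drop global pooling and the classifier head.
    features = torch.nn.Sequential(*list(backbone.children())[:-2])

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    x = preprocess(image).unsqueeze(0)          # (1, 3, 224, 224)

    with torch.no_grad():
        fmap = features(x)                      # (1, 2048, 7, 7)

    # Collapse channels into one coarse saliency map, then upsample it.
    saliency = fmap.sum(dim=1, keepdim=True)    # (1, 1, 7, 7)
    saliency = F.interpolate(saliency, size=(224, 224),
                             mode="bilinear", align_corners=False).squeeze()
    # Above-average activation is treated as foreground.
    return (saliency > saliency.mean()).float()
```

A mask like this is what the abstract means by the network "naturally focusing" on objects; the SSL application then encourages the learned representations to concentrate on that estimated foreground.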

Cited by 11 publications (7 citation statements)
References 27 publications
“…To clarify this, we carefully analyze the disparities in vanilla and distillation performance for each model: (1) over the whole search space, vanilla accuracy preserves only an 85% correlation with actual distillation performance; (2) for a particular instance, as shown in Figure 1 (Right), ResNet20 with 3 res-blocks in each stage (i.e., ResNet [3,3,3]) has more parameters and better standalone performance, yet is weaker than ResNet [7,1,3] in the distillation process. Considering that ResNet [7,1,3] has more layers than ResNet20, we seek to understand this vanilla-distillation accuracy gap from the perspective of semantic matching [42]. ResNet [7,1,3] enjoys a larger effective receptive field and better-matched knowledge with the teacher, resulting in significant distillation gains.…”
Section: Accuracy vs. DisWOT Score (mentioning)
confidence: 99%
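The effective receptive field claim in this excerpt can be probed directly. Note that the theoretical receptive field of ResNet [3,3,3] and ResNet [7,1,3] can even coincide under the usual CIFAR-style stride placement, since it depends mostly on where the stride-2 layers sit; what grows with the deeper configuration is the effective receptive field of Luo et al. (2016), i.e., how widely input gradients spread from a single output unit. The sketch below measures that spread; the plain 3x3 conv stacks are an illustrative stand-in for basic res-blocks, and conv_stack / erf_radius are hypothetical helpers, not code from the cited paper.

```python
# Sketch: empirically estimating the effective receptive field (ERF) of a
# conv stack (Luo et al., 2016): backprop from the center output unit and
# measure how far the input-gradient mass spreads.
import torch
import torch.nn as nn

def conv_stack(blocks_per_stage):
    # Plain 3x3 conv pairs stand in for CIFAR-style basic res-blocks; the
    # first block of each later stage downsamples with stride 2.
    layers, ch = [nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()], 16
    for stage, num_blocks in enumerate(blocks_per_stage):
        for block in range(num_blocks):
            stride = 2 if stage > 0 and block == 0 else 1
            out_ch = ch * 2 if stride == 2 else ch
            layers += [nn.Conv2d(ch, out_ch, 3, stride=stride, padding=1),
                       nn.ReLU(),
                       nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU()]
            ch = out_ch
    return nn.Sequential(*layers)

def erf_radius(model, size=32):
    x = torch.randn(1, 3, size, size, requires_grad=True)
    out = model(x)
    h, w = out.shape[-2:]
    out[0, :, h // 2, w // 2].sum().backward()    # signal from the center unit
    g = x.grad.abs().sum(dim=1).squeeze()         # input-gradient magnitude
    # Radius = standard deviation of the gradient mass around its centroid.
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size),
                            indexing="ij")
    p = g / g.sum()
    cy, cx = (p * ys).sum(), (p * xs).sum()
    return (p * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum().sqrt().item()

torch.manual_seed(0)
print(erf_radius(conv_stack([3, 3, 3])))  # ResNet20-like student
print(erf_radius(conv_stack([7, 1, 3])))  # the deeper [7,1,3] student
```

Comparing the two printed radii over a few seeds gives a direct, training-free check of whether the [7,1,3] layout indeed sees a wider input context than ResNet20.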
“…Numerous techniques have been developed to advance this field by incorporating different inductive biases (Figure 1 (a)), owing to the task's complexity. Regrettably, however, the object navigation field has not formed a unified inductive-bias paradigm comparable to those in CV (Cao & Wu, 2022; d'Ascoli et al., 2021) or NLP (Levine et al., 2022; Kharitonov & Chaabouni, 2021). Motivated by this gap, and by distilling and generalizing the current mainstream methods, we propose a meta-ability decoupling (MAD) paradigm, hoping to unify and connect the various object navigation methods.…”
Section: Introduction (mentioning)
confidence: 99%