2022
DOI: 10.1609/aaai.v36i1.19894
A Random CNN Sees Objects: One Inductive Bias of CNN and Its Applications

Abstract: This paper starts by revealing a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, named Tobias ("The object is at sight") in this paper. This empirical inductive bias is further analyzed and successfully applied to self-supervised learning (SSL). A CNN is encouraged to learn representations that focus on the foreground object, by transforming every image into various versions with …
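The abstract's core claim invites a quick experiment: pass an image through an untrained CNN, collapse its deepest feature map into a single saliency map, and threshold it to get a rough foreground mask. Below is a minimal sketch of that idea, assuming a torchvision ResNet-50 backbone, channel-sum aggregation, and a mean threshold; these specific choices are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: object localization with a randomly initialized CNN, in the spirit
# of the "Tobias" observation. Assumptions (not the paper's exact recipe):
# ResNet-50 backbone, channel-sum aggregation, mean-thresholded mask.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

def random_cnn_saliency(image: Image.Image) -> torch.Tensor:
    # weights=None gives a randomly initialized network: no learning at all.
    backbone = models.resnet50(weights=None).eval()
    # Keep conv1 ... layer4; drop global pooling and the classifier head.
    features = torch.nn.Sequential(*list(backbone.children())[:-2])

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    x = preprocess(image).unsqueeze(0)          # (1, 3, 224, 224)

    with torch.no_grad():
        fmap = features(x)                      # (1, 2048, 7, 7)

    # Collapse channels into one coarse saliency map, then upsample it.
    saliency = fmap.sum(dim=1, keepdim=True)    # (1, 1, 7, 7)
    saliency = F.interpolate(saliency, size=(224, 224),
                             mode="bilinear", align_corners=False).squeeze()
    # Above-average activation is treated as foreground.
    return (saliency > saliency.mean()).float()
```

A mask like this is what the abstract means by the network "naturally focusing" on objects; the SSL application then encourages the learned representations to concentrate on that estimated foreground.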

Cited by 11 publications (7 citation statements)
References 27 publications
“…To clarify this, we carefully analyze the disparities in vanilla and distillation performance for each model: (1) over the whole search space, vanilla accuracy preserves only an 85% correlation with actual distillation performance; (2) for a particular instance, as shown in Figure 1 (Right), ResNet20 with 3 res-blocks in each stage (i.e., ResNet [3,3,3]) has more parameters and better standalone performance, yet is weaker than ResNet [7,1,3] in the distillation process. Considering that ResNet [7,1,3] has more layers than ResNet20, we seek to understand this vanilla-distillation accuracy gap from the perspective of semantic matching [42]. ResNet [7,1,3] enjoys a larger effective receptive field and better-matched knowledge with the teacher, resulting in significant distillation gains.…”
Section: Accuracy vs. DisWOT Score (mentioning)
confidence: 99%
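The effective receptive field claim in this excerpt can be probed directly. Note that the theoretical receptive field of ResNet [3,3,3] and ResNet [7,1,3] can even coincide under the usual CIFAR-style stride placement, since it depends mostly on where the stride-2 layers sit; what grows with the deeper configuration is the effective receptive field of Luo et al. (2016), i.e., how widely input gradients spread from a single output unit. The sketch below measures that spread; the plain 3x3 conv stacks are an illustrative stand-in for basic res-blocks, and conv_stack / erf_radius are hypothetical helpers, not code from the cited paper.

```python
# Sketch: empirically estimating the effective receptive field (ERF) of a
# conv stack (Luo et al., 2016): backprop from the center output unit and
# measure how far the input-gradient mass spreads.
import torch
import torch.nn as nn

def conv_stack(blocks_per_stage):
    # Plain 3x3 conv pairs stand in for CIFAR-style basic res-blocks; the
    # first block of each later stage downsamples with stride 2.
    layers, ch = [nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()], 16
    for stage, num_blocks in enumerate(blocks_per_stage):
        for block in range(num_blocks):
            stride = 2 if stage > 0 and block == 0 else 1
            out_ch = ch * 2 if stride == 2 else ch
            layers += [nn.Conv2d(ch, out_ch, 3, stride=stride, padding=1),
                       nn.ReLU(),
                       nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU()]
            ch = out_ch
    return nn.Sequential(*layers)

def erf_radius(model, size=32):
    x = torch.randn(1, 3, size, size, requires_grad=True)
    out = model(x)
    h, w = out.shape[-2:]
    out[0, :, h // 2, w // 2].sum().backward()    # signal from the center unit
    g = x.grad.abs().sum(dim=1).squeeze()         # input-gradient magnitude
    # Radius = standard deviation of the gradient mass around its centroid.
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size),
                            indexing="ij")
    p = g / g.sum()
    cy, cx = (p * ys).sum(), (p * xs).sum()
    return (p * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum().sqrt().item()

torch.manual_seed(0)
print(erf_radius(conv_stack([3, 3, 3])))  # ResNet20-like student
print(erf_radius(conv_stack([7, 1, 3])))  # the deeper [7,1,3] student
```

Comparing the two printed radii over a few seeds gives a direct, training-free check of whether the [7,1,3] layout indeed sees a wider input context than ResNet20.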
“…Numerous techniques have been developed to advance this field by incorporating different inductive biases (Figure 1 (a)), owing to the task's complexity. Regrettably, however, the object navigation field has not formed a unified inductive-bias paradigm comparable to those in CV (Cao & Wu, 2022; d'Ascoli et al., 2021) or NLP (Levine et al., 2022; Kharitonov & Chaabouni, 2021). Motivated by this gap, and by distilling and generalizing the current mainstream methods, we propose a meta-ability decoupling (MAD) paradigm, hoping to unify and connect the various object navigation methods.…”
Section: Introduction (mentioning)
confidence: 99%