2021 IEEE International Conference on Data Mining (ICDM)
DOI: 10.1109/icdm51629.2021.00069

Attention-based Feature Interaction for Efficient Online Knowledge Distillation

Cited by 8 publications (12 citation statements)
References 32 publications
“…ResNet-32 is a typical lightweight baseline model that is widely adopted by many previous advanced methods, so we further conduct comparison experiments on ResNet-32. The results in Tab. VII show that EKD-FWSNet is only slightly inferior to AFID [65] and PCL [66].…”
Section: Classification on Lightweight Baseline Models (mentioning)
confidence: 92%
“…[20]-[24] all design a student-classmate ensemble training framework to obtain the knowledge of an ensemble teacher, which can guide both the student and the classmate efficiently in an end-to-end manner. AFID [65] directly employs one additional complete sub-net to construct a two-branch ensemble training network. Besides distilling knowledge from the ensemble teacher, it further proposes a feature interaction module that performs mutual learning between the attentive feature maps of the two sub-nets.…”
Section: B. Knowledge Distillation Guided Training Framework (mentioning)
confidence: 99%
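The two-branch design described in the statement above (an extra complete sub-net plus a feature interaction module performing mutual learning between attentive feature maps) can be pictured with a small PyTorch-style sketch. The module name AttentiveInteraction, the 1x1-conv spatial attention, and the symmetric MSE mutual-learning term are assumptions for illustration only, not the exact AFID implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveInteraction(nn.Module):
    """Hypothetical feature-interaction module: each branch's feature map is
    re-weighted by a spatial attention map before a mutual-learning loss."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convs producing a single-channel spatial attention map per branch
        self.att_a = nn.Conv2d(channels, 1, kernel_size=1)
        self.att_b = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat_a, feat_b):
        # attention-refined features of the two sub-nets
        a = feat_a * torch.sigmoid(self.att_a(feat_a))
        b = feat_b * torch.sigmoid(self.att_b(feat_b))
        # symmetric mutual-learning term pulling the attentive maps together
        return F.mse_loss(a, b.detach()) + F.mse_loss(b, a.detach())
```

In a training loop, this interaction loss would typically be added to each branch's cross-entropy term and the distillation term from the ensemble teacher.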
“…The intermediate feature representations from each teacher block are followed by multiple student blocks, respectively, where the teacher and the student are simultaneously trained by minimizing the differences in the feature representations and logits between the teacher and the student.…”
[The remainder of this statement is a flattened survey table listing distillation methods, e.g. FitNets [17], Deep Mutual Learning [22], Attention-based Feature Interaction for Efficient [49], and Peer Collaborative Learning [50], together with the distance measures they use (L2, cosine, and others).]
Section: Single Teacher (mentioning)
confidence: 99%
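The statement above describes the generic pattern of jointly minimizing feature and logit differences between teacher and student. A minimal PyTorch-style sketch of such a combined objective is shown below; the function name distillation_loss, the temperature T, and the weights alpha and beta are illustrative assumptions, not taken from any of the cited papers.

```python
import torch.nn.functional as F

def distillation_loss(s_logits, t_logits, s_feat, t_feat, labels,
                      T=4.0, alpha=1.0, beta=1.0):
    """Illustrative combined objective: task CE + logit KD (KL) + feature L2."""
    ce = F.cross_entropy(s_logits, labels)
    # temperature-scaled KL divergence between student and teacher logits
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # L2 matching of intermediate feature representations
    feat = F.mse_loss(s_feat, t_feat)
    return ce + alpha * kd + beta * feat
```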
[This statement opens with rows of a flattened taxonomy table: online learning [22], [38], [45]-[50]; self-learning [52]-[56], [64]; teacher and student status "to be trained"; static vs. dynamic.]
“…To better boost the knowledge distillation process, Su et al. [49] additionally introduce an attention mechanism to capture important and high-level knowledge, so that teachers and students can be dynamically and effectively trained with the help of the valuable knowledge.…”
Section: Role Status (mentioning)
confidence: 99%
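The attention mechanism mentioned here (capturing important, high-level knowledge before distillation) can be illustrated, under assumption, as a lightweight channel-attention gate that re-weights feature channels. The SE-style design below is a sketch only and is not necessarily the mechanism used in [49].

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Illustrative SE-style gate: emphasises informative channels so that the
    distillation signal focuses on the most important feature responses."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # global average pool -> (N, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)
```

Applying such a gate to a feature map before computing a feature-matching loss concentrates the distillation signal on the channels the gate deems informative.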