Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence 2021
DOI: 10.24963/ijcai.2021/444

KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation

Abstract: Knowledge distillation (KD) has recently emerged as an efficacious scheme for learning compact deep neural networks (DNNs). Despite the promising results achieved, the rationale that interprets the behavior of KD has remained largely understudied. In this paper, we introduce a novel task-oriented attention model, termed KDExplainer, to shed light on the working mechanism underlying the vanilla KD. At the heart of KDExplainer is a Hierarchical Mixture of Experts (HME), in which a multi-class classificati…
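The abstract centers on interpreting the behavior of vanilla KD. For context, below is a minimal sketch of the standard soft-label KD objective (temperature-scaled KL divergence between teacher and student predictions plus cross-entropy on hard labels); the temperature T and mixing weight alpha are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the vanilla KD objective that KDExplainer sets out to interpret.
# T (temperature) and alpha (soft/hard mixing weight) are illustrative, not from the paper.
import torch.nn.functional as F

def vanilla_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft targets: KL divergence between temperature-softened class distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```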

Cited by 3 publications (5 citation statements, all classified as "mentioning"; citing publications from 2022–2023). References: 0 publications indexed.

Citation statements (ordered by relevance):
“…As a representative work of spatial methods, GCN [17] further simplifies graph convolution in the spectral domain by using first-order approximation, which enables graph convolution operations to be carried out in the spatial domain and greatly improves the computational efficiency of graph convolution models. Moreover, to speed up the training of graph neural networks, GNNs […]”

[The quoted passage also spills the citing paper's taxonomy of graph-based knowledge distillation methods:
- DKD methods, output layer: DKWISL [18], KTG [19], DGCN [20], SPG [21], GCLN [22]
- DKD methods, middle layer: IEP [23], HKD [24], MHGD [25], IRG [26], DOD [27], HKDIFM [28], KDExplainer [29], TDD [30], DualDE [31]
- DKD methods, constructed graph: CAG [32], GKD [33], MorsE [34], BAF [35], LAD [36], GD [37], GCMT [38], GraSSNet [39], LSN [40], IntRA-KD [41], RKD [42], CC [43], SPKD [44], KCAN [45]
- GKD methods: …]

Section: Graph Neural Network (mentioning; confidence: 99%)
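The citation statement above summarizes the first-order GCN propagation rule; as a reference point, here is a minimal dense-tensor sketch of that rule, H' = σ(D̂^{-1/2}(A + I)D̂^{-1/2} H W). The ReLU nonlinearity and dense adjacency representation are illustrative assumptions, not details taken from the citing paper.

```python
# Minimal sketch of the first-order GCN layer referenced above (Kipf & Welling style).
# A: dense adjacency matrix [N, N]; H: node features [N, F_in]; W: weights [F_in, F_out].
import torch

def gcn_layer(A, H, W):
    A_hat = A + torch.eye(A.size(0))            # add self-loops
    deg = A_hat.sum(dim=1)                      # node degrees of A_hat
    D_inv_sqrt = torch.diag(deg.pow(-0.5))      # D_hat^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
    return torch.relu(A_norm @ H @ W)           # spatial-domain aggregation + update
```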
“…Additionally, some related work has been proposed. Pan et al [21] argue that the previous video description model has not clearly modeled the interaction between objects and propose a novel spatiotemporal graph network that explicitly uses sp…”

[The quoted passage also spills a summary table of distillation methods (method; distillation loss; setting; application):
- IEP [23]: KL, L1; multi-task learning; transfer learning, image classification
- HKD [24]: InfoCE; knowledge distillation; image classification, knowledge transfer
- CAG [32]: KL; graph inference; visual dialogue
- DKWISL [18]: KL; natural language processing; relation extraction
- KTG [19]: KL; collaborative learning; image recognition
- MHGD [25]: KL; multi-task learning; image recognition
- IRG [26]: Hit; knowledge distillation; image recognition
- DGCN [20]: KL; collaborative filtering; item recommendations
- GKD [33]: Frobenius; model compression; image classification
- SPG [21]: KL; natural language processing; video captioning
- MorsE [34]: L2; meta-knowledge transfer; link prediction, question answering
- GCLN [22]: L2; image semantic segmentation; vision robot self-positioning
- DOD [27]: KL; object detection; object detectors
- BAF [35]: EMD; model compression; video classification
- LAD [36]: BLEU; natural language processing; machine translation
- GD [37]: cosine; multimodal video; motion detection, action classification
- GCMT [38]: CE; unsupervised domain adaptation; person re-identification
- GraSSNet [39]: MSE; knowledge transfer; saliency prediction
- LSN [40]: KL, MSE; model compression; node classification
- IntRA-KD [41]: MSE; model compression; road marking segmentation
- RKD [42]: Euclidean, Huber; knowledge distillation; image classification, few-shot learning
- CC [43]: KL, MSE; knowledge distillation; image classification, person re-identification
- SPKD [44]: Frobenius; knowledge distillation; image classification, transfer learning
- HKDIFM [28]: KL; knowledge distillation; image classification
- KDExplainer [29]: CE, KL; interpretability; image classification
- TDD [30]: CE, KL; interpretability; image classification
- DualDE [31]: JSD; knowledge distillation; node classification, link prediction
- KCAN [45]: BPR; knowledge graph; top-K recommendation, TR prediction]

Section: Output Layer Knowledge (mentioning; confidence: 99%)
“…Self-distillation is a promising and much more efficient training technique that aims at transferring the knowledge hidden in the model itself without an additional pre-trained teacher model, and our HIRE can be easily applied to self-distillation technology. Moreover, in addition to being applied to image classification [55], relation extraction [56], and product recommendation [57], our HIRE can also be extended to GNN-based EEG applications [58,59,60], while it is not applicable to graph-theoretical features of EEG applications [61,62], which will be explored in the future.…”

(mentioning; confidence: 99%)
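The statement above characterizes self-distillation as transferring knowledge hidden in the model itself, without a separate pre-trained teacher. A minimal sketch of one common instantiation follows, using an exponential-moving-average (EMA) copy of the student as the teacher; this particular variant, and the T/alpha/momentum values, are illustrative assumptions rather than the HIRE method itself.

```python
# Minimal sketch of self-distillation via an EMA "teacher" copy of the student.
# This is one common variant, chosen for illustration; hyperparameters are arbitrary.
import torch
import torch.nn.functional as F

def ema_update(teacher, student, momentum=0.99):
    # The EMA copy provides soft targets; it receives no gradient updates.
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def self_distill_step(student, teacher, x, y, T=2.0, alpha=0.5):
    s_logits = student(x)
    with torch.no_grad():
        t_logits = teacher(x)
    # Soften both distributions, distil with KL, and mix with the hard-label loss.
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(s_logits, y)
    return alpha * soft + (1.0 - alpha) * hard

# Usage: teacher = copy.deepcopy(student); after each optimizer step on the
# student, call ema_update(teacher, student) to refresh the teacher.
```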