Mengqi Xue scite author profile

Mengqi Xue

5 Publications

42 Citation Statements Received

35 Citation Statements Given


Affiliations

Zhejiang University, City College

Publications


Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation

Shen, Xue, Wang, et al. 2019


A massive number of well-trained deep networks have been released by developers online. These networks may focus on different tasks and in many cases are optimized for different datasets. In this paper, we study how to exploit such heterogeneous pre-trained networks, known as teachers, so as to train a customized student network that tackles a set of selective tasks defined by the user. We assume no human annotations are available, and each teacher may be either single- or multi-task. To this end, we introduce a dual-step strategy that first extracts the task-specific knowledge from the heterogeneous teachers sharing the same sub-task, and then amalgamates the extracted knowledge to build the student network. To facilitate the training, we employ a selective learning scheme where, for each unlabelled sample, the student learns adaptively from only the teacher with the least prediction ambiguity. We evaluate the proposed approach on several datasets, and experimental results demonstrate that the student, learned by such adaptive knowledge amalgamation, achieves performance even better than that of the teachers.
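The selective learning scheme described in the abstract can be illustrated with a small sketch. The abstract does not say how "prediction ambiguity" is measured, so the sketch below assumes Shannon entropy of each teacher's softmax output as the ambiguity score; the function names (`softmax`, `entropy`, `select_teacher`) and the example logits are hypothetical and are not taken from the paper or its code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; used here as an ASSUMED ambiguity measure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_teacher(teacher_logits):
    """Pick the teacher whose prediction on one sample has the lowest entropy.

    teacher_logits: one list of logits per teacher, all for the same sample.
    Returns (teacher_index, soft_targets) for the student to learn from.
    """
    probs = [softmax(logits) for logits in teacher_logits]
    ambiguities = [entropy(p) for p in probs]
    best = min(range(len(probs)), key=ambiguities.__getitem__)
    return best, probs[best]


# Two hypothetical teachers scoring the same unlabelled sample.
teacher_logits = [
    [2.0, 1.9, 0.1],  # undecided between the first two classes
    [5.0, 0.2, 0.1],  # confident about the first class
]
idx, target = select_teacher(teacher_logits)
```

In this example the second teacher is selected because its output distribution is more peaked, and the student would use `target` as its soft label for that sample only.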


Tree-like Decision Distillation

Song, Zhang, Wang, et al. 2021


KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation

Xue, Song, Wang, et al. 2021


Knowledge distillation (KD) has recently emerged as an efficacious scheme for learning compact deep neural networks (DNNs). Despite the promising results achieved, the rationale that explains the behavior of KD has remained largely understudied. In this paper, we introduce a novel task-oriented attention model, termed KDExplainer, to shed light on the working mechanism underlying vanilla KD. At the heart of KDExplainer is a Hierarchical Mixture of Experts (HME), in which a multi-class classification is reformulated as a multi-task binary one. Through distilling knowledge from a free-form pre-trained DNN to KDExplainer, we observe that KD implicitly modulates the knowledge conflicts between different subtasks, and in reality has much more to offer than label smoothing. Based on such findings, we further introduce a portable tool, dubbed virtual attention module (VAM), that can be seamlessly integrated with various DNNs to enhance their performance under KD. Experimental results demonstrate that with a negligible additional cost, student models equipped with VAM consistently outperform their non-VAM counterparts across different benchmarks. Furthermore, when combined with other KD methods, VAM remains competent in promoting results, even though it is only motivated by vanilla KD. The code is available at https://github.com/zju-vipa/KDExplainer.
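For readers unfamiliar with the "vanilla KD" that the abstract refers to, here is a minimal sketch of the commonly used temperature-scaled distillation loss: a KL term between softened teacher and student distributions plus a cross-entropy term on the true label. This is the generic formulation, not the KDExplainer or VAM code. The function names and the default values of `temperature` and `alpha` are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Generic vanilla-KD loss for a single sample (illustrative sketch).

    alpha weights the distillation term; (1 - alpha) weights the usual
    cross-entropy with the ground-truth label. The T**2 factor keeps the
    gradient scale of the soft term comparable across temperatures.
    Assumes logits are moderate enough that no probability underflows to 0.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over the softened distributions.
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    # Standard cross-entropy at temperature 1 against the hard label.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Hypothetical logits for a 3-class problem with true class 0.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.5]
loss = kd_loss(student, teacher, label=0)
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for an implementation of this loss.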


Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training

Zhang, Duan, Xue, et al. 2022


Meta-attention for ViT-backed Continual Learning

Xue, Zhang, Song, et al. 2022


