2019
DOI: 10.48550/arxiv.1906.00619
Preprint

Deep Face Recognition Model Compression via Knowledge Transfer and Distillation

Abstract: Fully convolutional networks (FCNs) have become the de facto tool for achieving very high performance on many vision and non-vision tasks in general, and face recognition in particular. Such high accuracies are normally obtained by very deep networks or their ensembles. However, deploying such high-performing models to resource-constrained devices or real-time applications is challenging. In this paper, we present a novel model compression approach based on the student-teacher paradigm for face recognition applications…
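
As a hedged illustration of the student-teacher paradigm described in the abstract, the sketch below shows a generic distillation loss in PyTorch: the student is trained against the teacher's temperature-softened logits in addition to the ground-truth identity labels. The temperature, the weighting factor, and the tensor shapes are assumptions for illustration only, not the paper's exact formulation.

```python
# Generic student-teacher distillation loss (illustrative sketch only).
# The temperature and alpha weighting are placeholder hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled
    # student and teacher distributions (scaled by T^2, as is customary).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-target term: ordinary cross-entropy with ground-truth identities.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage with random tensors standing in for a batch of 8 faces and 100 identities.
if __name__ == "__main__":
    student_logits = torch.randn(8, 100)
    teacher_logits = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    print(distillation_loss(student_logits, teacher_logits, labels))
```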

Cited by 3 publications (3 citation statements) | References 20 publications

“…In addition, employing a loop training strategy to train multiple networks simultaneously and weakening the relationship between instruction and learning can further optimize the teacher-student strategy [18]. To address the problem of low resolution, a teacher-student strategy using the same architecture was proposed in [19], which can be applied to train on images of different resolutions. Also, some improvements have been made to traditional distillation methods for the semantic segmentation task, such as using an association adaptation module to enable the student model to obtain and extract more information when learning the teacher's knowledge [20].…”
Section: Introduction (confidence: 99%)
“…For face recognition knowledge distillation, there have been several attempts (Wang, Lan, and Zhang 2017; Luo et al. 2016; Karlekar, Feng, and Pranata 2019; Ge et al. 2018; Feng et al. 2019; Peng et al. 2019; Wang et al. 2019a, 2020a) in the literature to distil large CNNs, so as to make their deployment easier. Hinton et al. (Hinton, Vinyals, and Dean.…”
Section: Introduction (confidence: 99%)
“…Luo et al. (Luo et al. 2016) propose a neuron selection method by leveraging the essential characteristics (domain knowledge) of the learned face representation. Karlekar et al. (Karlekar, Feng, and Pranata 2019) simultaneously exploit one-hot labels and feature vectors for the knowledge transfer between different face resolutions. Ge et al. (Ge et al. 2018) develop a selective knowledge distillation, which selectively distils the most informative facial features by solving a sparse graph optimization problem.…”
Section: Introduction (confidence: 99%)
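
The one-hot-label-plus-feature-vector transfer attributed to Karlekar, Feng, and Pranata (2019) above can be pictured as a two-term loss. The sketch below is a minimal illustration under assumed names, an assumed cosine feature term, and an assumed loss weight; it is not the authors' exact formulation.

```python
# Illustrative two-term loss: one-hot identity supervision plus feature
# transfer from a high-resolution teacher to a low-resolution student.
# Function name, cosine distance term, and feature_weight are assumptions.
import torch
import torch.nn.functional as F

def label_plus_feature_loss(student_logits, student_emb, teacher_emb,
                            labels, feature_weight=1.0):
    # One-hot label term: standard cross-entropy on the student's logits.
    cls_loss = F.cross_entropy(student_logits, labels)
    # Feature term: cosine distance between the student's embedding of the
    # low-resolution face and the teacher's embedding of the high-resolution face.
    feat_loss = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=1).mean()
    return cls_loss + feature_weight * feat_loss

# Usage with random tensors: a batch of 8 faces, 100 identities, 512-d embeddings.
if __name__ == "__main__":
    logits = torch.randn(8, 100)
    s_emb, t_emb = torch.randn(8, 512), torch.randn(8, 512)
    labels = torch.randint(0, 100, (8,))
    print(label_plus_feature_loss(logits, s_emb, t_emb, labels))
```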