2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00756

Search to Distill: Pearls Are Everywhere but Not the Eyes

Cited by 54 publications (33 citation statements) | References 20 publications

Citation statements (ordered by relevance):
“…In addition, a new reward function is suggested, which can effectively improve the quality of the generated networks and reduce the difficulty of manual hyperparameter tuning. Liu et al. [56] present a novel knowledge distillation [57] approach to NAS, called architecture-aware knowledge distillation (AKD), which finds student models (compressed teacher models) that are best suited for distilling the given teacher model. The authors employ an RL-based NAS method with a KD-guided reward function to search for the best student model for a given teacher model.…”
Section: NAS Based on Reinforcement Learning (mentioning)
confidence: 99%
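The AKD excerpt above hinges on a KD-guided reward: each candidate student sampled by the RL controller is briefly trained under the teacher's supervision, and its resulting validation accuracy is fed back as the reward. The sketch below illustrates one plausible way to wire that up in PyTorch; it is an assumption-laden illustration rather than the paper's implementation, and the `kd_guided_reward` helper, the hyperparameters, and the model/loader objects are all hypothetical.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard Hinton-style KD loss: soft-target KL at temperature T
    blended with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def kd_guided_reward(student, teacher, train_loader, val_loader, device="cpu"):
    """Hypothetical KD-guided reward for an RL-based NAS controller:
    briefly train the sampled student against the teacher, then return
    its validation accuracy as the reward signal."""
    optimizer = torch.optim.SGD(student.parameters(), lr=0.05, momentum=0.9)
    teacher.eval()
    student.train()
    for x, y in train_loader:                       # short proxy training
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        loss = kd_loss(student(x), t_logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    student.eval()                                  # reward = distilled val accuracy
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            correct += (student(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```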
“…We first explored the standard distillation approach, in which we take the best-performing model as the teacher. However, it is known that a wider architectural gap can mean a less effective transfer [19,33]. Thus, we also explore a sequential distillation approach.…”
Section: Born-Again Distillation (mentioning)
confidence: 99%
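The sequential approach mentioned above passes knowledge through a chain of models rather than distilling the best model directly into the final student, so each hop bridges a smaller architectural gap. The sketch below is a minimal illustration of that idea under assumed names: `train_with_kd(student, teacher, loader)` is a hypothetical helper (for example, training with the KD loss sketched earlier), and `models` is an ordered list running from the strongest teacher down to the target student.

```python
def sequential_distillation(models, train_loader, train_with_kd):
    """Minimal sketch of sequential (born-again style) distillation:
    teacher -> intermediate -> ... -> student, one KD hop per link."""
    teacher = models[0]                 # best-performing model starts the chain
    for student in models[1:]:
        train_with_kd(student, teacher, train_loader)
        teacher = student               # the freshly distilled model teaches next
    return teacher                      # the final student in the chain
```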
“…In contrast, our task necessitates a deployment scenario with two models: one for processing the query images and another for processing the gallery. Recently, [18,15] propose to use a large teacher model to guide the architecture search process for a smaller student, which is essentially knowledge distillation in the architecture space. However, our experiments show that knowledge distillation cannot guarantee compatibility, and thus these methods may not succeed in optimizing the architecture in that respect.…”
Section: Related Work (mentioning)
confidence: 99%
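The compatibility concern in the excerpt above can be made concrete with a cross-model retrieval check: embed the queries with one model and the gallery with another, then measure nearest-neighbour accuracy. A distilled student can match its teacher's standalone accuracy yet still score poorly here if the two embedding spaces are not aligned. The sketch below is a hypothetical illustration of such a check (cosine similarity over L2-normalised embeddings), not the cited work's evaluation protocol; all model and loader objects are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_model_retrieval_accuracy(query_model, gallery_model,
                                   query_loader, gallery_loader):
    """Hypothetical compatibility check: queries embedded by one model are
    matched against a gallery embedded by another; low accuracy signals
    misaligned (incompatible) embedding spaces."""
    def embed(model, loader):
        model.eval()
        feats, labels = [], []
        with torch.no_grad():
            for x, y in loader:
                feats.append(F.normalize(model(x), dim=1))
                labels.append(y)
        return torch.cat(feats), torch.cat(labels)

    q_feat, q_lab = embed(query_model, query_loader)
    g_feat, g_lab = embed(gallery_model, gallery_loader)
    nn_idx = (q_feat @ g_feat.t()).argmax(dim=1)    # cosine nearest neighbour
    return (g_lab[nn_idx] == q_lab).float().mean().item()
```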