Chenglin Yang scite author profile

Optimizing a deep neural network is a fundamental task in computer vision, yet direct training methods often suffer from over-fitting. Teacher-student optimization aims at providing complementary cues from a previously trained model, but these approaches are often considerably slow because several generations must be trained in sequence, i.e., the time complexity increases several-fold. This paper presents snapshot distillation (SD), the first framework that enables teacher-student optimization in one generation. The idea of SD is very simple: instead of borrowing supervision signals from previous generations, we extract such information from earlier epochs in the same generation, while making sure that the difference between teacher and student is sufficiently large to prevent under-fitting. To achieve this goal, we implement SD with a cyclic learning rate policy, in which the last snapshot of each cycle is used as the teacher for all iterations in the next cycle, and the teacher signal is smoothed to provide richer information. On standard image classification benchmarks such as CIFAR100 and ILSVRC2012, SD achieves consistent accuracy gains without heavy computational overhead. We also verify that models pre-trained with SD transfer well to object detection and semantic segmentation on the PascalVOC dataset.
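The mechanics described in this abstract (a learning rate that restarts every cycle, the previous cycle's last snapshot acting as teacher, and a smoothed teacher signal) can be illustrated with a minimal sketch. The cosine schedule, the temperature-based smoothing, the loss weighting, and every function name below are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def cyclic_cosine_lr(iteration, cycle_len, lr_max):
    """Cosine-annealed learning rate that restarts at the start of every cycle (assumed schedule)."""
    pos = iteration % cycle_len
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * pos / cycle_len))

def teacher_snapshot_index(iteration, cycle_len):
    """Index of the cycle whose last snapshot teaches this iteration; None during the first cycle."""
    cycle = iteration // cycle_len
    return cycle - 1 if cycle > 0 else None

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a temperature above 1 flattens (smooths) the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, weight=0.5):
    """Cross-entropy on the true label plus KL(teacher || student) on softened outputs.

    This is a generic distillation loss, assumed here for illustration only.
    """
    ce = -math.log(softmax(student_logits)[label])
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)
    return (1.0 - weight) * ce + weight * kl
```

In a training loop, one would save a copy of the model at the end of each cycle and pass its logits as `teacher_logits` for every iteration of the following cycle; during the first cycle, where `teacher_snapshot_index` returns `None`, only the cross-entropy term would apply.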

Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students

Yang

Xie

Qiao

et al. 2019

AAAI

137

We focus on the problem of training a deep neural network in generations. The procedure is that, in order to optimize the target network (student), another network (teacher) with the same architecture is first trained and then used to provide part of the supervision signal in the next stage. While this strategy leads to higher accuracy, many aspects (e.g., why teacher-student optimization helps) still need further exploration. This paper studies the problem from the perspective of controlling the strictness with which the teacher network is trained. Existing approaches mostly used a hard distribution (e.g., one-hot vectors) in training, leading to a strict teacher which itself has a high accuracy, but we argue that the teacher needs to be more tolerant, although this often implies a lower accuracy. The implementation is very easy, with merely an extra loss term added to the teacher network, encouraging a few secondary classes to emerge and complement the primary class. Consequently, the teacher provides a milder supervision signal (a less peaked distribution), which makes it possible for the student to learn from inter-class similarity and potentially lowers the risk of overfitting. Experiments are performed on standard image classification tasks (CIFAR100 and ILSVRC2012). Although the teacher network is less powerful, the students show persistent improvement across generations and eventually achieve higher classification accuracy than competing methods. Model ensemble and transfer feature extraction also verify the effectiveness of our approach.
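To make the idea of an "extra loss term" that produces a less peaked teacher concrete, here is a minimal sketch. The abstract does not give the form of the term, so the penalty below (the gap between the top confidence and the mean of the next few confidences) is an assumed, illustrative choice, and the names `confidence_gap`, `tolerant_teacher_loss`, `top_k`, and `eta` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_gap(probs, top_k=5):
    """Top-1 probability minus the mean of the next (top_k - 1) probabilities.

    A large gap means a sharply peaked distribution (a "strict" teacher).
    """
    if not 2 <= top_k <= len(probs):
        raise ValueError("top_k must be between 2 and the number of classes")
    ranked = sorted(probs, reverse=True)
    return ranked[0] - sum(ranked[1:top_k]) / (top_k - 1)

def tolerant_teacher_loss(logits, label, top_k=5, eta=0.6):
    """Cross-entropy plus an assumed penalty on the confidence gap.

    Minimizing the penalty lets a few secondary classes keep some probability
    mass; eta (hypothetical) trades accuracy against tolerance.
    """
    probs = softmax(logits)
    ce = -math.log(probs[label])
    return ce + eta * confidence_gap(probs, top_k)
```

With `eta=0` the loss reduces to plain cross-entropy, which corresponds to the strict teacher the abstract argues against; larger values of `eta` push the teacher toward a flatter output distribution.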

PatchAttack: A Black-Box Texture-Based Attack with Reinforcement Learning

Yang

Kortylewski

Xie

et al. 2020

Lite Vision Transformer with Enhanced Self-Attention

Yang

Wang

Zhang

et al. 2022

Endodontic guided treatment using augmented reality on a head‐mounted display system

Song

Yang

Dianat

et al. 2018

Healthcare Technology Letters

Endodontic treatment is performed to treat an inflamed or infected root canal system of the involved tooth. It is estimated that 22.3 million endodontic procedures are performed annually in the USA. Preparing a proper access cavity before cleaning/shaping (instrumentation) of the root canal system is among the most important steps to achieve a successful treatment outcome. However, accidents such as perforation, gouging, ledge formation, and canal transportation may occur during the procedure because of an improper or incomplete access cavity design. To reduce or prevent these errors in root canal treatments, this Letter introduces an assistive augmented reality (AR) technology on a head-mounted display (HMD). The proposed system provides audiovisual warning and correction in situ on the optical see-through HMD to assist dentists in preparing the access cavity. The clinician interacts with the system via voice commands, which allows bi-manual operation. In addition, the dentist is able to review tooth radiographs during the procedure without needing to divert attention away from the patient to look at a separate monitor. Experiments are performed to evaluate the accuracy of the measurements. To the best of the authors' knowledge, this is the first time that an HMD-based AR prototype has been introduced for an endodontic procedure.

Meticulous Object Segmentation

Yang

Wang

Zhang

et al. 2020

Preprint

PatchAttack: A Black-box Texture-based Attack with Reinforcement Learning

Yang

Kortylewski

Xie

et al. 2020

Preprint

An Adaptive Prediction Model for the Remaining Life of an Li-Ion Battery Based on the Fusion of the Two-Phase Wiener Process and an Extreme Learning Machine

et al. 2021

Lithium-ion batteries (LiBs) are the most important part of electric vehicle (EV) systems. Because there are two different degradation rates during LiB degradation, there are many two-phase models for LiBs. However, most of these methods do not consider the randomness of the changing point in the two-phase model and cannot update the change time in real time. Therefore, this paper proposes a method based on the combination of the two-phase Wiener model and an extreme learning machine (ELM). The two-phase Wiener model is used to derive the mathematical expression of the remaining useful life (RUL), and the ELM is implemented to adaptively detect the changing point. Based on the Poisson distribution, the distribution of the changing time is derived as a gamma distribution. To evaluate the theoretical results and practicality of the proposed method, we perform both numerical and practical simulations. The results of the simulations show that due to the precise and adaptive detection of changing points, the proposed method produces a more accurate RUL prediction than existing methods. The error of our method for detecting the changing point is about 4% and the mean prediction error of RUL in the second phase is improved from 4.39 cycles to 1.61 cycles.
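As a rough illustration of the two-phase Wiener degradation model this abstract refers to, the sketch below simulates a path whose drift and diffusion switch at a change point and computes the mean first-passage time for a single phase. It assumes a known, fixed change point and uses the standard result that the first-passage time of a Wiener process with positive drift is inverse Gaussian with mean (threshold minus current level) divided by drift. It does not reproduce the paper's random change point or its ELM-based detection, and all function and parameter names are hypothetical.

```python
import math
import random

def simulate_two_phase_path(mu1, sigma1, mu2, sigma2, change_point, n_steps, dt=1.0, seed=0):
    """Euler simulation of a degradation path with a drift/diffusion switch at change_point.

    Phase 1 uses (mu1, sigma1) while t < change_point; phase 2 uses (mu2, sigma2).
    """
    rng = random.Random(seed)
    level = 0.0
    path = [level]
    for step in range(n_steps):
        t = step * dt
        mu, sigma = (mu1, sigma1) if t < change_point else (mu2, sigma2)
        level += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def mean_rul_single_phase(level, threshold, drift):
    """Mean remaining useful life within one phase: the inverse Gaussian mean.

    Valid only for positive drift and when no further phase change occurs
    before the failure threshold is reached (an assumption of this sketch).
    """
    if drift <= 0:
        raise ValueError("drift must be positive for a finite mean first-passage time")
    return max(threshold - level, 0.0) / drift
```

For example, once the process is known to be in its second phase, `mean_rul_single_phase(path[-1], threshold, mu2)` gives a point estimate of the remaining cycles; the method in the abstract additionally updates the change point adaptively, which this sketch leaves out.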

Chenglin Yang

Snapshot Distillation: Teacher-Student Optimization in One Generation
