2019 IEEE Conference on Control Technology and Applications (CCTA) 2019
DOI: 10.1109/ccta.2019.8920662
Feedback Control for Online Training of Neural Networks

Abstract: Convolutional neural networks (CNNs) are commonly used for image classification tasks, raising the challenge of applying them to data flows. During training, adaptation is often performed by tuning the learning rate. Usual learning rate strategies are time-based, i.e. monotonically decreasing. In this paper, we advocate switching to a performance-based adaptation in order to improve learning efficiency. We present E (Exponential)/PD (Proportional Derivative)-Control, a conditional learning rate st…

Cited by 3 publications (9 citation statements) | References 11 publications
“…As Co-Teaching trains two models, we use two 56-layer ResNets. To speed up model convergence for RAD Slim, RAD Slim Limited, and Forward, we implement the E (Exponential)/PD (Proportional-Derivative)-Control [55] and Event-Based Control learning rate [56] as learning rate schedules based on the stochastic gradient descent (SGD) optimizer. Co-Teaching has its own learning rate scheduler.…”
Section: B. Experimental Setup
confidence: 99%
“…For the sake of simplicity, the loss values are normalized with respect to the initial epoch loss value L(0). K_P and K_D are the proportional and derivative gains detailed in [11].…”
Section: A. Event-Based Learning Rate
confidence: 99%
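The statement above describes a PD law on the loss normalized by L(0), combined in E/PD with an initial exponential growth phase. The following sketch illustrates one plausible form of that schedule; the gains, the doubling factor, and the switching condition (first loss increase) are illustrative assumptions, not the exact law from [11].

```python
def epd_learning_rate(losses, lr0=0.01, k_p=0.5, k_d=2.0):
    """Sketch of an E/PD-style learning rate schedule.

    Phase E: while the per-epoch loss keeps decreasing, grow the learning
    rate exponentially (doubling here, an illustrative choice).
    Phase PD: after the first loss increase, switch to a PD law on the
    loss normalized by the initial epoch loss L(0).
    Returns one learning rate per epoch.
    """
    L0 = losses[0]
    lrs = [lr0]
    pd_phase = False
    for k in range(1, len(losses)):
        if not pd_phase and losses[k] > losses[k - 1]:
            pd_phase = True  # loss went up: leave the exponential phase
        if not pd_phase:
            lrs.append(lrs[-1] * 2.0)  # exponential growth phase
        else:
            p = k_p * losses[k] / L0                      # proportional term
            d = k_d * (losses[k - 1] - losses[k]) / L0    # derivative term
            lrs.append(max(p + d, 0.0))                   # keep rate non-negative
    return lrs


# Example: the rate doubles while the loss falls, then the PD law takes over.
rates = epd_learning_rate([1.0, 0.8, 0.9, 0.7])
```

In this form the derivative term rewards epochs where the loss drops sharply (larger step) and penalizes epochs where it rises, which is what makes the schedule performance-based rather than time-based.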
“…Note that the stability of the CNN training is ensured by E/PD, whose stability analysis is provided in [11]. The proposed event-based control does not introduce any instability because if e_1 = 0, meaning the loss is decreasing and the model is converging, and if e_1 = 1, the learning rate strategy returns to E/PD.…”
Section: A. Event-Based Learning Rate
confidence: 99%
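The event-based idea quoted above can be sketched as a trigger around the PD law: while the loss decreases (e_1 = 0) the current rate is simply kept, and only when the loss stops decreasing (e_1 = 1) is the rate recomputed. The trigger condition and gains below are illustrative assumptions consistent with the quoted statements, not the exact scheme from [56].

```python
def event_based_lr(losses, lr0=0.01, k_p=0.5, k_d=2.0):
    """Sketch of an event-triggered learning rate schedule.

    Event signal e1 per epoch:
      e1 = 0 -> loss decreased: no event, keep the current rate.
      e1 = 1 -> loss did not decrease: recompute the rate with a PD law
                on the loss normalized by the initial epoch loss L(0).
    Returns one learning rate per epoch.
    """
    L0 = losses[0]
    lrs = [lr0]
    for k in range(1, len(losses)):
        e1 = 1 if losses[k] >= losses[k - 1] else 0
        if e1 == 0:
            lrs.append(lrs[-1])  # model converging: leave the rate untouched
        else:
            p = k_p * losses[k] / L0                      # proportional term
            d = k_d * (losses[k - 1] - losses[k]) / L0    # derivative term
            lrs.append(max(p + d, 0.0))
    return lrs


# Example: the rate is held while the loss falls, updated only on an event.
rates = event_based_lr([1.0, 0.9, 0.95, 0.9])
```

Skipping updates while e_1 = 0 is what makes the scheme "event-based": control effort is spent only when performance degrades, and because the event branch falls back to the PD law, the stability argument for E/PD carries over, as the quoted statement notes.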