2018
DOI: 10.1109/tpami.2017.2773081

Learning without Forgetting

Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the …
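The abstract describes training on new-task data only while preserving the network's existing behavior. A minimal sketch of that kind of objective is shown below, assuming a shared trunk `model` with separate `old_head`/`new_head` classifiers and old-task logits recorded before training; the names, temperature, and weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(old_logits_now, old_logits_recorded, T=2.0):
    # Keep the updated network's old-task outputs close to the outputs
    # recorded before new-task training (soft targets at temperature T).
    soft_targets = F.softmax(old_logits_recorded / T, dim=1)
    log_probs = F.log_softmax(old_logits_now / T, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

def lwf_style_step(model, old_head, new_head, x, y_new,
                   old_logits_recorded, optimizer, lambda_old=1.0):
    # One optimization step that uses only new-task images and labels
    # (x, y_new); no stored data from the old tasks is required.
    optimizer.zero_grad()
    features = model(x)                                     # shared CNN trunk
    loss_new = F.cross_entropy(new_head(features), y_new)   # new-task loss
    loss_old = distillation_loss(old_head(features), old_logits_recorded)
    loss = loss_new + lambda_old * loss_old
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the old-task logits would be computed once on the new-task images with the frozen original network before training begins, and then reused as soft targets at every step.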

Cited by 2,070 publications (988 citation statements). References 36 publications.
“…In addition to aforementioned strategies, a great number of methods, e.g., [13,17], have been proposed to transfer the knowledge from the weights of pre-trained source networks to the target task, for better accuracy. However, incorporating weights from inappropriate networks using inappropriate transfer learning strategies sometimes may hurt the training procedure and may lead to even lower accuracy.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…AR1, firstly proposed in [48], consists of the combination of an Architectural and Regularization approach. In particular, CWR+ is extended by allowing Θ to be tuned across batches subject to a regularization constraint (as per LWF [38], EWC [29] or SI [83]). The authors performed several combination experiments on CORe50 to select a regularization approach; each approach required a new hyperparameter tuning w.r.t.…”
Section: Architect and Regularize (AR1)
Citation type: mentioning
confidence: 99%
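The excerpt above describes tuning the shared weights Θ across batches under a regularization constraint in the style of LWF, EWC, or SI. As a rough illustration only (not the exact AR1 formulation), such a constraint is commonly implemented as a quadratic penalty on drift from previously learned parameter values, weighted by a per-parameter importance estimate; `anchor_params` and `importance` below are assumed placeholders for snapshots taken after the previous batch or task.

```python
import torch

def quadratic_drift_penalty(model, anchor_params, importance, strength=1.0):
    # anchor_params: parameter values snapshotted after the previous batch/task
    # importance:    per-parameter importance estimates (e.g., Fisher-style)
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in anchor_params:
            penalty = penalty + (importance[name] * (p - anchor_params[name]) ** 2).sum()
    return strength * penalty
```

On each new batch, the total objective would then be the task loss plus this penalty, e.g. `loss = task_loss + quadratic_drift_penalty(model, anchors, importance)`.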
“…Empirical evidence shows that connectionist architectures are prone to catastrophic forgetting, i.e., when learning a new class or task, the overall performance on previously learned classes and tasks may abruptly decrease due to the novel input interfering with or completely overwriting existing representations [13,53]. Because catastrophic forgetting is a phenomenon that also affects deep learning models, the interest in CL models has grown. [Figure 1: Venn diagram of some of the most popular CL strategies: CWR [41], PNN [72], EWC [29], SI [83], LWF [38], ICARL [68], GEM [45], FearNet [27], GDM [63], ExStream [20], Pure Rehearsal, GR [77], MeRGAN [80] and AR1 [47].] Rehearsal and Generative Replay upper categories can be seen as a subset of replay strategies.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…However, in many application scenarios, new data becomes available over time or the distribution underlying the problem changes. When this happens, models are usually retrained from scratch or have to be refined via either fine-tuning [21,45] or incremental learning [40,51]. In any case, a human expert has to assign labels to identify objects and corresponding classes for every unlabeled example.…”
Section: Introduction
Citation type: mentioning
confidence: 99%