2018
DOI: 10.1109/tpami.2017.2773081

Learning without Forgetting

Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the …
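The abstract describes training on new-task data only while preserving the network's existing behavior. A minimal sketch of that kind of objective is shown below, assuming a shared trunk `model` with separate `old_head`/`new_head` classifiers and old-task logits recorded before training; the names, temperature, and weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(old_logits_now, old_logits_recorded, T=2.0):
    # Keep the updated network's old-task outputs close to the outputs
    # recorded before new-task training (soft targets at temperature T).
    soft_targets = F.softmax(old_logits_recorded / T, dim=1)
    log_probs = F.log_softmax(old_logits_now / T, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

def lwf_style_step(model, old_head, new_head, x, y_new,
                   old_logits_recorded, optimizer, lambda_old=1.0):
    # One optimization step that uses only new-task images and labels
    # (x, y_new); no stored data from the old tasks is required.
    optimizer.zero_grad()
    features = model(x)                                     # shared CNN trunk
    loss_new = F.cross_entropy(new_head(features), y_new)   # new-task loss
    loss_old = distillation_loss(old_head(features), old_logits_recorded)
    loss = loss_new + lambda_old * loss_old
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the old-task logits would be computed once on the new-task images with the frozen original network before training begins, and then reused as soft targets at every step.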

Cited by 2,070 publications (988 citation statements). References 36 publications.
“…In addition to aforementioned strategies, a great number of methods, e.g., [13,17], have been proposed to transfer the knowledge from the weights of pre-trained source networks to the target task, for better accuracy. However, incorporating weights from inappropriate networks using inappropriate transfer learning strategies sometimes may hurt the training procedure and may lead to even lower accuracy.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…AR1, firstly proposed in [48], consists of the combination of an Architectural and Regularization approach. In particular, CWR+ is extended by allowing Θ to be tuned across batches subject to a regularization constraint (as per LWF [38], EWC [29] or SI [83]). The authors performed several combination experiments on CORe50 to select a regularization approach; each approach required a new hyperparameter tuning w.r.t.…”
Section: Architect and Regularize (AR1)
Citation type: mentioning
confidence: 99%
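The excerpt above describes tuning the shared weights Θ across batches under a regularization constraint in the style of LWF, EWC, or SI. As a rough illustration only (not the exact AR1 formulation), such a constraint is commonly implemented as a quadratic penalty on drift from previously learned parameter values, weighted by a per-parameter importance estimate; `anchor_params` and `importance` below are assumed placeholders for snapshots taken after the previous batch or task.

```python
import torch

def quadratic_drift_penalty(model, anchor_params, importance, strength=1.0):
    # anchor_params: parameter values snapshotted after the previous batch/task
    # importance:    per-parameter importance estimates (e.g., Fisher-style)
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in anchor_params:
            penalty = penalty + (importance[name] * (p - anchor_params[name]) ** 2).sum()
    return strength * penalty
```

On each new batch, the total objective would then be the task loss plus this penalty, e.g. `loss = task_loss + quadratic_drift_penalty(model, anchors, importance)`.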
“…Empirical evidence shows that connectionist architectures are prone to catastrophic forgetting, i.e., when learning a new class or task, the overall performance on previously learned classes and tasks may abruptly decrease due to the novel input interfering with or completely overwriting existing representations [13,53]. Because catastrophic forgetting is a phenomenon that also affects deep learning models, the interest in CL models has grown. [Figure 1: Venn diagram of some of the most popular CL strategies: CWR [41], PNN [72], EWC [29], SI [83], LWF [38], ICARL [68], GEM [45], FearNet [27], GDM [63], ExStream [20], Pure Rehearsal, GR [77], MeRGAN [80] and AR1 [47].] Rehearsal and Generative Replay upper categories can be seen as a subset of replay strategies.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…However, in many application scenarios, new data becomes available over time or the distribution underlying the problem changes. When this happens, models are usually retrained from scratch or have to be refined via either fine-tuning [21,45] or incremental learning [40,51]. In any case, a human expert has to assign labels to identify objects and corresponding classes for every unlabeled example.…”
Section: Introduction
Citation type: mentioning
confidence: 99%