2021
DOI: 10.48550/arxiv.2104.05025
Preprint
New Insights on Reducing Abrupt Representation Change in Online Continual Learning

Abstract: We study the online continual learning paradigm, where agents must learn from a changing distribution under constrained memory and compute. Previous work often tackles catastrophic forgetting by overcoming changes in the space of model parameters. In this work, we instead focus on the change in the representations of previously observed data caused by the introduction of previously unobserved class samples in the incoming data stream. We highlight the issues that arise in the practical setting where new classes must be …

Cited by 9 publications (26 citation statements)
References 21 publications
“…On the one hand, the negative bias towards past classes can be ascribed to the optimization of the cross-entropy loss on examples from the current task. As pointed out in [28], when a new task is presented to the network, an asymmetry arises between the contributions of replay data and current examples to the weight updates: indeed, the gradients of new (and poorly fit) examples outweigh those of the replayed ones (Fig. 2b).…”
Section: (L2) DER(++) Overemphasizes the Classes of the Current Task
confidence: 83%
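A minimal sketch (not taken from the cited works; the toy setup is illustrative) of why current-task examples dominate the updates: under softmax cross-entropy the per-logit gradient is softmax(z) − one_hot(y), so a poorly fit new-class example yields a far larger gradient than a well-fit replayed one.

```python
import torch
import torch.nn.functional as F

# Toy setup: 4 classes; classes {0, 1} are old (replayed), {2, 3} are new.
logits_replay = torch.tensor([[4.0, 0.0, 0.0, 0.0]], requires_grad=True)  # well fit
logits_new = torch.tensor([[0.0, 0.0, 0.5, 0.0]], requires_grad=True)     # poorly fit

F.cross_entropy(logits_replay, torch.tensor([0])).backward()
F.cross_entropy(logits_new, torch.tensor([2])).backward()

# Gradient w.r.t. the logits is softmax(z) - one_hot(y):
print(logits_replay.grad.norm())  # small, roughly 0.06
print(logits_new.grad.norm())     # large, roughly 0.75
```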
“…As also observed in other recent works [28], [29], [42], this issue can be mitigated by revising the way the cross-entropy loss is applied during training. Given an example from the current task, we avoid computing the softmax activation on all logits and instead restrict it to those modeling the scores of the current task classes.…”
Section: Preventing Penalization of Past Classes
confidence: 87%
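A minimal sketch of the masking described above, assuming a PyTorch classifier; the function name and toy class split are illustrative, not from the cited papers. Logits of classes outside the current task are set to −inf before the softmax, so current-task examples produce no gradient that pushes down the scores of past classes.

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits, targets, current_classes):
    """Cross-entropy restricted to the logits of the current task's classes."""
    mask = torch.full_like(logits, float("-inf"))
    mask[:, current_classes] = 0.0  # keep current-task logits, mask the rest
    return F.cross_entropy(logits + mask, targets)

# Example: 10 classes in total, classes 8 and 9 belong to the current task.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([8, 9, 8, 9])
masked_cross_entropy(logits, targets, current_classes=[8, 9]).backward()

# Masked classes get zero softmax probability, hence zero gradient:
print(logits.grad[:, :8].abs().sum())  # tensor(0.)
```

Replayed examples from past tasks would still be trained with the ordinary, unmasked cross-entropy, so the asymmetry only removes the penalization of past classes by current-task data.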