Baselines: In this work, we perform experiments on the benchmarks above, comparing against the following fixed-capacity methods and one expansion-based method: (1) SGD, which finetunes the model with plain stochastic gradient descent; (2) EWC (Kirkpatrick et al., 2017), one of the pioneering regularization methods, which uses the diagonal of the Fisher information as importance weights (sketched below); (3) A-GEM (Chaudhry et al., 2019a), which uses loss gradients of stored previous data as an inequality constraint on the optimization (sketched below); (4) LOS (Chaudhry et al., 2020), which constrains gradients to a low-rank orthogonal subspace; (5) ER-ring (Chaudhry et al., 2019b), which uses a tiny ring memory to alleviate forgetting; (6) GPM (Saha et al., 2021), which trains new tasks in the residual gradient subspace; (7) APD (Yoon et al., 2019), a strong expansion-based method that decomposes the parameters of different tasks over a shared basis; and (8) STL, which trains a separate model for each task. For the compared methods, we follow the original implementations and perform any method-specific processing required at the end of each task.
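For concreteness, the following is a minimal sketch of the EWC regularizer, assuming a PyTorch-style model; the helper name `ewc_penalty` and the dictionary arguments are illustrative, not taken from the compared implementations. The penalty anchors each parameter to its value after the previous task, weighted by the Fisher diagonal.

```python
import torch

def ewc_penalty(model, fisher_diag, old_params, lam):
    """EWC quadratic penalty (Kirkpatrick et al., 2017).

    fisher_diag: dict mapping parameter names to diagonal Fisher estimates,
                 computed after the previous task (illustrative name).
    old_params:  dict mapping parameter names to the parameter values saved
                 at the end of the previous task.
    lam:         penalty strength (the regularization coefficient).
    """
    loss = 0.0
    for name, p in model.named_parameters():
        if name in fisher_diag:
            # Penalize movement away from the old solution, per-parameter
            # weighted by how important that parameter was (Fisher diagonal).
            loss = loss + (fisher_diag[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# Usage: total_loss = task_loss + ewc_penalty(model, fisher, old, lam=100.0)
```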
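Similarly, a minimal sketch of the A-GEM update, again assuming a PyTorch-style model and SGD optimizer; `agem_step`, `flat_grad`, and the batch arguments are illustrative names. If the current-task gradient conflicts with the gradient on a sampled memory batch (negative dot product), the conflicting component is projected out so the memory loss is not increased.

```python
import torch

def flat_grad(model, loss):
    """Backprop `loss` and return all parameter gradients as one flat vector."""
    model.zero_grad()
    loss.backward()
    return torch.cat([p.grad.detach().flatten()
                      for p in model.parameters() if p.grad is not None])

def agem_step(model, optimizer, loss_fn, batch, mem_batch):
    """One A-GEM update (Chaudhry et al., 2019a); a sketch, not the
    reference implementation. `batch`/`mem_batch` are (inputs, targets)."""
    x, y = batch
    g = flat_grad(model, loss_fn(model(x), y))        # current-task gradient
    xm, ym = mem_batch
    g_ref = flat_grad(model, loss_fn(model(xm), ym))  # memory gradient
    dot = torch.dot(g, g_ref)
    if dot < 0:  # inequality constraint g . g_ref >= 0 violated
        g = g - (dot / torch.dot(g_ref, g_ref)) * g_ref  # project out conflict
    # Write the (possibly projected) gradient back, then take the step.
    offset = 0
    for p in model.parameters():
        if p.grad is None:
            continue
        n = p.numel()
        p.grad.copy_(g[offset:offset + n].view_as(p))
        offset += n
    optimizer.step()
```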