2023
DOI: 10.1609/aaai.v37i8.26157

Continual Learning with Scaled Gradient Projection

Abstract: In neural networks, continual learning results in gradient interference among sequential tasks, leading to catastrophic forgetting of old tasks while learning new ones. Recent methods address this issue by storing the important gradient spaces for old tasks and updating the model orthogonally to them during new tasks. However, such restrictive orthogonal gradient updates hamper the learning capability on new tasks, resulting in sub-optimal performance. To improve new learning while minimizing forgetting, in…
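As context for the abstract, below is a minimal sketch of the orthogonal gradient projection idea it refers to: the new-task gradient is projected onto the orthogonal complement of a stored basis spanning the important gradient subspace of old tasks. This is an illustrative assumption, not the paper's implementation; the function name, dimensions, and random stand-in basis are hypothetical, and the paper's scaled variant would attenuate rather than fully remove the in-subspace component.

import torch

def project_orthogonal(grad, basis):
    # grad: flattened gradient of shape (d,); basis: (d, k) with orthonormal
    # columns spanning the important gradient subspace of old tasks.
    in_span = basis @ (basis.T @ grad)   # component that would interfere with old tasks
    return grad - in_span                # restrictive orthogonal update

# Hypothetical usage with a single flattened parameter gradient:
d, k = 1000, 20
basis, _ = torch.linalg.qr(torch.randn(d, k))   # stand-in orthonormal basis
grad = torch.randn(d)
update = project_orthogonal(grad, basis)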

Cited by 4 publications (1 citation statement)
References 17 publications
“…For Ada-QPacknet, the learning rate is set with a dynamic scheduler starting from 0.01 and decreasing over time to a minimum of 0.0001 for all datasets except TinyImagenet (starting from 0.001) and Imagenet100 (constant learning rate of 0.0001). The adopted models are a two-layer neural network with fully connected layers (p-MNIST), a reduced AlexNet (s-CIFAR100) [40], a ResNet-18 (5 datasets and Imagenet), and a TinyNet (TinyImagenet), in accordance with the model backbones used in the WSN paper [22]. For weight initialization, we adopt the Xavier initializer.…”
Section: Methods (citation type: mentioning)
confidence: 99%
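The training setup quoted above can be reproduced approximately as follows; the exact scheduler used by Ada-QPacknet is not specified in the excerpt, so a cosine schedule decaying from 0.01 to the stated 0.0001 floor is assumed purely for illustration, and the two-layer fully connected model is a stand-in for the p-MNIST backbone.

import torch
import torch.nn as nn

# Stand-in two-layer fully connected network (p-MNIST style backbone).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def xavier_init(m):
    # Xavier (Glorot) initialization, as stated in the excerpt.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(xavier_init)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)      # starts at 0.01
num_epochs = 100                                              # assumed epoch budget
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs, eta_min=1e-4)                # decays to the 0.0001 minimum

for epoch in range(num_epochs):
    # ... per-task training step would go here ...
    scheduler.step()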