2023
DOI: 10.1609/aaai.v37i8.26157

Continual Learning with Scaled Gradient Projection

Abstract: In neural networks, continual learning results in gradient interference among sequential tasks, leading to catastrophic forgetting of old tasks while learning new ones. Recent methods address this issue by storing the important gradient spaces for old tasks and updating the model orthogonally to them during new tasks. However, such restrictive orthogonal gradient updates hamper the learning capability on new tasks, resulting in sub-optimal performance. To improve new learning while minimizing forgetting, in…
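As context for the abstract, below is a minimal sketch of the orthogonal gradient projection idea it refers to: the new-task gradient is projected onto the orthogonal complement of a stored basis spanning the important gradient subspace of old tasks. This is an illustrative assumption, not the paper's implementation; the function name, dimensions, and random stand-in basis are hypothetical, and the paper's scaled variant would attenuate rather than fully remove the in-subspace component.

import torch

def project_orthogonal(grad, basis):
    # grad: flattened gradient of shape (d,); basis: (d, k) with orthonormal
    # columns spanning the important gradient subspace of old tasks.
    in_span = basis @ (basis.T @ grad)   # component that would interfere with old tasks
    return grad - in_span                # restrictive orthogonal update

# Hypothetical usage with a single flattened parameter gradient:
d, k = 1000, 20
basis, _ = torch.linalg.qr(torch.randn(d, k))   # stand-in orthonormal basis
grad = torch.randn(d)
update = project_orthogonal(grad, basis)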

Cited by 4 publications (1 citation statement)
References 17 publications
“…For Ada-QPacknet, the learning rate is set with a dynamic scheduler starting from 0.01 and decreasing over time to a minimum of 0.0001 for all datasets except TinyImagenet (starting from 0.001) and Imagenet100 (constant learning rate of 0.0001). The adopted models are a two-layer neural network with fully connected layers (p-MNIST), a reduced AlexNet (s-CIFAR100) [40], a ResNet-18 (5 datasets and Imagenet), and a TinyNet (TinyImagenet), in accordance with the model backbones used in the WSN paper [22]. For weight initialization, we adopt the Xavier initializer.…”
Section: Methods (citation type: mentioning)
confidence: 99%
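The training setup quoted above can be reproduced approximately as follows; the exact scheduler used by Ada-QPacknet is not specified in the excerpt, so a cosine schedule decaying from 0.01 to the stated 0.0001 floor is assumed purely for illustration, and the two-layer fully connected model is a stand-in for the p-MNIST backbone.

import torch
import torch.nn as nn

# Stand-in two-layer fully connected network (p-MNIST style backbone).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def xavier_init(m):
    # Xavier (Glorot) initialization, as stated in the excerpt.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(xavier_init)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)      # starts at 0.01
num_epochs = 100                                              # assumed epoch budget
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs, eta_min=1e-4)                # decays to the 0.0001 minimum

for epoch in range(num_epochs):
    # ... per-task training step would go here ...
    scheduler.step()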