2022
DOI: 10.48550/arxiv.2207.12061
Preprint

Balancing Stability and Plasticity through Advanced Null Space in Continual Learning

Abstract: Continual learning is a learning paradigm that learns tasks sequentially under resource constraints, in which the key challenge is the stability-plasticity dilemma, i.e., it is difficult to simultaneously have the stability to prevent catastrophic forgetting of old tasks and the plasticity to learn new tasks well. In this paper, we propose a new continual learning approach, Advanced Null Space (AdNS), to balance stability and plasticity without storing any old data of previous tasks. Specifically, to obtain bette…
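
To make the null-space idea concrete, the sketch below shows the hard gradient projection that AdNS and the gradient-projection methods quoted further down build on: new-task gradients are projected into the (approximate) null space of the feature covariance of previous tasks, so weight updates leave old-task activations nearly unchanged. This is a minimal sketch; the function names and the tol threshold are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def null_space_projector(features, tol=1e-3):
        """Projector onto the approximate null space of old-task features.

        features: (n_samples, d) inputs to one layer, collected on previous
        tasks. Eigenvalues below tol * (max eigenvalue) are treated as zero
        (tol is an assumed hyperparameter, not the paper's setting).
        """
        cov = features.T @ features / len(features)   # uncentered covariance, (d, d)
        eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
        null_basis = eigvecs[:, eigvals <= tol * eigvals[-1]]
        return null_basis @ null_basis.T              # P = U0 @ U0.T, shape (d, d)

    def project_gradient(weight_grad, P):
        """Project a (out_dim, d) linear-layer gradient into the null space.

        Updating with weight_grad @ P changes outputs only in directions
        that old-task inputs never excite, which is what preserves stability.
        """
        return weight_grad @ P

Hard projection of this kind is what buys stability; the plasticity side of the dilemma arises because the null space shrinks as more tasks are seen.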

Cited by 1 publication (2 citation statements) | References 30 publications (69 reference statements)

“…NCL (Kao et al., 2021) combines the idea of gradient projection with Bayesian weight regularization to mitigate catastrophic forgetting. Despite minimizing backward interference, these approaches suffer from poor forward knowledge transfer and lack plasticity (Kong et al., 2022). TRGP (Lin et al., 2022) expands the model with trust regions, introducing additional scale parameters to achieve better performance on new tasks.…”
Section: Gradient Projection Methods
confidence: 99%
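
The loss of plasticity noted in the statement above is the usual motivation for relaxing hard projection. A hypothetical soft variant, for illustration only (the interpolation scheme and its name are assumptions, not the actual mechanism of TRGP or AdNS):

    def soft_project(weight_grad, P, plasticity=0.1):
        # plasticity = 0.0 recovers the hard null-space projection above
        # (maximal stability); larger values let more of the raw gradient
        # through, trading old-task retention for new-task learning.
        return (1.0 - plasticity) * (weight_grad @ P) + plasticity * weight_grad
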
“…To mitigate forgetting and facilitate forward knowledge transfer, replay-based methods (Lopez-Paz & Ranzato, 2017; Shin et al., 2017; Choi et al., 2021) store some old samples in memory, and expansion-based methods (Rusu et al., 2016; Yoon et al., 2017; 2019) expand the model structure to accommodate incoming knowledge. However, these methods require either extra memory buffers (Parisi et al., 2019) or a network architecture that grows as new tasks arrive (Kong et al., 2022), which is computationally expensive (De Lange et al., 2021). Thus, improving performance within a fixed network capacity remains challenging.…”
Section: Work In Progress
confidence: 99%
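
For contrast with the fixed-capacity setting, the sketch below shows the kind of memory buffer that replay-based methods rely on, filled by reservoir sampling; the class and its sampling policy are illustrative assumptions, not the design of any cited method.

    import random

    class ReplayBuffer:
        """Fixed-size memory of old-task samples (reservoir sampling)."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.memory = []
            self.seen = 0  # total samples offered so far

        def add(self, sample):
            self.seen += 1
            if len(self.memory) < self.capacity:
                self.memory.append(sample)
            else:
                # Replace a random slot so that every sample seen so far
                # remains in memory with equal probability capacity/seen.
                idx = random.randrange(self.seen)
                if idx < self.capacity:
                    self.memory[idx] = sample

        def draw(self, batch_size):
            return random.sample(self.memory, min(batch_size, len(self.memory)))

During training on a new task, each mini-batch is typically mixed with samples drawn from such a buffer; the buffer itself is the extra memory cost the quoted passage points out.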