Target layer regularization for continual learning using Cramer-Wold distance

Mazur, Marcin; Pustelnik, Łukasz; Knop, Szymon; Pagacz, Patryk; Spurek, Przemysław

doi:10.1016/j.ins.2022.07.085

Cited by 6 publications

(1 citation statement)

References 7 publications


“…For example, elastic weight consolidation (EWC) [36] measures the sensitivity of the parameters relative to each task through the Fisher information matrix of the parameters, calculated via the KL divergence, and indicates which parameters most need to be retained to avoid forgetting old tasks. Mazur et al. [48] proposed CW-TaLaR, which, similarly to EWC, computes the penalty term directly, but uses the Cramer-Wold distance instead of the KL divergence. Synaptic intelligence (SI) [37] calculates the path integral along each parameter's trajectory during training and takes the absolute integral as an indicator of parameter importance. This can be interpreted as the local contribution of each parameter to the overall change in the loss during training.…”

Section: Continuous Learning Methods Based On Parameter Importance Ca... (mentioning, confidence: 99%)
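The quoted passage credits CW-TaLaR with using the Cramer-Wold distance as its penalty. To make that distance concrete, here is a minimal pure-Python sketch of the closed-form, Gaussian-smoothed estimator between two samples of equal dimension D, as we recall it from the Cramer-Wold AutoEncoder line of work:

d²(X, Y) = (K_XX + K_YY − 2·K_XY) / (2·√(πγ))

Here K_AB is the average of φ_D(‖a − b‖² / (4γ)) over all pairs. Several choices in the sketch are assumptions:

- φ_D(s) = 1F1(1/2; D/2; −s) is replaced by its asymptotic approximation (1 + 4s/(2D − 3))^(−1/2), which is meant for larger D.
- The smoothing parameter γ is left to the caller.
- All function names are hypothetical.

This is not the CW-TaLaR implementation. It does not say which layer is targeted or how the penalty is weighted.

```python
import math
from typing import Sequence

Point = Sequence[float]


def phi_d_approx(s: float, dim: int) -> float:
    # Assumption: asymptotic approximation of phi_D(s) = 1F1(1/2; D/2; -s),
    # used instead of the exact hypergeometric function; crude for small D.
    return (1.0 + 4.0 * s / (2.0 * dim - 3.0)) ** -0.5


def squared_distance(a: Point, b: Point) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_kernel(xs: Sequence[Point], ys: Sequence[Point], gamma: float, dim: int) -> float:
    # Average of phi_D(||x - y||^2 / (4 * gamma)) over all pairs (x, y).
    total = 0.0
    for x in xs:
        for y in ys:
            total += phi_d_approx(squared_distance(x, y) / (4.0 * gamma), dim)
    return total / (len(xs) * len(ys))


def cramer_wold_sq(xs: Sequence[Point], ys: Sequence[Point], gamma: float) -> float:
    """Squared smoothed Cramer-Wold distance between two samples (hypothetical sketch)."""
    dim = len(xs[0])
    if dim < 2 or any(len(p) != dim for p in list(xs) + list(ys)):
        raise ValueError("points must share one dimension D >= 2")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    k_xx = mean_kernel(xs, xs, gamma, dim)
    k_yy = mean_kernel(ys, ys, gamma, dim)
    k_xy = mean_kernel(xs, ys, gamma, dim)
    return (k_xx + k_yy - 2.0 * k_xy) / (2.0 * math.sqrt(math.pi * gamma))


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    Y = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
    print(cramer_wold_sq(X, X, gamma=1.0))  # identical samples -> 0.0
    print(cramer_wold_sq(X, Y, gamma=1.0))  # shifted samples: a positive value
```

In a regularizer, X and Y could be, for example, target-layer outputs of the current and previous model on the same inputs, with a hypothetical weight λ multiplying the result. That pairing is an illustration, not a detail taken from the paper.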

CL-BPUWM: continuous learning with Bayesian parameter updating and weight memory

He, Yang, et al. 2024

Complex Intell. Syst.


Catastrophic forgetting is a common problem in neural networks: after training on new tasks, a network loses information learned from previous tasks. Regularization methods that preferentially retain the parameters important to previous tasks help avoid catastrophic forgetting, but existing regularization methods drive the gradient close to zero because the loss sits at a local minimum. To solve this problem, we propose a new continuous learning method with Bayesian parameter updating and weight memory (CL-BPUWM). First, a parameter updating method based on the Bayes criterion is proposed to allow the neural network to gradually acquire new knowledge. The diagonal of the Fisher information matrix is then introduced to significantly reduce computation and increase parameter updating efficiency. Second, we propose calculating the importance weight by observing how changes in each network parameter affect the model's prediction output. During parameter updating, the Fisher information matrix and the sensitivity of the network are used as quadratic penalty terms in the loss function. Finally, we apply dropout regularization to reduce overfitting during training and improve generalizability. CL-BPUWM performs very well in continuous learning for classification tasks on the CIFAR-100, CIFAR-10, and MNIST datasets. On CIFAR-100, its accuracy is 0.8%, 1.03%, and 0.75% higher than that of the best-performing regularization method (EWC) in three task partitions. On CIFAR-10, it is 2.25% higher than the regularization method (EWC) and 0.7% higher than the scaled method (GR). On MNIST, it is 0.66% higher than the regularization method (EWC). When CL-BPUWM was combined with the brain-inspired replay model on CIFAR-100 and CIFAR-10, its classification accuracy was 2.35% and 5.38% higher, respectively, than that of the baseline method, BI-R + SI.
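The abstract describes quadratic penalty terms built from two ingredients: the diagonal of the Fisher information matrix and a per-parameter sensitivity of the model output. A minimal sketch of that general pattern, in the style of EWC's (λ/2)·Σ_k F_k·(θ_k − θ*_k)², might look like the following. It rests on several assumptions:

- Per-sample gradients are supplied by the caller, with no autodiff here.
- Sensitivity is taken as the mean absolute gradient of the output with respect to each parameter.
- The two importance terms are simply added, using hypothetical weights `lam_fisher` and `lam_sens`.
- All function names are hypothetical.

The abstract does not give CL-BPUWM's exact formula, so this should not be read as the paper's method.

```python
from typing import List, Sequence

Vec = Sequence[float]


def diag_fisher(per_sample_grads: Sequence[Vec]) -> List[float]:
    # Empirical Fisher diagonal: mean over samples of the squared gradient
    # of the log-likelihood with respect to each parameter.
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[k] ** 2 for g in per_sample_grads) / n for k in range(dim)]


def output_sensitivity(per_sample_output_grads: Sequence[Vec]) -> List[float]:
    # Assumed sensitivity measure: mean absolute gradient of the model
    # output with respect to each parameter.
    n = len(per_sample_output_grads)
    dim = len(per_sample_output_grads[0])
    return [sum(abs(g[k]) for g in per_sample_output_grads) / n for k in range(dim)]


def quadratic_penalty(theta: Vec, theta_old: Vec, fisher: Vec, sensitivity: Vec,
                      lam_fisher: float = 1.0, lam_sens: float = 1.0) -> float:
    # 0.5 * sum_k (lam_fisher * F_k + lam_sens * S_k) * (theta_k - theta_old_k)^2
    return 0.5 * sum(
        (lam_fisher * f + lam_sens * s) * (t - t_old) ** 2
        for t, t_old, f, s in zip(theta, theta_old, fisher, sensitivity)
    )


if __name__ == "__main__":
    fisher = diag_fisher([[1.0, 2.0], [3.0, 0.0]])         # [5.0, 2.0]
    sens = output_sensitivity([[1.0, -2.0], [-3.0, 0.0]])  # [2.0, 1.0]
    theta_old = [0.0, 0.0]  # parameters after the previous task
    theta = [1.0, 1.0]      # parameters while training the new task
    print(quadratic_penalty(theta, theta_old, fisher, sens))  # 5.0
```

The penalty would be added to the new task's loss. Parameters with large Fisher or sensitivity values are then pulled harder toward their previous-task values `theta_old`.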


Hierarchically structured task-agnostic continual learning

Hihn, Braun, 2022

Mach Learn


One notable weakness of current machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We derive this principle from a Bayesian perspective and show its connections to previous approaches to continual learning. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network that are governed by a gating policy. Equipped with a diverse and specialized set of parameters, each path can be regarded as a distinct sub-network that learns to solve tasks. To improve expert allocation, we introduce diversity objectives, which we evaluate in additional ablation studies. Importantly, our approach can operate in a task-agnostic way, i.e., unlike many existing continual learning algorithms, it does not require task-specific knowledge. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method on continual reinforcement learning and variants of the MNIST, CIFAR-10, and CIFAR-100 datasets.
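The idea of information processing paths selected by a gating policy can be illustrated with a generic soft mixture-of-experts layer. The sketch below is deliberately simpler than the Mixture-of-Variational-Experts layer described in the abstract:

- The experts are deterministic linear maps rather than variational ones.
- The gate is a softmax over a linear score.
- There is no information-theoretic objective or diversity term.

The class and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    # y = W x + b, with one row of W per output unit
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class SoftGatedExperts:
    """Hypothetical soft mixture of linear experts with a softmax gate."""

    def __init__(self, gate_w: Matrix, gate_b: Sequence[float],
                 experts: Sequence[Tuple[Matrix, Sequence[float]]]):
        if len(gate_w) != len(experts):
            raise ValueError("need one gate row per expert")
        self.gate_w, self.gate_b, self.experts = gate_w, gate_b, experts

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        # Gate probabilities decide how much each expert (path) contributes.
        probs = softmax(linear(self.gate_w, self.gate_b, x))
        outputs = [linear(w, b, x) for w, b in self.experts]
        mixed = [sum(p * out[k] for p, out in zip(probs, outputs))
                 for k in range(len(outputs[0]))]
        return mixed, probs


if __name__ == "__main__":
    layer = SoftGatedExperts(
        gate_w=[[0.0, 0.0], [0.0, 0.0]],
        gate_b=[0.0, 0.0],
        experts=[([[1.0, 0.0]], [0.0]),   # expert 0 reads x[0]
                 ([[0.0, 1.0]], [0.0])],  # expert 1 reads x[1]
    )
    y, probs = layer.forward([2.0, 4.0])
    print(probs, y)  # [0.5, 0.5] [3.0]
```

Taking the argmax of `probs` instead of mixing would route each input through a single expert, which is closer to the "distinct sub-network" picture. How the paper trains and samples its gate is not reproduced here.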


Dynamic data-free knowledge distillation by easy-to-hard learning strategy

Zhou et al. 2023

Information Sciences
