Difficult experiments in training neural networks often fail to converge due to what is known as the flat-spot problem, in which the gradients of hidden neurons diminish in value, rendering the weight update process ineffective. Whereas first-order algorithms can address this issue by learning parameters that normalize neuron activations, second-order algorithms cannot afford additional parameters because they already require computing a large Jacobian matrix. This paper proposes Levenberg-Marquardt with weight compression (LM-WC), which combats the flat-spot problem by compressing neuron weights to push neuron activations out of the saturated region and toward the linear region. The presented algorithm requires no additional learned parameters and contains an adaptable compression parameter, which is adjusted to avoid training failure and increase the probability of neural network convergence. Several experiments are presented and discussed to demonstrate the success of LM-WC against standard LM and LM with random restarts on benchmark data sets for varying network architectures. Our results suggest that the LM-WC algorithm can improve training success by 10 times or more compared with other methods.
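As a rough illustration of the weight-compression idea, the sketch below scales down the incoming weights of hidden neurons whose tanh activations are saturated on a batch, pushing their pre-activations back toward the near-linear region. The compression factor, saturation threshold, and the way saturation is detected are illustrative assumptions, not the paper's exact procedure or its adaptable compression schedule.

```python
import numpy as np

def compress_saturated_weights(W, X, beta=0.9, sat_threshold=0.95):
    """Hedged sketch of a weight-compression step (not the authors' exact method).

    W: (n_inputs, n_hidden) incoming weights of one tanh hidden layer.
    X: (n_samples, n_inputs) batch of inputs reaching that layer.
    beta: assumed compression factor in (0, 1); stands in for the paper's
          adaptable compression parameter.
    sat_threshold: |tanh(net)| above which a neuron is treated as saturated.
    """
    net = X @ W                       # pre-activations
    act = np.tanh(net)                # hidden activations
    # fraction of samples for which each hidden neuron is saturated
    sat_fraction = np.mean(np.abs(act) > sat_threshold, axis=0)
    saturated = sat_fraction > 0.5
    # shrink the incoming weights of saturated neurons, moving their
    # pre-activations back toward the near-linear region of tanh
    W = W.copy()
    W[:, saturated] *= beta
    return W, saturated
```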
We first pose the Unsupervised Progressive Learning (UPL) problem: an online representation learning problem in which the learner observes a non-stationary and unlabeled data stream, learning a growing number of features that persist over time even though the data is not stored or replayed. To solve the UPL problem we propose the Self-Taught Associative Memory (STAM) architecture. Layered hierarchies of STAM modules learn based on a combination of online clustering, novelty detection, forgetting of outliers, and storage of prototypical features rather than specific examples. We evaluate STAM representations using clustering and classification tasks. While there are no existing learning scenarios directly comparable to UPL, we compare the STAM architecture with two recent continual learning models, Memory Aware Synapses (MAS) and Gradient Episodic Memory (GEM), after adapting them to the UPL setting.
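A minimal sketch of the kind of module such an architecture composes: online clustering with novelty detection, storage of prototypical features, and forgetting of rarely matched prototypes. The thresholds, update rate, and forgetting rule below are assumptions for illustration only, not the paper's actual design.

```python
import numpy as np

class OnlineClusteringModule:
    """Hedged sketch of a STAM-like module: online clustering, novelty
    detection, prototype storage, and forgetting of outliers. All
    hyperparameters here are illustrative assumptions."""

    def __init__(self, novelty_threshold=1.0, lr=0.05, forget_after=500):
        self.novelty_threshold = novelty_threshold
        self.lr = lr                      # prototype update rate
        self.forget_after = forget_after  # steps without a match before forgetting
        self.centroids = []               # stored prototypical features
        self.last_matched = []            # step at which each prototype last matched
        self.step = 0

    def observe(self, x):
        x = np.asarray(x, dtype=float)
        self.step += 1
        # forget outlier prototypes that have not been matched recently
        keep = [i for i, t in enumerate(self.last_matched)
                if self.step - t < self.forget_after]
        self.centroids = [self.centroids[i] for i in keep]
        self.last_matched = [self.last_matched[i] for i in keep]
        if not self.centroids:
            self.centroids.append(x.copy())
            self.last_matched.append(self.step)
            return 0
        dists = [np.linalg.norm(x - c) for c in self.centroids]
        j = int(np.argmin(dists))
        if dists[j] > self.novelty_threshold:
            # novelty detected: store the input itself as a new prototype
            self.centroids.append(x.copy())
            self.last_matched.append(self.step)
            j = len(self.centroids) - 1
        else:
            # familiar input: nudge the matched prototype toward it
            self.centroids[j] += self.lr * (x - self.centroids[j])
            self.last_matched[j] = self.step
        return j
```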
Modern computer vision applications suffer from catastrophic forgetting when incrementally learning new concepts over time. The most successful approaches to alleviating this forgetting require extensive replay of previously seen data, which is problematic when memory constraints or data legality concerns exist. In this work, we consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL), in which an incremental learning agent must learn new concepts over time without storing generators or training data from past tasks. One approach for DFCIL is to replay synthetic images produced by inverting a frozen copy of the learner's classification model, but we show that this approach fails on common class-incremental benchmarks when standard distillation strategies are used. We diagnose the cause of this failure and propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training loss and an importance-weighted feature distillation loss, and show that our method yields up to a 25.1% increase in final task accuracy (absolute difference) compared with SOTA DFCIL methods on common class-incremental benchmarks. Our method even outperforms several standard replay-based methods that store a coreset of images.
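To make the two named loss terms concrete, the sketch below combines a cross-entropy loss restricted to the current task's classes with a per-dimension importance-weighted feature distillation loss against a frozen copy of the previous model. The model interface, class ordering, and importance weights are assumed for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def dfcil_losses(student, frozen_teacher, x, y, new_class_offset, importance):
    """Hedged sketch of the two loss terms named in the abstract.

    student:          current model, assumed to return (features, logits)
    frozen_teacher:   frozen copy of the model from before the new task
    x, y:             batch of new-task images and integer labels
    new_class_offset: assumed index of the first class of the current task
    importance:       assumed per-dimension feature importance weights
    """
    feats_s, logits_s = student(x)
    with torch.no_grad():
        feats_t, _ = frozen_teacher(x)

    # modified cross-entropy: classify only among the new task's classes,
    # so old-class logits are not suppressed by new-task data
    ce = F.cross_entropy(logits_s[:, new_class_offset:], y - new_class_offset)

    # importance-weighted feature distillation: keep features that mattered
    # to the old model close to the frozen teacher's features
    fd = (importance * (feats_s - feats_t) ** 2).mean()

    return ce, fd
```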