Recently, spectral kernels have attracted wide attention for modeling complex dynamic environments. These advanced kernels mainly focus on overcoming the crucial locality limitations of standard kernels, namely stationarity and monotonicity. In practice, however, shallow models use their computational elements inefficiently, so they often fail to accurately reveal dynamic and latent variations. In this paper, we propose a novel deep spectral kernel network (DSKN) that naturally integrates non-stationary and non-monotonic spectral kernels into elegant deep architectures in an interpretable way, and that can be further generalized to cover most kernels. Concretely, we first derive the general form of spectral kernels via the inverse Fourier transform. Second, we construct DSKN by embedding the preeminent spectral kernels into each layer to improve the efficiency of its computational elements; this lets DSKN effectively reveal dynamic input-dependent characteristics and potential long-range correlations by compactly representing complex advanced concepts. Third, we present detailed analyses of DSKN. Owing to its universality, we propose a unified spectral transform technique to flexibly extend and reasonably initialize domain-related DSKNs. Furthermore, we give the representer theorem for DSKN. Systematic experiments demonstrate the superiority of DSKN over state-of-the-art relevant algorithms on a variety of standard real-world tasks.
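The abstract builds spectral kernels from the inverse Fourier transform and stacks them layer by layer. The following is a minimal sketch under our own assumptions, not the DSKN implementation: each layer is a random-feature map for a non-stationary spectral kernel, with independent Gaussian frequency pairs (w1, w2) and one uniform phase shared by both cosines of a feature. The names `sample_layer`, `spectral_features`, `deep_features` and `deep_kernel` are hypothetical. For a single layer, averaging over the phase gives E[k(x, y)] = (1/4) E[cos(w1·x − w1·y) + cos(w2·x − w2·y) + cos(w1·x − w2·y) + cos(w2·x − w1·y)]. This depends on x and y separately rather than only on x − y, so it is non-stationary, and the cosine terms mean it need not decay monotonically with distance. In DSKN the spectral parameters would presumably be learned; here they are only sampled.

```python
import math
import random


def sample_layer(d_in, m, scale=1.0, seed=0):
    """Draw frequencies and shared phases for one hypothetical spectral layer.

    Assumption: w1 and w2 are independent Gaussians with std `scale`;
    a single uniform phase b_j is shared by both cosines of feature j.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(m)]
    w2 = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(m)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
    return w1, w2, b


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def spectral_features(x, layer):
    """phi_j(x) = [cos(w1_j . x + b_j) + cos(w2_j . x + b_j)] / sqrt(2m)."""
    w1, w2, b = layer
    m = len(b)
    c = 1.0 / math.sqrt(2.0 * m)
    return [c * (math.cos(dot(w1[j], x) + b[j]) + math.cos(dot(w2[j], x) + b[j]))
            for j in range(m)]


def deep_features(x, layers):
    """Stack spectral layers: each layer's output is the next layer's input."""
    h = list(x)
    for layer in layers:
        h = spectral_features(h, layer)
    return h


def deep_kernel(x, y, layers):
    """Kernel induced by the final-layer features: k(x, y) = <phi(x), phi(y)>."""
    return dot(deep_features(x, layers), deep_features(y, layers))


if __name__ == "__main__":
    # Two stacked layers: 2-D input -> 64 features -> 64 features.
    layers = [sample_layer(2, 64, seed=0), sample_layer(64, 64, scale=0.5, seed=1)]
    x, y = [0.3, -1.2], [1.5, 0.4]
    print(deep_kernel(x, y, layers), deep_kernel(x, x, layers))
```

Because every feature is bounded by sqrt(2/m) in absolute value, the induced kernel stays in [−2, 2] for any depth. This is one simple reason to prefer bounded periodic features when stacking many layers.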
Unlike popular neural networks that use quasiconvex activations, non-monotonic networks activated by periodic nonlinearities have emerged as a more competitive paradigm, offering two revolutionary benefits: 1) compactly characterizing high-frequency patterns; 2) precisely representing high-order derivatives. Nevertheless, they are also well known for being hard to train: they easily over-fit dissonant noise and only allow tiny architectures (shallower than 5 layers). The fundamental bottleneck is that periodicity creates many poor, densely packed local minima in the solution space, and the direction and norm of the gradient oscillate continually during error backpropagation. Non-monotonic networks therefore get prematurely stuck in these local minima and miss effective error feedback. To alleviate this optimization dilemma, in this paper we propose a non-trivial soft transfer approach. It first smooths their solution space to be close to that of monotonic networks, and then, as training continues, improves their representational properties by transferring the solutions from the neural space of monotonic neurons to the Fourier space of non-monotonic neurons. The soft transfer consists of two core components: 1) a rectified concrete gate that characterizes the state of each neuron; 2) a variational Bayesian learning framework that dynamically balances the empirical risk and the intensity of transfer. We provide comprehensive empirical evidence showing that the soft transfer not only reduces the risk of non-monotonic networks over-fitting noise, but also helps them scale to much deeper architectures (more than 100 layers), achieving new state-of-the-art performance.
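The soft transfer rests on a per-neuron gate that moves a neuron from a monotonic regime to a periodic one. The abstract does not give the gate's exact form, so the sketch below assumes it resembles the hard concrete gate of Louizos et al. (2018): a stretched, clipped binary-concrete sample that can be exactly 0 or 1, with a closed-form probability of being open. The tanh/sin blend in `gated_neuron`, the frequency `omega`, and all function names are hypothetical, and the variational Bayesian objective that would tune `log_alpha` is not shown.

```python
import math
import random

# Stretch parameters from the hard concrete distribution (Louizos et al., 2018);
# using them for the "rectified concrete gate" is our assumption.
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sample_gate(log_alpha, rng):
    """Sample a rectified concrete gate z in [0, 1] for one neuron."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)             # avoid log(0)
    logit = math.log(u) - math.log(1.0 - u)
    s = 1.0 / (1.0 + math.exp(-(logit + log_alpha) / BETA))  # binary concrete sample
    s_bar = s * (ZETA - GAMMA) + GAMMA                       # stretch to (GAMMA, ZETA)
    return min(1.0, max(0.0, s_bar))                         # rectify: exact 0 and 1 possible


def prob_gate_open(log_alpha):
    """Closed-form P(z > 0); a penalty on it could control transfer intensity."""
    return 1.0 / (1.0 + math.exp(-(log_alpha - BETA * math.log(-GAMMA / ZETA))))


def gated_neuron(x, z, omega=1.0):
    """Hypothetical neuron: z = 0 is monotonic (tanh), z = 1 is periodic (sin)."""
    return (1.0 - z) * math.tanh(x) + z * math.sin(omega * x)


if __name__ == "__main__":
    rng = random.Random(0)
    # Very negative log_alpha: the gate is usually closed (monotonic regime).
    # Raising log_alpha during training lets the neuron move to the sin regime.
    for log_alpha in (-8.0, 0.0, 8.0):
        z = sample_gate(log_alpha, rng)
        print(log_alpha, round(prob_gate_open(log_alpha), 3), z, gated_neuron(0.7, z))
```

Clipping after stretching is what makes the gate "rectified": a neuron can sit exactly in the monotonic state early in training, which matches the abstract's description of starting from a solution space close to that of monotonic networks.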