2012
DOI: 10.1007/978-3-642-35289-8_27

Training Deep and Recurrent Networks with Hessian-Free Optimization

Cited by 386 publications (497 citation statements). References 29 publications.
“…Also in 2011 it was shown (Martens and Sutskever, 2011) that Hessian-free optimization (e.g., Møller, 1993; Pearlmutter, 1994; Schraudolph, 2002) (Sec. 5.6.2) can alleviate the Fundamental Deep Learning Problem (Sec.…”
Section: Hessian-free Optimization for RNNs
confidence: 99%
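The method the excerpt refers to is "Hessian-free" because it only ever queries Hessian-vector products, never the Hessian itself. Below is a minimal sketch of that idea using a finite-difference product on a toy quadratic; the exact R-operator of Pearlmutter (1994) is the standard alternative, and all names here are illustrative, not the paper's:

```python
import numpy as np

def hessian_vector_product(grad_fn, theta, v, eps=1e-6):
    # Finite-difference approximation of H @ v: Hessian-free methods never
    # build H explicitly, they only query products like this (Pearlmutter,
    # 1994, derives an exact R-operator version for neural networks).
    return (grad_fn(theta + eps * v) - grad_fn(theta)) / eps

# Toy quadratic loss f(theta) = 0.5 * theta^T A theta, whose Hessian is A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
grad_fn = lambda theta: A @ theta
theta = np.array([0.5, -1.0])
v = np.array([1.0, 0.0])
print(hessian_vector_product(grad_fn, theta, v))  # approx. A @ v = [3., 1.]
```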
“…While HF, like all truncated-Newton methods, takes steps computed using partially converged calls to CG, it is naturally accelerated along at least some directions of lower curvature compared to the gradient. It can even be shown (Martens & Sutskever, 2012) that CG will tend to favor convergence to the exact solution to the quadratic sub-problem first along higher curvature directions (with a bias towards those which are more clustered together in their curvature-scalars/eigenvalues).…”
Section: Momentum and HF
confidence: 99%
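A minimal sketch of the partially converged CG solves the excerpt describes, run on a toy 2-D quadratic; it shows the high-curvature direction being resolved first. This is illustrative code, not the authors' implementation:

```python
import numpy as np

def truncated_cg(A, b, num_iters):
    # Plain conjugate gradient on the quadratic 0.5*x^T A x - b^T x,
    # stopped early the way HF stops its inner solves ("partially
    # converged" steps, in the excerpt's terms).
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A @ x with x = 0
    p = r.copy()              # first search direction
    for _ in range(num_iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

# Curvature 100 along the first axis, 1 along the second; the exact
# minimizer of the sub-problem is [1, 1].
A = np.diag([100.0, 1.0])
b = np.array([100.0, 1.0])
print(truncated_cg(A, b, 1))  # ~[1.00, 0.01]: high-curvature axis solved first
print(truncated_cg(A, b, 2))  # [1., 1.]: exact after n steps on an n-D quadratic
```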
“…In order to analyze continuous time series of network data with highly complex structure, the RNN-GBRBM (modified RNN-RBM) is adopted. Combining the desirable characteristics of RNNs and RBMs has proven to be non-trivial [16]: the RNN gives the network a simple form of memory with very minimal overhead and more freedom to describe the temporal dependencies involved [17], while the RBM can capture complicated, high-order correlations between the activities of hidden features [18] and provides a closed-form representation of the distribution underlying the observations [10]. Moreover, a semi-supervised incremental updating algorithm, which is appropriate for training the decoder and updating the parameters of the classifier, is proposed.…”
Section: System Architecture
confidence: 99%
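A schematic sketch of how an RNN and an RBM can be coupled in the RNN-RBM style the excerpt builds on: the RNN hidden state carries temporal context and supplies time-dependent biases to an RBM that models each observation. Shapes and variable names are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, n_rnn = 8, 16, 12

W   = rng.normal(0, 0.01, (n_vis, n_hid))   # RBM weights (shared over time)
Wuh = rng.normal(0, 0.01, (n_rnn, n_hid))   # RNN state -> RBM hidden bias
Wuv = rng.normal(0, 0.01, (n_rnn, n_vis))   # RNN state -> RBM visible bias
Wuu = rng.normal(0, 0.01, (n_rnn, n_rnn))   # RNN recurrence
Wvu = rng.normal(0, 0.01, (n_vis, n_rnn))   # input -> RNN state

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

u = np.zeros(n_rnn)                          # RNN temporal context
for v_t in rng.normal(size=(5, n_vis)):      # dummy real-valued sequence
    b_h = u @ Wuh                            # time-dependent hidden bias
    b_v = u @ Wuv                            # would center the Gaussian visibles
    h_prob = sigmoid(v_t @ W + b_h)          # RBM hidden inference at step t
    u = np.tanh(v_t @ Wvu + u @ Wuu)         # advance the temporal context
```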
“…s is defined as in Eq. (17), where x_t is the t-th network data in a network data sequence, d_decoded is the dimensionality of the decoded feature vector s(x_t, Δt), s_k(x_t) is a binary value that indicates the k-th code of the decoded features, and Δt is the number of hidden units in the RNN, indicating that s is encoded with Δt decoded features h(x_t), h(x_{t+1}), .…”
Section: RNN-GBRBM
confidence: 99%
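Eq. (17) itself is not quoted in the excerpt, so the following is only one plausible reading: the decoded feature vector s(x_t, Δt) stacks Δt consecutive per-step binary codes h(x_t), ..., h(x_{t+Δt-1}). Both the function names and the stacking assumption are hypothetical:

```python
import numpy as np

def decoded_feature_vector(h, t, dt):
    # Hypothetical reading of Eq. (17): concatenate dt per-step binary
    # codes, giving a vector of length d_decoded = dt * k.
    # h: (T, k) array where row i holds the binary code h(x_i).
    return np.concatenate([h[t + i] for i in range(dt)])

h = (np.random.default_rng(1).random((10, 4)) > 0.5).astype(int)
print(decoded_feature_vector(h, t=0, dt=3))  # length-12 binary vector
```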