2020
DOI: 10.1098/rsta.2019.0049
Optimal memory-aware backpropagation of deep join networks

Abstract: Deep learning training memory needs can prevent the user from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and automatic differentiation (AD) to execute a backpropagation graph with a bounded memory requirement at the cost of extra recomputations. The case of a single homogeneous chain, i.e. the case of a network whose stages are all identical and form a chain, is well understood and optimal solutions have been proposed in the A…
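The abstract describes bounding activation memory by recomputing parts of the forward pass during backpropagation. As a rough, hedged sketch of that general idea (not the paper's optimal schedule for join networks), the example below applies PyTorch's generic checkpoint_sequential to a homogeneous chain; the depth, layer width, batch size, and segment count are illustrative assumptions, and the use_reentrant keyword assumes a reasonably recent PyTorch.

```python
# Minimal sketch of trading recomputation for bounded activation memory on a
# homogeneous chain, using PyTorch's built-in sequential checkpointing.
# All sizes below are arbitrary; this is not the paper's algorithm.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A homogeneous chain: every stage is identical and the stages form a chain.
chain = nn.Sequential(
    *[nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(32)]
)
x = torch.randn(64, 512, requires_grad=True)

# Split the chain into 4 segments: only segment inputs are kept during the
# forward pass; activations inside each segment are recomputed in backward.
out = checkpoint_sequential(chain, 4, x, use_reentrant=False)
out.sum().backward()  # extra forward recomputation, lower peak memory
```

Raising the number of segments stores fewer intermediate activations at the price of more recomputation, which is the compute/memory trade-off the paper schedules optimally for join-shaped graphs.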

Cited by 11 publications (6 citation statements)
References 12 publications
“…Techniques which use speculative approaches [49], spiking neural network concepts [50,51] and memory use optimization [52] have also been proposed. Kim and Ko [53] and (separately) Ma, Lewis and Kleijn [54] have proposed techniques which train neural networks while specifically avoiding backpropagation altogether.…”
Section: Gradient Descent and Machine Learning (mentioning)
confidence: 99%
“…Other techniques have incorporated federated learning and momentum [81] and used evolutionary algorithms [82], speculative approaches [83] and spiking neural network concepts [84,85]. Yet other techniques have focused on supporting deep networks [86], memory use optimization [87], bias factors [88,89] and initial condition sensitivity [90]. A recent technique, proposed by Zhang, et al [91], utilizes a combination of expert strategies and gradient descent for optimization.…”
Section: Gradient Descent (mentioning)
confidence: 99%
“…This approach involves re-computing results during the backward pass to avoid saving results in the forward pass, trading more compute for less memory but guaranteeing identical results. All work in this domain has focused on ways to balance this trade-off for different types of acyclic network graphs (Chen et al. 2016; Gruslys et al. 2016; Kumar et al. 2019; Kusumoto et al. 2019; Beaumont et al. 2020). Our work instead performs recomputation in the forward pass, so that the backward pass produces an equivalent result, while using less compute time and less memory.…”
Section: Related Work (mentioning)
confidence: 99%
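The quoted passage notes that backward-pass recomputation guarantees results identical to plain backpropagation. As a hedged, minimal illustration of that property (using PyTorch's generic torch.utils.checkpoint, not any specific method from the cited works), the sketch below compares gradients with and without activation checkpointing; the block shape and random seed are arbitrary assumptions.

```python
# Illustrative check: recomputing activations during the backward pass yields
# the same gradients as storing them, at the cost of an extra forward pass.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
x = torch.randn(16, 128, requires_grad=True)

# Plain forward/backward: intermediate activations inside `block` are stored.
block(x).sum().backward()
grad_stored = x.grad.clone()

# Checkpointed forward/backward: activations are recomputed during backward.
x.grad = None
checkpoint(block, x, use_reentrant=False).sum().backward()
grad_recomputed = x.grad.clone()

print(torch.allclose(grad_stored, grad_recomputed))  # True
```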