2018
DOI: 10.48550/arxiv.1809.05676
Preprint

Deterministic Implementations for Reproducibility in Deep Reinforcement Learning

Abstract: While deep reinforcement learning (DRL) has led to numerous successes in recent years, reproducing these successes can be extremely challenging. One reproducibility challenge particularly relevant to DRL is nondeterminism in the training process, which can substantially affect the results. Motivated by this challenge, we study the positive impacts of deterministic implementations in eliminating nondeterminism in training. To do so, we consider the particular case of the deep Q-learning algorithm, for which we …
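
The abstract's focus on eliminating nondeterminism amounts to controlling every random seed the training loop touches. As a minimal sketch, assuming a Gymnasium-style environment API (the environment name and seed value below are illustrative, not taken from the paper), the usual sources pinned in deep Q-learning are exploration, minibatch sampling, weight initialization, and the environment itself:

    import random

    import numpy as np
    import gymnasium as gym  # assumption: a Gymnasium-style environment API

    SEED = 0  # illustrative fixed seed

    # Pin the RNGs that drive epsilon-greedy exploration, replay-buffer
    # minibatch sampling, and network weight initialization.
    random.seed(SEED)
    np.random.seed(SEED)

    # Pin the environment's own stochasticity and action sampling.
    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=SEED)
    env.action_space.seed(SEED)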

Cited by 14 publications (25 citation statements)
References 1 publication
Order By: Relevance
“…With the success of deep learning, the generalization properties of deep networks received a renewed interest in recent years [24,6,55,28]. [11,56] establish spectrally normalized risk bounds for deep networks and [54] provides refined bounds by exploiting the inter-layer Jacobian. [6] proposes tighter bounds using compression techniques.…”
Section: Related Work (mentioning)
confidence: 99%
“…We deploy two LeNet-based neural network architectures which differ only in the number of neurons in two of the layers, in order to individually match the formats of the MNIST and CIFAR-10 datasets. Our TensorFlow code for the Delta method is based on the pydeepdelta Python module [14], and is fully deterministic [10]. The corresponding Bootstrap implementation can be found in the same repository.…”
Section: The Neural Network Classifiers (mentioning)
confidence: 99%
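
The fully deterministic TensorFlow setup cited above is not reproduced here; as a hedged sketch, the standard knobs for deterministic TensorFlow training look roughly like the following (the seed value is illustrative, and enable_op_determinism is available in recent TensorFlow 2.x releases):

    import os
    import random

    import numpy as np
    import tensorflow as tf

    SEED = 42  # illustrative fixed seed

    # Seed every RNG the training loop touches before building the model.
    os.environ["PYTHONHASHSEED"] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

    # Force deterministic kernels; ops without a deterministic implementation
    # will raise an error instead of running nondeterministically.
    tf.config.experimental.enable_op_determinism()
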
“…The objective for deep models, on the other hand, will have multiple optima, many of which have roughly equal loss on average over all test examples but differ in the predictions they make for individual examples. For such models, nondeterminism in training may lead optimizers to different optima (Summers & Dinneen, 2021) (see also Nagarajan et al. (2018)), depending on the training randomness (Achille et al, 2017; Bengio et al, 2009).…”
Section: Introduction (mentioning)
confidence: 99%