2019
DOI: 10.48550/arxiv.1911.13299
Preprint

What's Hidden in a Randomly Weighted Neural Network?

Abstract: Training a neural network is synonymous with learning the values of the weights. In contrast, we demonstrate that randomly weighted neural networks contain subnetworks which achieve impressive performance without ever training the weight values. Hidden in a randomly weighted Wide ResNet-50 [28], we show that there is a subnetwork (with random weights) that is smaller than, but matches the performance of, a ResNet-34 [8] trained on ImageNet [3]. Not only do these "untrained subnetworks" exist, but we provide an a…
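The search the abstract alludes to can be made concrete with a small sketch. Below is a minimal PyTorch illustration of selecting a subnetwork inside a frozen, randomly weighted layer by learning a score per weight and keeping only the top-scoring fraction at forward time, in the spirit of the paper's approach; the class names, the choice of k, and the straight-through gradient treatment here are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal PyTorch sketch: pick a subnetwork out of a frozen, randomly weighted
# layer by learning a score per weight and keeping the top-k fraction of scores
# on the forward pass. Class names, the k value, and the straight-through
# gradient treatment are illustrative assumptions, not the paper's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GetSubnet(torch.autograd.Function):
    """Binary mask of the top-k fraction of scores; gradients pass straight through."""

    @staticmethod
    def forward(ctx, scores, k):
        mask = torch.zeros_like(scores)
        num_keep = int(k * scores.numel())
        _, idx = scores.flatten().topk(num_keep)
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the scores is the
        # incoming gradient; k receives no gradient.
        return grad_output, None


class MaskedLinear(nn.Linear):
    """Linear layer whose random weights are frozen; only per-weight scores train."""

    def __init__(self, in_features, out_features, k=0.5):
        super().__init__(in_features, out_features, bias=False)
        self.weight.requires_grad = False                      # weights keep their random init
        self.scores = nn.Parameter(0.01 * torch.randn_like(self.weight))
        self.k = k

    def forward(self, x):
        mask = GetSubnet.apply(self.scores, self.k)
        return F.linear(x, self.weight * mask)


# Only the scores are optimized; the random weight values are never changed.
layer = MaskedLinear(784, 10, k=0.5)
optimizer = torch.optim.SGD([layer.scores], lr=0.1)
```

At inference the surviving weights keep their original random values, which is what makes the resulting subnetwork "untrained" in the sense of the abstract.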

Cited by 10 publications (33 citation statements). References 11 publications.
“…The hypothesis space is composed of all candidate networks with different assignments of weight values that accomplish the computation task. Consistent with previous studies [7][8][9], the optimal random network ensemble includes sub-networks of the original full network, which further allows for capturing uncertainty in the hypothesis space. The model can be solved by mean-field methods, thereby providing a physics interpretation of how credit assignment occurs in a hierarchical deep neural system.…”
supporting
confidence: 55%
“…Excitingly, recent works showed that there exist subnetworks of random weights that are able to produce better-than-chance accuracies [7][8][9]. This property seems to be universal across different architectures, datasets and computational tasks [10].…”
mentioning
confidence: 95%
“…We leverage this to develop a flexible model capable of learning thousands of tasks: Supermasks in Superposition (SupSup). SupSup, diagrammed in Figure 1, is driven by two core ideas: a) the expressive power of untrained, randomly weighted subnetworks [55,38], and b) inference of task-identity as a gradient-based optimization problem.…”
Section: Introduction
mentioning
confidence: 99%
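The second idea in the quote above, treating task identity as a gradient-based optimization problem, can be sketched as follows: superimpose a bank of stored supermasks with softmax coefficients and adjust those coefficients so the network's output becomes confident on the current batch. The function name `masked_forward`, the entropy objective, and the fixed step count are hypothetical illustrations, not the SupSup authors' actual API.

```python
# Hypothetical sketch of gradient-based task inference over a bank of supermasks.
# `masked_forward(x, mask)` is assumed to run a frozen, randomly weighted network
# under the given mask; it is a placeholder, not a real SupSup function.
import torch
import torch.nn.functional as F


def infer_task(x, mask_bank, masked_forward, steps=20, lr=1.0):
    """Return the index of the supermask (task) that best explains batch x."""
    alpha = torch.zeros(len(mask_bank), requires_grad=True)   # one coefficient per task
    optimizer = torch.optim.SGD([alpha], lr=lr)
    for _ in range(steps):
        coeffs = torch.softmax(alpha, dim=0)
        # Superposition: a convex combination of all stored supermasks.
        mixed_mask = sum(c * m for c, m in zip(coeffs, mask_bank))
        logits = masked_forward(x, mixed_mask)
        probs = F.softmax(logits, dim=-1)
        # Entropy of the predictions: low when one task's mask fits the data.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    return int(torch.argmax(alpha))
```

The cited paper also describes cheaper variants of this inference (e.g. deciding after a single gradient step), but the iterative form above is the simplest way to see task identification posed as optimization.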
“…Fact 6: Terms in (13), wherein the base case is generated as … Fact 8: For k crossings of the base paths there are 4^(k+1) splicings possible, and that many extra terms appear in the E[K_0(s, s')^2] expression in (13) compared to the E[K_0(s, s')]^2 expression in (12).…”
Section: Notation
mentioning
confidence: 99%