2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)
DOI: 10.1109/icmla.2017.00-72

An Investigation of How Neural Networks Learn from the Experiences of Peers Through Periodic Weight Averaging

Abstract: We investigate a method, weighted average model fusion, that enables neural networks to learn from the experiences of other networks, as well as from their own experiences. This method is inspired by the social nature of humans, which has been shown to be one of the biggest factors in the development of our cognitive abilities. Modern machine learning has focused predominantly on learning from direct training and has largely ignored learning through social engagement with peers; neural networks will the …
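A minimal sketch of the weighted average model fusion the abstract describes, assuming two PyTorch modules with identical architectures; the function name, the in-place update, and the fusion_rate parameter are illustrative, not the authors' code.

    import torch

    @torch.no_grad()
    def fuse_weights(model_a, model_b, fusion_rate=0.5):
        # Blend a peer's weights into model_a in place:
        # w_a <- (1 - fusion_rate) * w_a + fusion_rate * w_b
        for p_a, p_b in zip(model_a.parameters(), model_b.parameters()):
            p_a.mul_(1.0 - fusion_rate).add_(p_b, alpha=fusion_rate)

Called periodically during training (e.g. every few epochs), each network blends in its peer's weights and then continues training on its own data.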

Cited by 8 publications (6 citation statements)
References 16 publications

Order By: Relevance
“…It was empirically found that vanilla averaging combines the knowledge contained in several neural networks into a single fused model [35,32]. The simple vanilla averaging, however, only works when the weights of the individual networks are relatively close in the weight space.…”
Section: Model Fusion (mentioning)
confidence: 99%
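As a hedged illustration of the vanilla averaging described above (the helper name is assumed, not from the cited papers): the fused weights are the elementwise mean of the peers' state dicts, which behaves well only when the networks sit close in weight space, e.g. when trained from a common initialization.

    import torch

    @torch.no_grad()
    def vanilla_average(state_dicts):
        # Elementwise mean over N state dicts with identical keys and shapes;
        # load the result with model.load_state_dict(...).
        return {
            k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]
        }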
“…The further details of skill transfer are given in Appendix D.1. We change the weight proportion of model B (also known as the fusion rate in [32]) to find the weight combination that has the best performance.…”
Section: Skill Transfer (mentioning)
confidence: 99%
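A sketch of the fusion-rate sweep this statement describes, assuming two trained PyTorch models and a user-supplied evaluate function (both hypothetical): it scans the weight proportion given to model B and keeps the best-scoring blend.

    import copy
    import torch

    @torch.no_grad()
    def sweep_fusion_rate(model_a, model_b, evaluate):
        best = (None, float("-inf"))
        for r in [i / 10 for i in range(11)]:           # fusion rate 0.0 .. 1.0
            fused = copy.deepcopy(model_a)
            for p, p_a, p_b in zip(fused.parameters(),
                                   model_a.parameters(),
                                   model_b.parameters()):
                p.copy_((1.0 - r) * p_a + r * p_b)      # w = (1-r)*w_A + r*w_B
            score = evaluate(fused)                     # e.g. validation accuracy
            if score > best[1]:
                best = (r, score)
        return best                                     # (best rate, best score)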
“…For a similar set of weights of the convolutional part, we cannot use the summation (3), as both terms in this expression would have similar values. Instead, the approach of (Smith & Gashler, 2017; Utans, 1996) would be ideal in this case, and we can use weight averaging. However, the top fully connected parts can still be fused by weight summation.…”
Section: Fusion Of Deep Convolutional Neural Network (mentioning)
confidence: 99%
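One plausible reading of the hybrid procedure above, sketched under stated assumptions: convolutional weights that originated from the same pretrained values are averaged, while the independently trained classifier heads are fused by summing their weights. For linear heads on shared features this yields the sum of the heads' outputs, since (W_A + W_B)x + (b_A + b_B) = f_A(x) + f_B(x). The split-by-prefix rule is an assumption; the cited equation (3) is not reproduced here.

    import torch

    @torch.no_grad()
    def hybrid_fuse(sd_a, sd_b, head_prefix="classifier."):
        # Average the (similar) convolutional weights; sum the (independent)
        # classifier weights. `head_prefix` is a hypothetical naming convention.
        fused = {}
        for k in sd_a:
            if k.startswith(head_prefix):
                fused[k] = sd_a[k] + sd_b[k]
            else:
                fused[k] = (sd_a[k] + sd_b[k]) / 2
        return fused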
“…As discussed in Section 2.4, for the WS method we used a slightly different procedure than before. As the weights of the convolutional parts of both networks originated from the same set of pretrained weights, during fusion by the WS method those weights were averaged according to the approach adopted by Smith & Gashler (2017) and Utans (1996). The weights of the classifiers on top of the networks were obtained from an independent set of weights and thus were summed, as in all previous examples.…”
Section: Fusion Of Deep Neural Network (mentioning)
confidence: 99%