We investigate in this paper the architecture of deep convolutional networks. Building on existing state-of-the-art models, we propose a reconfiguration of the model parameters into several parallel branches at the global network level, with each branch being a standalone CNN. We show that this arrangement is an efficient way to significantly reduce the number of parameters without losing performance, or to significantly improve performance for the same parameter count. The use of branches also brings an additional form of regularization. In addition to the split into parallel branches, we propose a tighter coupling of these branches by placing the "fuse (averaging) layer" before the Log-Likelihood and SoftMax layers during training. This gives another significant performance improvement: the tighter coupling favours the learning of better representations, even at the level of the individual branches. We refer to this branched architecture as "coupled ensembles". The approach is very generic and can be applied with almost any DCNN architecture. With coupled ensembles of DenseNet-BC networks and a parameter budget of 25M, we obtain error rates of 2.92%, 15.68% and 1.50% on the CIFAR-10, CIFAR-100 and SVHN tasks respectively. For the same budget, a single DenseNet-BC network has error rates of 3.46%, 17.18% and 1.80% respectively. With ensembles of coupled ensembles of DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and 1.42% respectively on these tasks. The approach is simple to implement and we provide a wrapper to compose different standard architectures at: https://github.com/vabh/coupled_ensembles.

In this paper, we make the following contributions: (i) we show that, given a parameter budget, splitting a large network into an ensemble of smaller parallel branches of the same type and training them jointly performs better than or on par with the single large network; (ii) when a final SoftMax (SM) layer is used during the prediction step, we show that ensemble fusion works better when averaging is done before this layer than when it is done after; (iii) when a final Log-Likelihood (LL) layer is used during the training step, we show that ensemble fusion of branches works better when the fusion is done before this layer than when it is done after (see the sketch at the end of this section); (iv) combining all these elements, we significantly improve the performance and/or significantly reduce the parameter count of state-of-the-art neural network architectures on the CIFAR and SVHN data sets; (v) we show that such multi-branch networks can be further ensembled at a higher level while still producing a significant performance gain.

This paper is organised as follows: in section 2, we discuss related work; in section 3, we introduce the concept of coupled ensembles and the motivation behind the idea; in section 4, we present the evaluation of the proposed approach and compare it with the state of the art; and we conclude and discuss future work in section 5.
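To make the coupling concrete, the following is a minimal PyTorch sketch of fusion placed before the Log-Likelihood layer. This is our interpretation of the idea, not the paper's released code: the names CoupledEnsemble and make_densenet_bc are illustrative, and averaging per-branch log-probabilities is one of several possible fuse-layer choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEnsemble(nn.Module):
    """Sketch of a coupled ensemble: `branches` is a list of standalone
    CNNs that each map an image batch to class logits. The fuse
    (averaging) layer combines per-branch log-probabilities, so a single
    NLL loss trains all branches jointly (fusion before the LL layer)."""

    def __init__(self, branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)

    def forward(self, x):
        # Per-branch log-probabilities, stacked to shape (e, batch, classes).
        log_probs = torch.stack(
            [F.log_softmax(branch(x), dim=1) for branch in self.branches]
        )
        # Fuse (averaging) layer: average across the e branches.
        return log_probs.mean(dim=0)

# Usage sketch (make_densenet_bc is a hypothetical branch constructor):
# model = CoupledEnsemble([make_densenet_bc() for _ in range(4)])
# loss = F.nll_loss(model(images), labels)  # joint training of all branches
```

At prediction time, the same averaged log-probabilities can be used directly for the final decision, which corresponds to fusing the branches before the SoftMax output rather than averaging per-branch probabilities after it.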