Some Deep Neural Networks (DNNs) have what we call lanes, or can be reorganized to have them. Lanes are data-independent paths through the network that typically learn different features or add resilience to the network. Given their data independence, lanes are amenable to parallel processing. The Multi-lane CapsNet (MLCN) is a proposed reorganization of the Capsule Network that has been shown to achieve better accuracy while introducing highly parallel lanes. However, the efficiency and scalability of MLCN had not been systematically examined. In this work, we study the MLCN network on multiple GPUs, finding that it is 2x more efficient than the original CapsNet when using model parallelism. Further, we present the load-balancing problem of distributing heterogeneous lanes across homogeneous or heterogeneous accelerators, and show that a simple greedy heuristic can be almost 50% faster than a naïve random approach.

Index Terms: deep learning, capsule network, multi-lane
I. INTRODUCTION

Several approaches to the distributed model parallelization of Deep Neural Networks (DNNs) have concentrated on their depth [1]-[3], but DNNs can also be organized so that they are parallelized across their width [4]. The DNN architecture may be organized into distinct neural network lanes [5]. This creates separable, resource-efficient, data-independent paths in the network that can be used to learn different features or to add resilience to the network. Examples of neural networks with lanes are Google's Inception [6], [7] and the Multi-lane Capsule Network (MLCN) [5]. Because these lanes are data-independent, they can be (1) processed in parallel and (2) specialized for distinct computational targets (CPUs, GPUs, FPGAs, and the cloud), as well as for resource-constrained mobile and IoT targets, leading to both opportunities and challenges. Recently, our research has focused on Multi-Lane Capsule Networks (MLCN), a separable and resource-efficient organization of Capsule Networks (CapsNet).

This work was supported in part by CAPES/Brasil (Finance Code 001), by CNPq (313012/2017-2), and by Fapesp (CCES 2013/08293-7). We would like to thank Google Cloud Platform for a grant to run our experiments.
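To make the load-balancing problem raised in the abstract concrete, the following is a minimal sketch of one possible greedy heuristic: lanes are sorted by estimated cost (largest first) and each is assigned to the accelerator with the smallest current estimated finish time, scaled by device speed to cover heterogeneous setups. The function name, the per-lane cost estimates, and the relative device speeds are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import heapq

def assign_lanes(lane_costs, device_speeds):
    """Greedily assign lanes to accelerators (a sketch, not the paper's code).

    lane_costs: estimated compute cost of each lane (e.g., relative FLOPs).
    device_speeds: relative throughput of each accelerator
                   (all 1.0 for a homogeneous set of GPUs).
    Returns a list mapping each device index to its assigned lane indices.
    """
    # Min-heap of (estimated finish time, device index).
    heap = [(0.0, d) for d in range(len(device_speeds))]
    heapq.heapify(heap)
    assignment = [[] for _ in device_speeds]
    # Largest-first ordering: placing expensive lanes early balances better.
    for lane, cost in sorted(enumerate(lane_costs), key=lambda x: -x[1]):
        load, dev = heapq.heappop(heap)
        assignment[dev].append(lane)
        heapq.heappush(heap, (load + cost / device_speeds[dev], dev))
    return assignment

# Example: six heterogeneous lanes on two GPUs, the second twice as fast.
print(assign_lanes([4.0, 2.0, 6.0, 1.0, 3.0, 5.0], [1.0, 2.0]))
```

Because each assignment is a constant-time heap operation after an initial sort, the heuristic is cheap relative to training, which is consistent with preferring it over an exhaustive search; a random baseline would simply shuffle lanes onto devices without regard to their estimated costs.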