The explosive growth of deep learning (DL)-based artificial intelligence (AI) applications demands computing capabilities that standalone CPU computing cannot deliver. Heavy, mission-critical DL kernel computing therefore relies on a heterogeneous computing (HGC) platform that integrates CPUs, GPUs, and accelerators together with substantial data storage elements. However, the metallic electrical interconnects in existing manycore platforms cannot sustain the rapidly increasing bandwidth demand of big-data-driven AI applications. Incorporating an optical network-on-chip (ONoC) to provide ultrahigh bandwidth, we propose rapid topology generation and core mapping of ONoC (REGO) for energy-efficient HGC multicore architectures. The genetic algorithm (GA)-based REGO incorporates the structural characteristics of the optical router into the fitness function and thereby balances the trade-off among the required throughput, optical signal-to-noise ratio (OSNR), and total energy consumption. Furthermore, the crossover step accelerates convergence by suppressing randomness in the GA, significantly reducing the excessive running time caused by the NP-hard nature of the problem. On average, the ONoC generated by REGO increases throughput by 63.29% and 22.80% and reduces energy per bit by 50.24% and 9.56% on VGG-16 and VGG-19, compared with conventional mesh- and torus-topology-based ONoCs, respectively.
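To make the GA-based mapping idea concrete, the following is a minimal sketch, not the REGO algorithm itself. It assumes a hypothetical 2D grid of tiles, a hypothetical traffic matrix between cores, and a scalar fitness that combines two placeholder terms: traffic-weighted hop count as a stand-in for energy, and the longest communicating path as a stand-in for optical loss (and hence OSNR). The names `hops`, `fitness`, `order_crossover`, `mutate`, and `run_ga`, the weights, and the use of standard order crossover are all assumptions made for illustration; the crossover operator described above, which suppresses randomness, is not reproduced here.

```python
import random

def hops(a, b, width):
    """Manhattan distance between tiles a and b on a grid `width` tiles wide."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def fitness(mapping, traffic, width, w_energy=1.0, w_loss=1.0):
    """Scalar cost (lower is better); mapping[core] is the tile of that core.

    Both terms are placeholder proxies, not the paper's models.
    """
    energy = 0.0      # proxy: traffic volume times hop count
    worst_path = 0    # proxy: longest path, standing in for optical loss
    n = len(mapping)
    for i in range(n):
        for j in range(n):
            if traffic[i][j] > 0:
                h = hops(mapping[i], mapping[j], width)
                energy += traffic[i][j] * h
                worst_path = max(worst_path, h)
    return w_energy * energy + w_loss * worst_path

def order_crossover(p1, p2, rng):
    """Standard order crossover (OX): the child is always a valid permutation."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]          # copy a slice from the first parent
    kept = set(child[a:b + 1])
    rest = iter(g for g in p2 if g not in kept)
    for k in range(n):                     # fill the gaps in second-parent order
        if child[k] is None:
            child[k] = next(rest)
    return child

def mutate(perm, rng, rate=0.2):
    """With probability `rate`, swap the tiles of two cores."""
    perm = perm[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def run_ga(traffic, width, pop_size=30, generations=100, seed=0):
    """Elitist GA over core-to-tile permutations; returns the best mapping found."""
    rng = random.Random(seed)
    n = len(traffic)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, traffic, width))
        elite = pop[: pop_size // 2]       # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(order_crossover(p1, p2, rng), rng))
        pop = elite + children
    return min(pop, key=lambda m: fitness(m, traffic, width))

if __name__ == "__main__":
    # Hypothetical 4-core example on a 2x2 grid: only core 0 talks to core 3.
    traffic = [[0, 0, 0, 5], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    best = run_ga(traffic, width=2)
    print(best, fitness(best, traffic, 2))
```

A weighted sum is only one way to scalarize competing objectives. A faithful model would replace the two proxies with router-level loss, crosstalk, and throughput estimates.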
INDEX TERMS Deep learning kernel, genetic algorithm, heterogeneous computing platform, topology generation, optical network-on-chip.

I. INTRODUCTION

Deep learning (DL), a class of machine learning algorithms, trains a nonlinear function approximator represented by a deep neural network (DNN) architecture using input-output pairs of training data [1]. The primary goal of DL is to improve accuracy by learning the weights through backward propagation of errors (backpropagation). The repetitive operations performed during backpropagation consist largely of vector-matrix operations and demand extremely high parallelism. Therefore, a heterogeneous computing (HGC) platform that combines various types of processors and dedicated accelerators is required instead of a legacy CPU-based architecture [2]. In addition, an ultra-wideband on-chip network infrastructure is essential for handling the excessively heavy data traffic.

Network-on-chip (NoC) is a scalable on-chip communication infrastructure that can accommodate the ever-increasing number of processor cores integrated on a single chip. However, despite continuing progress in transistor miniaturization, challenges in the back-end-of-line (BEOL) fabrication steps, which form the interconnect layers from metallic interconnects, impede the expansion of on-chip communication bandwidth. An optical NoC (ONoC) based on silicon photonics is being actively investigated as an alternative to electrical NoCs (ENoCs). Semiconductor companies such as IBM, Intel, and ...