Ongoing transistor scaling and the growing complexity of embedded system designs have led to the rise of the MPSoC (Multi-Processor System-on-Chip), which combines multiple hardened CPU cores and accelerators (FPGA, GPU) on the same physical die. These devices are of great interest to the supercomputing community, which increasingly relies on heterogeneity to achieve its power and performance goals in the closing stages of the race to exascale. In this paper, we present a network interface architecture and networking infrastructure designed to sit inside the FPGA fabric of a cutting-edge MPSoC device, enabling networks of these devices to communicate in both distributed- and shared-memory contexts while reducing the need for costly software networking system calls. We present our implementation and prototype system and discuss the main design decisions relevant to the use of the Xilinx Zynq UltraScale+, a state-of-the-art MPSoC, along with the challenges to be overcome given the device's limitations and constraints. We demonstrate the working prototype system connecting two MPSoCs, with communication between a processor and a remote memory region and accelerator. We then discuss the limitations of the current implementation and highlight areas for improvement to make this solution production-ready.
KEYWORDS

distributed shared memory, FPGA, HPC, interconnect, MPSoC, networks
INTRODUCTION AND MOTIVATION

Over the past decade, the embedded systems landscape has changed dramatically due to growing demands from the mobile market and the rise of the Internet of Things. These advances led to the SoC paradigm, with increasingly complex and heterogeneous systems placed upon the same physical die. At the same time, the High Performance Computing (HPC) community has had to deal with the consequences of the breakdown of Dennard scaling,[1] which has caused an explosion in the core count and power consumption of the largest machines as they keep pace with demands for ever greater computing capability.

These two phenomena have created an opportunity for convergence between the HPC and data center markets and have driven the adoption of low-power mobile processors, which are beginning to penetrate the server market[2] and are even being used by the RIKEN institute for the next stage of its roadmap to an exascale-class machine, the post-K computer.[3] It is unsurprising that this shift is happening, given that the greatest challenge computer and system architects now face in the race to exascale is the need for greater energy efficiency. Naively scaling out current architectures, e.g., those in the TOP500, would result in an exascale machine requiring in excess of 100 MW of power (for example, at an efficiency of 10 GFLOPS/W, sustaining 10^18 FLOP/s requires 100 MW), which is unrealistic in terms of both infrastructure and cost.

Designers face the challenge of reducing power consumption by a number of means: increased component density; tighter coupling between processor, memory, accelerator, and network; shorter paths for copper lines; increased performance per watt of individual components; hyper-converged storage; and so on.

The relentless quest for increasingly more power-effici...