abcdefghdifjklblmin ofjldp oqn mefifirfdkgn kkn ld a sakt u t vwxmywz wxt v{|t u }t x~{u wv
ExaNest is one of three European projects that support a ground-breaking computing architecture for exascale-class systems built upon power-efficient 64-bit ARM processors. This group of projects share an 'everything-close' and 'share-anything' paradigm, which trims down the power consumption - by shortening the distance of signals for most data transfers - as well as the cost and footprint area of the installation - by reducing the number of devices needed to meet performance targets. In ExaNeSt, we will design and implement: (i) a physical rack prototype and its liquid-cooling subsystem providing ultra-dense compute packaging, (ii) a storage architecture with distributed (in-node) non-volatile memory (NVM) devices, (iii) a unified, low-latency interconnect, designed to efficiently uphold desired Quality-of-Service guarantees for a mix of storage with inter-processor flows, and (iv) efficient rack-level memory sharing, where each page is cacheable at only a single node . Our target is to test alternative storage and interconnect options on actual hardware, using real-world HPC applications. The ExaNeSt consortium brings together technology, skills, and knowledge across the entire value chain, from computing IP, packaging, and system deployment, all the way up to operating systems, storage, HPC, big data frameworks, and cutting-edge applications
SpiNNaker is a massively parallel architecture designed to model large-scale spiking neural networks in (biological) real-time. Its design is based around ad-hoc multi-core System-on-Chips which are interconnected using a two-dimensional toroidal triangular mesh. Neurons are modeled in software and their spikes generate packets that propagate through the on-and inter-chip communication fabric relying on custom-made on-chip multicast routers. This paper models and evaluates large-scale instances of its novel interconnect (more than 65 thousand nodes, or over one million computing cores), focusing on real-time features and fault-tolerance. The key contribution can be summarized as understanding the properties of the feasible topologies and establishing the stable operation of the SpiNNaker under different levels of degradation. First we derive analytically the topological characteristics of the network, which are later confirmed by experimental work. With the computational model developed, we investigate the topology of SpiNNaker, and compare it with a standard 3-dimensional torus. The novel emergency routing mechanism, implemented within the routers, allows the topology of SpiNNaker to be more robust than the 3-dimensional torus, regardless of the latter having better topological characteristics. Furthermore, we obtain optimal values of two router parameters related with livelock and deadlock avoidance mechanisms.
Abstract-Many current parallel computers are built around a torus interconnection network. Machines from Cray, HP, and IBM, among others, make use of this topology. In terms of topological advantages, square (2D) or cubic (3D) tori would be the topologies of choice. However, for different practical reasons, 2D and 3D tori with different number of nodes per dimension have been used. These mixed-radix topologies are not edge symmetric, which translates into poor performance due to an unbalanced use of network resources. In this work, we analyze twisted 2D and 3D mixed-radix tori that remove the network bottlenecks present in nontwisted ones. Such topologies recover edge symmetry, and consequently, balance the utilization of their links. The distance-related properties of twisted tori together with a full characterization of their bisection bandwidth are described in this paper. A simulation-based performance evaluation has been carried out to assess the network performance under synthetic and trace-driven workloads. The obtained results show noticeable and consistent performance gains (up to an increase of 74 percent in accepted load). In addition, we propose scalable and practicable packet routing mechanisms and wiring layouts for these interconnection systems. The complexity of the architectural proposals is similar to the one exhibited by routing and folding mechanisms in standard tori.
Abstract-High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience.In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.
Optical on-chip communication is considered a promising candidate to overcome latency and energy bottlenecks of electrical interconnects. Although recently proposed hybrid Networks-on-chip (NoCs), which implement both electrical and optical links, improve power efficiency, they often fail to combine these two interconnect technologies efficiently and suffer from considerable laser power overheads caused by high-bandwidth optical links. We argue that these overheads can be avoided by inserting a higher quantity of low-bandwidth optical links in a topology, as this yields lower optical loss and in turn laser power. Moreover, when optimally combined with electrical links for short distances, this can be done without trading off latency. We present the effectiveness of this concept with Lego, our hybrid, mesh-based NoC that provides high power efficiency by utilizing electrical links for local traffic, and low-bandwidth optical links for long distances. Electrical links are placed systematically to outweigh the serialization delay introduced by the optical links, simplify router microarchitecture, and allow to save optical resources. Our routing algorithm always chooses the link that offers the lowest latency and energy. Compared to state-of-the-art proposals, Lego increases throughputper-watt by at least 40%, and lowers latency by 35% on average for synthetic traffic. On SPLASH-2/PARSEC workloads, Lego improves power efficiency by at least 37% (up to 3.5×).
Increasing fault rates in current and future technology nodes coupled with on-chip components in the hundreds calls for robust and fault-tolerant Network-on-Chip (NoC) designs. Given the central role of NoCs in today's many-core chips, permanent faults impeding their original functionality may significantly influence performance, energy consumption, and correct operation of the entire system. As a result, fault-tolerant NoC design gained much attention in recent years. In this article, we review the vast research efforts regarding a NoC's components, namely, topology, routing algorithm, router microarchitecture, as well as system-level approaches combined with reconfiguration; discuss the proposed architectures; and identify outstanding research questions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.